This PR renames several classes to make the library easier for new users to learn. The motivation is the following principle: a user who only needs the functionality of character-based spans should not need to understand token-based spans.
The main changes I've implemented are:
We now refer to character-based spans as just "spans". The class that used to be called CharSpan is now called Span; CharSpanArray is now called SpanArray; and so on.
Our dtypes now have names that end in "Dtype", for consistency with how Pandas names its data type objects.
Instead of returning two columns of per-token spans ("token_span" and "char_span"), all the syntax analysis input functions (SpaCy, CoNLL, and Watson NLU) now return a single "span" column of dtype SpanDtype.
I've updated all the relevant example notebooks to reflect the new nomenclature. We no longer represent each token with both a "char_span" and a "token_span" column.
Analyze_Model_Outputs.ipynb and Analyze_Text.ipynb still use TokenSpanDtype, but they only use it to store spans that both cover multiple tokens and are constrained to start and end on token boundaries.
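To illustrate the distinction behind these renames, here is a minimal sketch in plain Python. The Span and TokenSpan classes below are simplified, hypothetical stand-ins written for this description, not the library's actual implementations, and tokenize is an assumed whitespace tokenizer. The point is that a character-based span needs only a text and two offsets, while a token-based span is defined against a tokenization and therefore always starts and ends on token boundaries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a character-based span."""
    text: str
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive

    @property
    def covered_text(self) -> str:
        return self.text[self.begin:self.end]


@dataclass(frozen=True)
class TokenSpan:
    """Hypothetical stand-in for a span expressed as a range of tokens."""
    tokens: Tuple[Span, ...]
    begin_token: int  # index of the first token, inclusive
    end_token: int    # index one past the last token

    def to_span(self) -> Span:
        # The character range runs from the start of the first token
        # to the end of the last token, so it always falls on token
        # boundaries. Assumes begin_token < end_token.
        first = self.tokens[self.begin_token]
        last = self.tokens[self.end_token - 1]
        return Span(first.text, first.begin, last.end)


def tokenize(text: str) -> Tuple[Span, ...]:
    """Assumed whitespace tokenizer, for illustration only."""
    spans = []
    pos = 0
    for word in text.split():
        begin = text.index(word, pos)
        end = begin + len(word)
        spans.append(Span(text, begin, end))
        pos = end
    return tuple(spans)


text = "Spans cover ranges of characters"

# A character-based span: no tokenization required.
char_only = Span(text, 0, 5)

# A token-based span covering the second and third tokens.
tokens = tokenize(text)
multi = TokenSpan(tokens, 1, 3)

print(char_only.covered_text)
print(multi.to_span().covered_text)
```

In this sketch, a user who only works with char_only never has to touch tokens or TokenSpan, which is the property the renames are meant to preserve.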