Closed thatbudakguy closed 1 year ago
`DataLoader` or similar interface that loads and combines Zhengwen texts with their JDSW equivalents (see pytorch `Dataset`, maybe implement a `MergeDataLoader` for n > 1 `Dataset`? or look at the fast.ai `DataBlock` API)
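a rough sketch of what the merging idea could look like, assuming the Zhengwen and JDSW texts are already aligned item by item. the names `ListDataset` and `MergedDataset` are hypothetical, and the strings are placeholder data, not real corpus text. the classes only implement `__len__` and `__getitem__`, which is the protocol a map-style pytorch `Dataset` uses, so no torch import is needed to try it:

```python
from typing import Sequence, Tuple


class ListDataset:
    """Toy map-style dataset backed by a list (hypothetical stand-in)."""

    def __init__(self, items: Sequence[str]) -> None:
        self.items = list(items)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> str:
        return self.items[index]


class MergedDataset:
    """Combine n aligned datasets; item i is a tuple of every dataset's item i."""

    def __init__(self, *datasets) -> None:
        if not datasets:
            raise ValueError("need at least one dataset")
        # Assumption: all datasets are aligned, so they must share one length.
        lengths = {len(d) for d in datasets}
        if len(lengths) != 1:
            raise ValueError(f"datasets are not aligned: lengths {sorted(lengths)}")
        self.datasets = datasets

    def __len__(self) -> int:
        return len(self.datasets[0])

    def __getitem__(self, index: int) -> Tuple:
        return tuple(d[index] for d in self.datasets)


# Placeholder strings standing in for real passages and annotations.
zhengwen = ListDataset(["zhengwen passage 0", "zhengwen passage 1"])
jdsw = ListDataset(["jdsw annotation 0", "jdsw annotation 1"])
merged = MergedDataset(zhengwen, jdsw)
```

failing loudly on mismatched lengths is one possible design choice here; if the two sources turn out not to be aligned one-to-one, the merge step would need an explicit alignment key instead.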
now that we have blank CoNLL-U ready to be annotated, we can structure the code around performing transformations in a pipeline format, similar to spaCy:
pyconll
the basic structure of a pipe probably involves initialization (passing in a config) and then a method (`__call__`, maybe) that the pipeline will call by passing in the output from the previous pipe, and which should return the same type of output as its input.
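the pipe structure described above could be sketched like this. everything here is hypothetical: `Pipe`, `SetMisc`, and `Pipeline` are illustrative names, and a sentence is modelled as a plain list of token dicts rather than a real parsed CoNLL-U object, so the example does not depend on any CoNLL-U library:

```python
from typing import Any, Dict, Iterable, List

# Assumption: a sentence is a list of token dicts, standing in for parsed CoNLL-U.
Sentence = List[Dict[str, Any]]


class Pipe:
    """Base pipe: configured at init, then called with the previous pipe's output."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def __call__(self, sentences: List[Sentence]) -> List[Sentence]:
        raise NotImplementedError


class SetMisc(Pipe):
    """Hypothetical pipe: write a configured key/value into each token's misc field."""

    def __call__(self, sentences: List[Sentence]) -> List[Sentence]:
        key, value = self.config["key"], self.config["value"]
        # Build new token dicts so the input sentences are left unmodified.
        return [
            [{**tok, "misc": {**tok.get("misc", {}), key: value}} for tok in sent]
            for sent in sentences
        ]


class Pipeline:
    """Run pipes in order, feeding each pipe's output to the next one."""

    def __init__(self, pipes: Iterable[Pipe]) -> None:
        self.pipes = list(pipes)

    def __call__(self, sentences: List[Sentence]) -> List[Sentence]:
        for pipe in self.pipes:
            sentences = pipe(sentences)
        return sentences


blank = [[{"form": "A"}, {"form": "B"}]]
pipeline = Pipeline([
    SetMisc({"key": "source", "value": "demo"}),
    SetMisc({"key": "checked", "value": True}),
])
annotated = pipeline(blank)
```

because every pipe takes and returns the same type, pipes can be reordered or dropped without changing the pipeline code, which is the property the same-type-in, same-type-out rule is meant to give.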