direct-phonology / jdsw

Parsing the "Jingdian Shiwen" with spaCy
MIT License

implement pipeline pattern for data transformations #22

Closed: thatbudakguy closed this issue 1 year ago

thatbudakguy commented 1 year ago

Now that we have blank CoNLL-U files ready to be annotated, we can structure the code around performing transformations in a pipeline, similar to spaCy:

  1. The user defines which transformations should take place, as a list/collection.
  2. At runtime, a script initializes each transformation (a "pipe") and adds it to a pipeline.
  3. Each CoNLL-U file is loaded and parsed using pyconll.
  4. The data is passed through the pipeline; the output of each pipe is the input to the next pipe.
  5. The final output is re-serialized to CoNLL-U, overwriting the input file.
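The five steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the registry (`PIPE_FACTORIES`, `register`), `Pipeline`, `build_pipeline`, and `process_file` are all hypothetical names. The `add_pipe` method name loosely echoes spaCy but is not spaCy's API. The only real library calls are pyconll's `load_from_file` and `Conll.write`.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# A pipe function takes a document and returns the same type of document.
PipeFn = Callable[[Any], Any]
# A factory builds a configured pipe from a config dict (hypothetical convention).
PipeFactory = Callable[[Dict[str, Any]], PipeFn]

# Hypothetical registry: maps the names a user lists (step 1) to factories.
PIPE_FACTORIES: Dict[str, PipeFactory] = {}


def register(name: str) -> Callable[[PipeFactory], PipeFactory]:
    """Decorator that records a pipe factory under `name`."""
    def decorator(factory: PipeFactory) -> PipeFactory:
        PIPE_FACTORIES[name] = factory
        return factory
    return decorator


class Pipeline:
    """Runs pipes in order; each pipe's output is the next pipe's input (step 4)."""

    def __init__(self, pipes: Iterable[PipeFn] = ()) -> None:
        self.pipes: List[PipeFn] = list(pipes)

    def add_pipe(self, pipe: PipeFn) -> None:
        self.pipes.append(pipe)

    def __call__(self, doc: Any) -> Any:
        for pipe in self.pipes:
            doc = pipe(doc)
        return doc


def build_pipeline(steps: Iterable[Tuple[str, Dict[str, Any]]]) -> Pipeline:
    """Step 2: initialize each named pipe with its config, in the listed order."""
    pipeline = Pipeline()
    for name, config in steps:
        pipeline.add_pipe(PIPE_FACTORIES[name](config))
    return pipeline


def process_file(path: str, pipeline: Pipeline) -> None:
    """Steps 3-5: parse a CoNLL-U file, transform it, then overwrite the input."""
    import pyconll  # third-party; imported lazily so the rest runs without it

    # load_from_file reads every sentence into memory, so overwriting is safe.
    conll = pyconll.load_from_file(path)
    conll = pipeline(conll)
    with open(path, "w", encoding="utf-8") as f:
        conll.write(f)
```

Keeping the registry separate from `Pipeline` means the user-facing list in step 1 can be plain data (names plus configs), for example read from a settings file, while the pipeline itself only ever sees ready-to-call pipes.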

The basic structure of a pipe probably involves initialization (passing in a config), followed by a method (`__call__`, maybe) that the pipeline calls with the output of the previous pipe. That method should return the same type of object it receives as input.
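A pipe with that shape might look like the sketch below: an abstract base class that stores its config in `__init__` and requires subclasses to implement `__call__`, returning the same type it receives. `Pipe` and `FillMissingLemmas` are hypothetical names, and the lemma-filling logic is only a placeholder transformation. It assumes pyconll-like data (an iterable of sentences, each an iterable of tokens with `form` and `lemma` attributes) but does not import pyconll, so it also works on simple stand-in objects.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class Pipe(ABC, Generic[T]):
    """Base class for a pipeline step: configured once, then called per document."""

    def __init__(self, config: Optional[Dict[str, Any]] = None) -> None:
        # Copy the config so later changes by the caller don't affect this pipe.
        self.config: Dict[str, Any] = dict(config or {})

    @abstractmethod
    def __call__(self, doc: T) -> T:
        """Transform `doc` and return an object of the same type."""


class FillMissingLemmas(Pipe[Any]):
    """Hypothetical example pipe: copy each token's form into its lemma.

    By default only empty lemmas are filled; set config["overwrite"] to True
    to replace every lemma. Tokens are modified in place and the same
    document object is returned, so it can be passed on to the next pipe.
    """

    def __call__(self, doc: Any) -> Any:
        overwrite = self.config.get("overwrite", False)
        for sentence in doc:
            for token in sentence:
                if overwrite or not token.lemma:
                    token.lemma = token.form
        return doc
```

Because the base class is abstract, forgetting to implement `__call__` fails at construction time (Python raises `TypeError`) instead of partway through a pipeline run.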

thatbudakguy commented 1 year ago