direct-phonology / jdsw

Parsing the "Jingdian Shiwen" with spaCy
MIT License

convert bulk processing scripts to use fileinput #19

Closed by thatbudakguy 1 year ago

thatbudakguy commented 1 year ago

see https://docs.python.org/3/library/fileinput.html, in particular module-level functions like fileinput.filename() and fileinput.lineno().

this should allow us to remove the *_all.py versions of scripts; all scripts now accept any number of files or read directly from stdin. they also output a single file; no need to maintain the segmentation of the input. hopefully this will help obscure differences in segmentation between the JDSW and SBCK editions of texts.

xml2conllu accepts a single file and thus remains unchanged.
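
a minimal sketch of the fileinput pattern described above, not the actual code from this repo: `merge_lines` and its blank-line handling are hypothetical, made up for illustration. it reads any number of files (or stdin when no paths are given), uses fileinput.filename() and fileinput.filelineno() for diagnostics, and writes one merged stream regardless of how the input was split.

```python
import fileinput
import sys


def merge_lines(paths, out=sys.stdout):
    """Read all lines from `paths` and write them to `out` as one stream.

    Hypothetical helper for illustration. An empty `paths` list makes
    fileinput read from stdin instead.
    """
    # hook_encoded makes named files open as UTF-8 (stdin is left as-is)
    hook = fileinput.hook_encoded("utf-8")
    with fileinput.input(files=paths, openhook=hook) as lines:
        for line in lines:
            text = line.rstrip("\n")
            if not text:
                # filename() and filelineno() locate the line in its source
                # file; lineno() is the cumulative count across all inputs
                print(
                    f"{fileinput.filename()}:{fileinput.filelineno()}: "
                    f"skipping blank line (overall line {fileinput.lineno()})",
                    file=sys.stderr,
                )
                continue
            # input segmentation is not preserved: everything goes to `out`
            out.write(text + "\n")
```

a script built this way could call `merge_lines(sys.argv[1:])`, so that `script.py a.txt b.txt > merged.txt` and `cat a.txt b.txt | script.py > merged.txt` produce the same output, which is what makes the separate *_all.py variants unnecessary.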

thatbudakguy commented 1 year ago

made irrelevant by #22