Helsinki-NLP / LanguageCodes

MIT License

Convert to JSON? #2

Open: RokeJulianLockhart opened this issue 1 month ago

RokeJulianLockhart commented 1 month ago

Currently, the data is a .TSV file with space-delimited arrays, which is a fairly non-standard format. If it instead mapped each language family explicitly to a standard array (e.g. ["eng", "fra"]), it'd be far more convenient to parse.
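A minimal sketch of the proposed conversion, assuming (hypothetically) that each line holds a family code in the first column and its space-delimited language codes in the second. The actual column layout of the files in the repository may differ, and `SAMPLE_TSV` and `tsv_to_mapping` are illustrative names only.

```python
import json

# Hypothetical sample data; assumes each line is "<family>\t<space-separated codes>".
SAMPLE_TSV = "gem\teng deu nld\nroa\tfra ita spa\n"


def tsv_to_mapping(text: str) -> dict[str, list[str]]:
    """Map the first column to a list built from the space-delimited second column."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        family, _, codes = line.partition("\t")
        mapping[family] = codes.split()  # "eng deu nld" -> ["eng", "deu", "nld"]
    return mapping


if __name__ == "__main__":
    # Emit the mapping as JSON, e.g. {"gem": ["eng", "deu", "nld"], ...}
    print(json.dumps(tsv_to_mapping(SAMPLE_TSV), indent=2))
```

Once in JSON, a consumer can load the arrays directly with any JSON parser instead of splitting fields by hand.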

jorgtied commented 1 month ago

If you're referring to the files in data, those are compiled from various sources to feed into the modules. Some of them come as TSV files; others are converted into TSV as an intermediate format that is processed when creating the module data structures. Feel free to reformat them into JSON if you like. Personally, I often find TSV quite convenient to process with line-based Unix command-line tools and pipes, but it's a matter of taste.
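To illustrate the line-based style described above, here is a sketch that streams a TSV one line at a time and prints the families whose code list contains a given code, roughly analogous to a `grep -w fra | cut -f1` pipe. It assumes the same hypothetical two-column layout as before; `families_containing` is an illustrative name, not part of the repository.

```python
import io
from collections.abc import Iterable, Iterator

# Hypothetical sample data; assumes each line is "<family>\t<space-separated codes>".
SAMPLE_TSV = "gem\teng deu nld\nroa\tfra ita spa\n"


def families_containing(lines: Iterable[str], code: str) -> Iterator[str]:
    """Yield the first column of every line whose code list contains `code`.

    Lines are processed one at a time, so memory use does not grow with file size.
    """
    for line in lines:
        family, _, codes = line.rstrip("\n").partition("\t")
        if code in codes.split():  # exact match on whole codes, not substrings
            yield family


if __name__ == "__main__":
    with io.StringIO(SAMPLE_TSV) as handle:  # stands in for an open file handle
        print(list(families_containing(handle, "fra")))
```

Matching on the split list rather than on the raw line avoids false hits such as a code appearing inside a longer token.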

RokeJulianLockhart commented 1 month ago

> Personally, I find tsv often quite convenient to be processed by line-based unix command-line tools and pipes but it's a matter of taste.

@jorgtied, if you're used to processing text with RegEx, as the output of most UNIX binaries necessitates, that makes more sense. I find that a little fragile and prefer to use object-oriented approaches. Thanks for the response.