Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License
84 stars 12 forks source link

bookworm prep updates #132

Closed organisciak closed 7 years ago

organisciak commented 7 years ago

Referring to #130. If review looks good, merge and delete the branch.

In addition to documenting bookworm prep options for argparse, this also adds a --gzip flag to preDatabaseMetadata and allows arbitrary keyword args for the prep goals.

I remembered the issue number wrong and named a commit "fixes #120". Doesn't seem worth a force push to fix that, but it's possible that #120 will have to be re-opened if Github auto-closes it.

bmschmidt commented 7 years ago

This looks good to me.

The gzip flag w/ mkfifo workaround is likely to cause breakage in some rare situations. (For instance, I think it would break with the prep guessAtFieldDescriptions operation, which will exhaust the fifo pipe.) One possibility would be to have every call to bookworm check for jsoncatalog.txt.gz and create the pipe if it doesn't exist.

Or we could do what we do that allows things like input.txt.gz and input.sh instead of input.txt, which is have a wrapper around any attempts to access the file that allow multiple methods. (With that, we could even allow raw compressed MARC21).

My inclination, though, is to just slap a warning on the gzip option.

organisciak commented 7 years ago

A warning makes sense. I'll include it in a future update.