There are a number of relatively small improvements that would go a long way toward making synonymizer (and kg2c) builds run more smoothly:
[x] get rid of synonymizer test build? (more trouble than it's worth)
[x] for kg2c test build, don't overwrite files; use distinct names (i.e., '_TEST')
[ ] only use SRI NN API (not bulk download)
[ ] add some high-level stats to the synonymizer.py interface (total number of identifiers and clusters, which file is being used, etc.)
[x] fix the issue where the synonymizer build log isn't saved to disk
[ ] make it easy to run the test suite right from the synonymizer build directory
[ ] compare the reports for the current build to those of prior build(s) (on arax-databases.rtx.ai) to flag changes
[x] check the kg2pre version in the kg2pre TSV files used and throw an error if it doesn't match the requested version (in synonymizer build; already done in kg2c build) (turns out this was already done in the synonymizer build too)
[x] possibly make user confirm parameters/config settings at very beginning of kg2c build
[x] possibly tweak things to get rid of the temp config_dbs changes?
[ ] update readme/documentation in light of these changes
[ ] add some basic automated tests run at end of kg2c/synonymizer builds
(most of these ideas came out of a chat with @sundareswarpullela today)
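For the "compare the reports" item, here is a minimal sketch of what the comparison step might look like. It assumes each build's report can be loaded as a flat JSON object mapping stat names to numbers; the stat names, the 5% threshold, and the helper names are all hypothetical, and fetching the prior report from arax-databases.rtx.ai is left out.

```python
import json
from pathlib import Path


def load_report(path):
    """Load a build report, ASSUMED to be a flat JSON object of stat name -> number."""
    return json.loads(Path(path).read_text())


def flag_changes(current, prior, rel_threshold=0.05):
    """Return readable flags for stats that changed notably between two builds.

    A stat is flagged if it appears in only one report, or if its value moved
    by more than rel_threshold (as a fraction of the prior value).
    """
    flags = []
    for key in sorted(set(current) | set(prior)):
        if key not in prior:
            flags.append(f"NEW: {key} = {current[key]}")
        elif key not in current:
            flags.append(f"MISSING: {key} (was {prior[key]})")
        else:
            old, new = prior[key], current[key]
            # Avoid dividing by zero: any move away from 0 counts as a change
            changed = (new != 0) if old == 0 else abs(new - old) / abs(old) > rel_threshold
            if changed:
                flags.append(f"CHANGED: {key} {old} -> {new}")
    return flags


if __name__ == "__main__":
    # Hypothetical stats, just to show the output format
    prior = {"num_identifiers": 1000, "num_clusters": 400, "num_orphans": 0}
    current = {"num_identifiers": 1020, "num_clusters": 300, "num_new_stat": 5}
    for flag in flag_changes(current, prior):
        print(flag)
```

With the sample data, the 2% bump in identifiers is below the threshold and is not flagged, while the 25% drop in clusters, the new stat, and the missing stat are.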