Lots of changes here (may be considered a refactor more than a PR, but will still require some heavy code reviews and discussion about which changes to keep/fold in).
Summary of changes:
Added bff and sysreq commands, to get a sense of how much memory a given BFF run will require
Changed some defaults of arguments:
min-ngram/max-ngram now default to [20,20]
by default the bloom filter file is not saved (this can be specified)
annotations have been merged into a single argument
progress bar present (but a no-progress-bar arg is also present)
some more abstraction/functions to break things up and eventually not repeat code when I push the S3 PR
added BOTH level removal type (some discussion about what this does in the RemoveType enum)
Added some printouts with BFF sparsity, removal rates, time
misc performance-y things, like parallel iteration in some places
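
For reviewers who want a feel for the kind of memory estimate sysreq is after, here is a minimal sketch of the textbook Bloom filter sizing formulas (m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashers). It is not the code in this PR: the function names bloom_bits, optimal_hashers, and bits_to_gib are hypothetical and exist only for illustration, and the actual sysreq calculation may differ.

```rust
// Hypothetical helpers for illustration only; not taken from this PR.

/// Bits needed to hold `expected_items` insertions at a target
/// false-positive rate `fp_rate` (standard Bloom filter sizing formula).
fn bloom_bits(expected_items: u64, fp_rate: f64) -> u64 {
    let ln2 = std::f64::consts::LN_2;
    let bits = -(expected_items as f64) * fp_rate.ln() / (ln2 * ln2);
    bits.ceil() as u64
}

/// Number of hash functions that minimizes the false-positive rate
/// for a filter of `bits` bits holding `expected_items` items.
/// Always returns at least 1.
fn optimal_hashers(bits: u64, expected_items: u64) -> u32 {
    let k = (bits as f64 / expected_items as f64) * std::f64::consts::LN_2;
    (k.round() as u32).max(1)
}

/// Convert a size in bits to GiB, for printing a human-readable estimate.
fn bits_to_gib(bits: u64) -> f64 {
    bits as f64 / 8.0 / (1u64 << 30) as f64
}
```

Under these formulas, one million expected ngrams at a 1% false-positive rate needs roughly 9.6 million bits (a little over 1 MiB) and 7 hashers. Memory grows linearly with the number of ngrams and only logarithmically as the false-positive rate shrinks.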