Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

No error on too many grams / unexpected behavior #107

Open organisciak opened 7 years ago

organisciak commented 7 years ago

As mentioned in https://github.com/Bookworm-project/BookwormDB/issues/105, querying for an n-gram longer than the largest n that is indexed returns full summary statistics. This should raise an error or, at the very least, return zeroes.

The error handling can be done with the new "method":"data", "format":"json" response type.

organisciak commented 7 years ago

Is there already a way to recall how many words are indexed (i.e., the largest n), or does that need a new method to check with the DB? I have a simple fix here that raises an error if a query has more than 2 words, but for unigram-only bookworms, or even ones with >2-grams, the better approach is to compare against the index parameters.
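A minimal sketch of comparing a query against the index parameters, assuming the largest indexed n is available as an integer `max_n`. The function `check_ngram_length` and the `status`/`message` payload keys are hypothetical, not BookwormDB's actual API or its JSON response format. Splitting on whitespace is also a simplification; the real tokenizer may count words differently.

```python
import json


def check_ngram_length(search_terms, max_n):
    """Return an error payload if any term has more than max_n words, else None.

    max_n is assumed to come from the bookworm's index parameters
    (hypothetical: there is currently no stored value to read it from).
    """
    for term in search_terms:
        n = len(term.split())  # naive whitespace word count
        if n > max_n:
            # Hypothetical error shape for a JSON response.
            return {
                "status": "error",
                "message": f"'{term}' is a {n}-gram, but this bookworm only indexes up to {max_n}-grams.",
            }
    return None


# Usage: a unigram-only bookworm rejects a bigram search.
error = check_ngram_length(["natural selection"], max_n=1)
print(json.dumps(error))
```

Checking up front, rather than relying on the query to fail, also covers the case described in this issue, where an oversized n-gram silently returns summary statistics instead of erroring.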

bmschmidt commented 7 years ago

Currently there's no way to check. The ticket is open at https://github.com/Bookworm-project/BookwormDB/issues/88.

I think this requires a whole new table in the MySQL database to use as a key-value store.
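A rough sketch of such a key-value table, using Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. The table name `bookworm_settings`, its columns, the `max_ngram` key and the helper functions are all hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS bookworm_settings ("
    " setting_key VARCHAR(255) PRIMARY KEY,"
    " setting_value TEXT)"
)


def set_setting(conn, key, value):
    """Store or overwrite one index parameter."""
    conn.execute(
        "INSERT OR REPLACE INTO bookworm_settings (setting_key, setting_value) VALUES (?, ?)",
        (key, str(value)),
    )
    conn.commit()


def get_setting(conn, key, default=None):
    """Read one index parameter, falling back to default if it was never stored."""
    row = conn.execute(
        "SELECT setting_value FROM bookworm_settings WHERE setting_key = ?", (key,)
    ).fetchone()
    return row[0] if row else default


# At index time, record the largest n; at query time, read it back.
set_setting(conn, "max_ngram", 2)
max_n = int(get_setting(conn, "max_ngram", 1))
```

In MySQL, `INSERT OR REPLACE` would become `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`. Values are stored as text so one table can hold parameters of different types.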

In the short term, the big question is what happens when you execute a bigram query on a unigram-only bookworm. I'm not sure, but there's probably some way to catch an error?
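One possible short-term approach, sketched under the assumption that a bigram query on a unigram-only bookworm fails because a bigram table is missing. The table names `unigrams` and `bigrams` and the payload keys are hypothetical, and `sqlite3` again stands in for MySQL. A MySQL driver raises its own exception class, so the `except` clause would need adjusting.

```python
import json
import sqlite3

# Hypothetical unigram-only bookworm: no bigram table exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unigrams (word TEXT, count INTEGER)")


def run_query(conn, sql, params=()):
    """Run sql and return JSON; database errors become an error payload."""
    try:
        rows = conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        # sqlite3 raises OperationalError for a missing table.
        return json.dumps({"status": "error", "message": str(exc)})
    return json.dumps({"status": "success", "data": rows})


print(run_query(
    conn,
    "SELECT count FROM bigrams WHERE word1 = ? AND word2 = ?",
    ("natural", "selection"),
))
```

This only helps if the oversized query actually raises. When it instead falls back to full summary statistics, as reported in this issue, nothing is raised, so an explicit n-gram length check is the more reliable fix.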