Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

Never group or search on variables more than 3 bytes #27

Closed bmschmidt closed 11 years ago

bmschmidt commented 11 years ago

Allowing user-generated strings to appear in the fastcat table has two problems:

  1. They can take up enormous amounts of memory. (This greatly confused the NYTworm, and will eventually make OneClick hosted infrastructure unsustainable.)
  2. GROUP BY queries take much longer. (For example, returning the results for each result in ChronAm takes 8 seconds when grouping on a character element of about 12, as opposed to about 3 seconds with a MEDIUMINT variable.)
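
To make the 3-byte idea concrete, here is a minimal sketch of dictionary-encoding a string variable into small integer codes. The function name `build_lookup` and the sample newspaper titles are hypothetical and not taken from the BookwormDB code; the only fixed fact is that an unsigned MySQL MEDIUMINT is 3 bytes and holds values up to 16,777,215.

```python
from collections import Counter

# An unsigned MEDIUMINT is 3 bytes wide, so the largest code is 2**24 - 1.
MEDIUMINT_UNSIGNED_MAX = 2**24 - 1


def build_lookup(values):
    """Assign each distinct string an integer code, most frequent first.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    # Descending frequency, then alphabetical so ties have a stable order.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    if len(ordered) > MEDIUMINT_UNSIGNED_MAX + 1:
        raise ValueError("too many distinct values for a 3-byte code")
    return {value: code for code, value in enumerate(ordered)}


# Made-up sample data standing in for a user-supplied string variable.
papers = ["New-York Tribune", "The Sun", "New-York Tribune", "Evening Star"]
lookup = build_lookup(papers)
encoded = [lookup[p] for p in papers]
```

Only `encoded` would need to live in the fast table under this scheme; each long string is stored once, in the lookup.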

This will require significant rejiggering of the code, though, particularly on the API end, which will need some subqueries at the end to apply the longer names back transparently.
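
A rough sketch of what such a query could look like: the inner subquery groups on the integer code only, and the outer join applies the long names afterwards. This uses Python's built-in `sqlite3` as a stand-in for MySQL (so `INTEGER` replaces `MEDIUMINT`), and the column and lookup-table names (`paper__id`, `paper__lookup`, `wordcount`) are assumptions for illustration, not the actual Bookworm schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical narrow table: integer codes only, no user strings.
    CREATE TABLE fastcat (
        bookid INTEGER PRIMARY KEY,
        paper__id INTEGER,
        wordcount INTEGER
    );
    -- Hypothetical lookup table holding each long name once.
    CREATE TABLE paper__lookup (
        paper__id INTEGER PRIMARY KEY,
        paper TEXT
    );
    INSERT INTO paper__lookup VALUES (0, 'New-York Tribune'), (1, 'Evening Star');
    INSERT INTO fastcat VALUES (1, 0, 100), (2, 0, 250), (3, 1, 75);
""")

query = """
    SELECT l.paper, g.total
    FROM (
        -- Group on the small integer code, never on the string.
        SELECT paper__id, SUM(wordcount) AS total
        FROM fastcat
        GROUP BY paper__id
    ) AS g
    -- Map codes back to their long names after aggregation.
    JOIN paper__lookup AS l USING (paper__id)
    ORDER BY l.paper
"""
rows = conn.execute(query).fetchall()
```

The caller still receives the long names, so the change would be invisible from the API side as long as the join happens after the aggregation.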

bmschmidt commented 11 years ago

All right, this was completed a month or two ago.