Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

Cleaning database doesn't strip out fixed lengths for categorical variables. #37

Closed: bmschmidt closed this issue 10 years ago

bmschmidt commented 10 years ago

Say you test-build your bookworm with 100 files, and there are 50 Library of Congress subjects in those hundred files.

CreateDatabase.py will assign LCSH__id a TINYINT type when it builds a fast lookup table.

But then you build it with the whole thing: 1000 files, say. And now there are 300 Library of Congress subjects.

CreateDatabase.py doesn't drop all the original tables when it loads your new data. Now you need 300 LCSH__id identifiers, but you've been locked into a format that only allocates 128 spots for them.
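To make the failure concrete, here's a minimal sketch of the kind of type-sizing logic involved; the function name and thresholds are illustrative, not the actual CreateDatabase.py code. The type that fits the test build gets baked into the lookup table and is never widened when the full build needs more ids.

```python
# Hypothetical sketch, not the actual CreateDatabase.py logic: pick the
# narrowest MySQL integer type that can hold the category ids seen so far.

def smallest_int_type(n_categories):
    """Return a MySQL integer type wide enough for n_categories distinct ids."""
    if n_categories <= 127:        # signed TINYINT tops out at 127
        return "TINYINT"
    if n_categories <= 32767:      # signed SMALLINT
        return "SMALLINT"
    if n_categories <= 8388607:    # signed MEDIUMINT
        return "MEDIUMINT"
    return "INT"

print(smallest_int_type(50))   # TINYINT  -- what the 100-file test build locks in
print(smallest_int_type(300))  # SMALLINT -- what the full 1000-file build would need
```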

Solution: Drop the tables altogether on every build? Dynamically rebuild them based on the new information? It's a tricky call, I think.
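For the "dynamically rebuild" option, one possible shape is to check the existing column's type against the new category count and widen it when it no longer fits. This is a sketch assuming a MySQL backend and a DB-API cursor; the function and capacity table here are hypothetical, not existing BookwormDB code.

```python
# Hypothetical sketch of the "dynamically rebuild" option, not BookwormDB code.
# Signed upper bounds for the MySQL integer types, narrowest first.
INT_CAPACITY = {"tinyint": 127, "smallint": 32767,
                "mediumint": 8388607, "int": 2147483647}

def widen_if_needed(cursor, table, column, n_categories):
    """Widen table.column to an integer type that can hold n_categories ids."""
    cursor.execute(
        "SELECT DATA_TYPE FROM information_schema.COLUMNS "
        "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = %s AND COLUMN_NAME = %s",
        (table, column),
    )
    current_type = cursor.fetchone()[0]
    if n_categories > INT_CAPACITY.get(current_type, 0):
        new_type = next(t for t, cap in INT_CAPACITY.items() if cap >= n_categories)
        # Identifiers can't be parameterized, so table/column must be trusted here.
        cursor.execute("ALTER TABLE {} MODIFY {} {}".format(table, column, new_type.upper()))
```

Widening in place only fixes the column width, though; it doesn't address whatever is already sitting in the old tables, which is part of what makes dropping them outright attractive.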

bmschmidt commented 10 years ago

I've just added a new target to the Makefile: in addition to make clean there's now make pristine. Not a complete fix, but good enough for now.
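For the gist of the difference between the two targets, here is a minimal sketch; the paths, variable, and recipes below are placeholders, not the actual BookwormDB Makefile. The point is that clean only removes intermediate build files, while pristine also drops the database, so the lookup tables (and their integer widths) are recreated from scratch on the next build.

```makefile
# Hypothetical sketch, not the real BookwormDB Makefile.
clean:
	rm -rf texts/encoded metadata/processed

pristine: clean
	mysql -e "DROP DATABASE IF EXISTS $(DB_NAME)"
```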