Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

MetaWorm purpose #110

Closed organisciak closed 3 years ago

organisciak commented 7 years ago

I can't figure out what MetaWorm.py does. It looks incomplete: is it outdated legacy code that can be deleted, or something in progress? Doesn't seem used anywhere.

bmschmidt commented 7 years ago

Incomplete and in progress (though obviously not edited for a couple of years).

It's supposed to be a second implementation of the API, alongside the SQL implementation. This one implements the API on top of the API itself, so that you can distribute a single API query across multiple servers.

This has two conceivable purposes:

  1. For very large bookworms (> 10 million individual texts), it makes it possible to distribute the files across a number of different servers. (The summary statistics work better than some of the other methods; I don't know how best to return search results, for example.)
  2. When you want to compare two bookworms to each other, you could make that possible through an intermediary site. Compare the lines on the Hathi bookworm to those from ChronAm over the same period, for example. This would be useful, but is hampered by the need for identical key names in each.
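
A rough sketch of both ideas, for illustration only. This is not code from MetaWorm.py: the query shape, the `{key: count}` response shape, and the names `fan_out`, `compare`, `shard_a`, and `shard_b` are all assumptions, and the two shard functions are stubs standing in for calls to real servers. It shows why summary counts are the easy case (they simply add up across servers) and why both sides need to agree on key names before a comparison makes sense.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Assumed shapes (hypothetical, not taken from MetaWorm.py):
# a query is a plain dict, and each backend answers with {group_key: count}.
Query = Dict[str, object]
Counts = Dict[Hashable, int]
Backend = Callable[[Query], Counts]


def fan_out(query: Query, backends: Iterable[Backend]) -> Counts:
    """Send the same query to every backend and sum the counts per key."""
    total: Counter = Counter()
    for backend in backends:
        # Counter.update adds counts rather than overwriting them.
        total.update(backend(query))
    return dict(total)


def compare(query: Query, left: Backend, right: Backend) -> Dict[Hashable, Tuple[int, int]]:
    """Pair up counts from two backends; a key missing on one side counts as 0.

    This only lines up if both backends use identical key names.
    """
    a, b = left(query), right(query)
    return {key: (a.get(key, 0), b.get(key, 0)) for key in sorted(set(a) | set(b))}


# Stub backends standing in for two servers (made-up numbers).
def shard_a(query: Query) -> Counts:
    return {1900: 3, 1901: 5}


def shard_b(query: Query) -> Counts:
    return {1901: 2, 1902: 7}


if __name__ == "__main__":
    q = {"groups": ["year"]}
    print(fan_out(q, [shard_a, shard_b]))
    print(compare(q, shard_a, shard_b))
```

Search results would not merge this simply, since there is no obvious way to add two ranked lists together.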

Ideally, I guess, it should be cordoned off into its own branch until it works.

bmschmidt commented 3 years ago

Closing this out, because the heir to this method is now in use in one of my production installations with a different name.