blockspeiser closed this issue 10 years ago
Some of this work has been accomplished (in particular bullet 2), but I'm leaving this open because it would still be very helpful for someone with more experience in DB design than I have to take a look and suggest design improvements before we move on to storing segment-level review status info.
For bullet 2, look in sefaria/counts.py. A collection called counts is now being created, which stores a jagged array that matches the structure of the jagged array of the text itself. In place of strings as terminals, a count document stores an integer for the number of available versions of that segment in Hebrew and English.
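To make the shape of a count document concrete, here is a minimal sketch of the idea, not the actual code in sefaria/counts.py. The function name `count_versions` and the sample texts are hypothetical. It assumes each version of a text is a nested list of strings and that an empty or whitespace-only string means the segment is missing in that version.

```python
def count_versions(versions):
    """Build a jagged array of ints mirroring the structure of the texts.

    `versions` is a list of jagged arrays (nested lists of strings),
    one per version of the same text in a single language.
    Hypothetical helper for illustration only.
    """
    def merge(nodes):
        # Nodes that are lists are deeper levels of the jagged array.
        lists = [n for n in nodes if isinstance(n, list)]
        if lists:
            # Versions may differ in length, so walk up to the longest one.
            length = max(len(l) for l in lists)
            return [merge([l[i] for l in lists if i < len(l)])
                    for i in range(length)]
        # Terminals: count the versions that have non-empty text here.
        return sum(1 for n in nodes if isinstance(n, str) and n.strip())

    return merge(versions)


# Two made-up English versions of a two-chapter text.
english_a = [["In the beginning", "And the earth"], ["Thus the heavens"]]
english_b = [["When God began", ""], []]

english_counts = count_versions([english_a, english_b])
print(english_counts)
```

Running the same function once per language would give one count array for Hebrew and one for English. Whether those two arrays live in one document or two is a design choice this sketch does not settle.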
We need to know more about the texts we have and the texts we need. This involves a few sides:
Collecting information in (1) may be just as difficult as actually getting the text (e.g., counting precisely how many Rashis there are on which dafs of gemara). Handling incomplete information will be a requirement. Being able to provide size estimates will be very helpful for gauging the magnitude of our task.