alerque / stack-verse-mapper

Index Bible verse references in Stack Exchange data dumps.
https://alerque.github.io/stack-verse-mapper
GNU Lesser General Public License v3.0

Include OSIS in index #14

Closed by alerque 8 years ago

alerque commented 8 years ago

Hey @curiousdannii, I think I see where you are going with using numerical references to verses and how that makes some of the search sorting math easier, but I'm a little concerned for my own prospects. Besides this project, I'm hoping for the generated index to be useful to drop into other projects to create cross-references. For this purpose, having a standard reference format attached to each index item is pretty useful.

Can we put the OSIS ref back in the generated index in addition to the start/end offsets you have now? Obviously the major concern here would be data bloat and keeping the index lean and mean. I would propose a two-stage system:

  1. Generate the index and stuff it full of whatever could be useful.
  2. Post-process the index for each output target to trim it down to just the data that target needs.

This might mean that the userscript version, where this data is embedded, would get a slimmed-down version of the data with no OSIS refs, since they aren't needed there. We can also use a minifier to compress the keys from the human-readable format to something that takes less space over the wire. In the meantime, the "download the JSON dump for your own project" version of the index would have all the goodies with full readable keys, etc.
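To make the second stage concrete, here is a rough sketch of what a per-target trimming pass could look like. It is only an illustration: the entry shape, the field names other than the start/end offsets and the OSIS ref, the target names, and the `trimIndex` helper are all assumptions, not what the project actually generates.

```javascript
// Stage 1 output: a "full" index stuffed with everything that might be useful.
// The entry shape and values here are made up for illustration.
const fullIndex = [
  { title: "Example post", start: 1, end: 3, osis: "Gen.1.1-Gen.1.3" },
  { title: "Another post", start: 7, end: 7, osis: "John.3.16" },
];

// Which fields each output target keeps (target names are hypothetical).
const targetFields = {
  userscript: ["title", "start", "end"],
  download: ["title", "start", "end", "osis"],
};

// Stage 2: copy only the fields the given target needs into a new index.
function trimIndex(index, target) {
  const keep = targetFields[target];
  if (!keep) throw new Error(`Unknown target: ${target}`);
  return index.map((entry) => {
    const slim = {};
    for (const key of keep) {
      if (key in entry) slim[key] = entry[key];
    }
    return slim;
  });
}

const userscriptIndex = trimIndex(fullIndex, "userscript"); // no OSIS refs
const downloadIndex = trimIndex(fullIndex, "download"); // everything kept
```

Keeping the field lists in one table means adding a new output target would only need a new entry there, and the generator itself never has to know about targets.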

Thoughts?

curiousdannii commented 8 years ago

Sure thing. Go ahead and uncomment this line.

I had been wondering about shortening a lot of the keys in the index (title to t, parent to p, etc.) but didn't because it would be a lot less user-friendly. It's probably unnecessary because gzip will compress all of that very nicely, but there's a chance that the index may be too big for some edge-case browser's JSON function to handle. If we ever decide it would help to shorten the keys, then doing it in two stages like you suggest would be the way to go.
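For what it's worth, key shortening as a separate pass could be sketched roughly like this. Only the title to t and parent to p pairs come from the comment above; the `renameKeys` helper and the sample entry are hypothetical, and the real index may be shaped differently.

```javascript
// Readable key -> short key. Keys not listed here are left untouched.
const shortKeys = { title: "t", parent: "p" };

// Return a copy of the entry with its keys renamed according to the mapping.
function renameKeys(entry, mapping) {
  const out = {};
  for (const [key, value] of Object.entries(entry)) {
    const hasMapping = Object.prototype.hasOwnProperty.call(mapping, key);
    out[hasMapping ? mapping[key] : key] = value;
  }
  return out;
}

// Invert the mapping so a consumer can restore the readable keys.
const longKeys = Object.fromEntries(
  Object.entries(shortKeys).map(([longKey, shortKey]) => [shortKey, longKey])
);

// Made-up sample entry.
const readable = { title: "Example question", parent: 42, osis: "John.3.16" };
const compact = renameKeys(readable, shortKeys);
const restored = renameKeys(compact, longKeys);
```

Because the mapping is invertible, the readable form can stay the source of truth and the short form is only ever derived from it.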

curiousdannii commented 8 years ago

Btw, on the topic of indexes for other uses, do you see any problems with changing to the structure I suggested here? https://github.com/alerque/stack-verse-mapper/issues/9#issuecomment-171131529

alerque commented 8 years ago

I agree that keeping the user-friendly scheme is important, at least in the devel environment and for making our index available to other hackers. I don't think the build_index.js script even needs to consider building a more compressed version. Anything we do along those lines can be done better by a post-processor anyway.

I haven't had time to digest the other issue yet, but I'll get there.