Data generation command-line options, added additional search options on main page (filter by testament, adjust number of returned verses)

HanClinto commented 11 months ago

Added command-line options to the data creation script to give easier control for testing various Bible translations, models, or other parameters.
Added testament as another metadata parameter for each verse so that we can more easily filter by OT or NT.
Adjusted capitalization in prompt text to change "Religious" to 'religious'
~~Added filters on main page to limit search to OT or NT.~~
- Now removed from here and added via PR #9 .
~~Added slider to control the number of verses returned.~~
- Now removed from here and added via PR #8 .
~~Added fallback mechanism to support limited functionality (verse similarity search only) if no Claude API key is present.~~
- Now removed from here and added via PR #7 .

HanClinto commented 11 months ago

Screenshot of changes to the UI showcasing error message when API key is not present, and new search filters:

HanClinto commented 11 months ago

Hi Clint,

First of all, thank you for taking the time to contribute to the project!

My pleasure, thank YOU for creating such a wonderful application! I appreciate how simple and clean it is from a code perspective, and how it is a very clear application of RAGs being used in a practical manner.

I've reviewed the changes in your pull request, and I appreciate the improvements you've made to the codebase, such as the error handling for the LLM support and the search options for the Old Testament and New Testament.

However, I noticed that the PR includes changes to the database files, which I would prefer not to merge at this time. Could you please resubmit your PR with just the code changes and exclude any modifications to the database (data/db/*)? This would help maintain the integrity of the project's data layer as it currently stands. After merging your changes, I'll re-run the create DB process on my side and re-deploy the changes to biblos.app

I am happy to resubmit my PR, no problem! I should have split the functionality up into multiple PR's from the beginning so that the various features didn't overlap so much. Sloppy on my part!

However, I noticed that the PR includes changes to the database files, which I would prefer not to merge at this time.

So the entire reason that I regenerated the database is so that I can add a testament field to the document metadata. That is what is used by the search filter to limit documents to those that match testament == "OT" or testament == "NT". (note the changes to doc.metadata in create_db.py).

Without those fields being added to the metadata, then the filter won't work. I originally tried figuring out a solution that went purely off of the book names (all joined together with a large $OR filter), but I had trouble getting it to work and eventually just added the testament metadata tag to make it simpler.

But, as you noted, that comes with the downside of needing to completely regenerate the database just to add these metadatas. I didn't change anything else about the database embeddings -- only the metadata -- but I can understand not wanting to merge a commit on such a large binary blob.

I'll think this over -- I may be able to go back to the original solution of building the large $OR query -- now that I'm a bit more familiar with how Chroma works.

Thanks again for your contribution, and I'm looking forward to incorporating your improvements into the project!

Thank YOU!

Side question -- what is your opinion of the slider to adjust the number of results? I'm not sure if I like that solution or not, and I'm not sure what including 10 passages into the summarization request to Anthropic would do for your usage bill, so I probably should have broken that function out into a separate PR as well.

In summary: I'll split the PR into at least three separate ones -- one for the OT / NT filter, one for the adjusting the number of returned results, and one for the LLM access token error handling.

Much appreciated!

Respectfully, Clint

HanClinto commented 11 months ago

Okay, this PR is now isolated to be only the updates to the database generation script, and does not contain the actual output of the data generation step (which is to add the testament metadata tag). I left that portion of the code in here as it doesn't harm anything right now, but it is sitting dormant at the moment. There are enough other improvements to this code and documentation that I think the PR still has some value (?).

dssjon commented 11 months ago

Okay, this PR is now isolated to be only the updates to the database generation script, and does not contain the actual output of the data generation step (which is to add the testament metadata tag). I left that portion of the code in here as it doesn't harm anything right now, but it is sitting dormant at the moment. There are enough other improvements to this code and documentation that I think the PR still has some value (?).

Looks good, thanks!

dssjon / biblos

Data generation command-line options, added additional search options on main page (filter by testament, adjust number of returned verses) #6