Namit2111 / bible-verse-finder

https://bible-verse-finder.vercel.app
GNU General Public License v3.0
21 stars, 30 forks

Entire bible parsed for clustering #41

Closed: rabelmervin closed this 3 weeks ago

rabelmervin commented 3 weeks ago

Description

This PR updates the project to cluster over the entire Bible rather than a single book, which improves accuracy and reduces the number of 0% similarity results. Data storage has also been reorganized so that each book stores its own chapters, and each chapter stores its own scriptures (verses). Printed clusters now include the book name with the chapter and verse number (e.g., Matthew 5:38), replacing the manual concatenation of "1 John".
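A minimal sketch of the book → chapter → scripture layout described above. The names `bible`, `format_reference`, and `iter_verses` are hypothetical and not taken from the PR; the actual code may organize this differently.

```python
# Hypothetical layout: book name -> chapter number -> verse number -> text.
bible = {
    "Matthew": {
        5: {
            38: "Ye have heard that it hath been said, "
                "An eye for an eye, and a tooth for a tooth:",
        },
    },
    "1 John": {
        4: {8: "He that loveth not knoweth not God; for God is love."},
    },
}


def format_reference(book, chapter, verse):
    """Build a label such as 'Matthew 5:38' from the three keys."""
    return f"{book} {chapter}:{verse}"


def iter_verses(bible):
    """Yield (reference, text) pairs in book, chapter, verse order."""
    for book, chapters in bible.items():
        for chapter, verses in chapters.items():
            for verse, text in verses.items():
                yield format_reference(book, chapter, verse), text


for reference, text in iter_verses(bible):
    print(f"{reference} - {text}")
```

Because the book name is a key of the structure, the printed label no longer needs a hard-coded "1 John" prefix.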

Related Issues

Fixes #123 (Bible clustering issue)

Related to #124 (Data storage improvement)

Changes List

✅ Updated the clustering to use the entire Bible instead of 1 John.

✅ Reorganized data storage for the Gutenberg Bible, structuring by books, chapters, and scriptures.

✅ Modified cluster printing to include the book name attached to the verse number.

Type of Changes

✅ Bug fix (fixes an existing issue)

✅ Enhancement (improves or changes existing functionality)

Checklist

✅ My code follows the style guidelines of this project.

✅ I have performed a self-review of my code.

✅ I have commented my code, particularly in hard-to-understand areas.

✅ I have made corresponding changes to the issue.

✅ New and existing unit tests pass locally with my changes.

vercel[bot] commented 3 weeks ago

@rabelmervin is attempting to deploy a commit to the namit2111's projects Team on Vercel.

A member of the Team first needs to authorize it.

JustinhSE commented 3 weeks ago

@rabelmervin can you show that on your local host this outputs the correct response?

rabelmervin commented 3 weeks ago

Sure @JustinhSE Screenshot (54)

JustinhSE commented 3 weeks ago

@rabelmervin no I meant can you send a screenshot of what your changes look like on your local host

JustinhSE commented 3 weeks ago

Like the output similar to what a user would see

rabelmervin commented 3 weeks ago

@JustinhSE I can't get it to run. Could you please tell me how to run it without errors?

Screenshot (55) Screenshot (56)

JustinhSE commented 3 weeks ago

@rabelmervin review the comments here first and see if that resolves it... but the debugging is for you to do, as you are making the changes.

JustinhSE commented 2 weeks ago

@rabelmervin the deadline for this issue to be eligible for a badge is coming up in the next few days. Feel free to work on this and open another PR

rabelmervin commented 2 weeks ago

Sure, I am happy to work on this issue, sir @JustinhSE. Could you please guide me more?

JustinhSE commented 2 weeks ago

Unfortunately not. Although I lead along with Namit, these issues are supposed to be completed by the user. You should be asking your teammate for help as well.

rabelmervin commented 1 week ago

Hi @JustinhSE @Namit2111, I ran it on localhost. What do you think about this? Screenshot (1)

JustinhSE commented 1 week ago

So the only thing is, that doesn’t print the book name before the chapter and verse. That’s the only change needed @rabelmervin

rabelmervin commented 1 week ago

Hi @JustinhSE, @Namit2111, I think it's all right now! Your thoughts? Screenshot (2)

JustinhSE commented 1 week ago

Yes and no. Yes, but why do some not show the #:# @rabelmervin?

rabelmervin commented 1 week ago

> yes and no. Yes but why do some not show the #:# @rabelmervin ?

Hi @JustinhSE, the problem occurs because verses are separated by \n, but some verses also contain a \n within the verse itself.

Screenshot (3)
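One possible way around the newline problem described above, sketched under the assumption that every verse in the source text begins with a `chapter:verse` marker such as `1:1`: split on the markers rather than on `\n`, then collapse the remaining line breaks. The names `VERSE_MARKER` and `split_verses` are hypothetical, and the sample text is only illustrative of the format.

```python
import re

# Assumed marker format: "<chapter>:<verse>" followed by whitespace.
VERSE_MARKER = re.compile(r"(\d+):(\d+)\s+")


def split_verses(book_text):
    """Return (chapter, verse, text) tuples, joining wrapped lines."""
    # With two capture groups, re.split yields:
    # [preamble, chapter, verse, text, chapter, verse, text, ...]
    parts = VERSE_MARKER.split(book_text)
    verses = []
    for i in range(1, len(parts), 3):
        chapter = int(parts[i])
        verse = int(parts[i + 1])
        # Collapse internal newlines and repeated spaces into single spaces.
        text = " ".join(parts[i + 2].split())
        verses.append((chapter, verse, text))
    return verses


sample = (
    "1:1 In the beginning God created\n"
    "the heaven and the earth.\n"
    "1:2 And the earth was without form,\n"
    "and void;\n"
)
for chapter, verse, text in split_verses(sample):
    print(f"{chapter}:{verse} {text}")
```

This would misfire if a verse's own text contained something shaped like a marker, so it is a starting point rather than a complete parser.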

rabelmervin commented 6 days ago

Hi @JustinhSE, what do you think about the problem? Can I make a PR?

JustinhSE commented 6 days ago

Sorry, missed your message @rabelmervin. Yes, but I will then be looking for alternatives to parse the Bible.

rabelmervin commented 3 days ago

Hi @JustinhSE, is there any effective way to implement it?

JustinhSE commented 3 days ago

@rabelmervin right now, I don’t think so… The only way forward I can think of is fetching the Bible or downloading a CSV file and storing that… TBD on this.

JustinhSE commented 3 days ago

@rabelmervin so I just uploaded a version of the complete Bible broken down by book, chapter, and verse. Check backend/utils/bible.json and see if you could potentially use that instead. It is cleaner and more concise, and retrieving verses should be easier.
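As a rough illustration of why such a file makes retrieval easier, here is a sketch that indexes verses by reference. It assumes a top-level `verses` list whose entries carry `book_name`, `chapter`, `verse`, and `text`; the real schema of backend/utils/bible.json may differ. The in-memory sample stands in for the loaded file, and `build_index` is a hypothetical helper.

```python
# Stand-in for the parsed JSON; the schema shown here is an assumption.
bible_data = {
    "verses": [
        {"book_name": "Genesis", "chapter": 1, "verse": 1,
         "text": "In the beginning God created the heaven and the earth."},
        {"book_name": "John", "chapter": 11, "verse": 35,
         "text": "Jesus wept."},
    ]
}


def build_index(bible_data):
    """Map (book_name, chapter, verse) to the verse text for direct lookup."""
    return {
        (v["book_name"], v["chapter"], v["verse"]): v["text"]
        for v in bible_data["verses"]
    }


index = build_index(bible_data)
print(index[("John", 11, 35)])
```

Since each entry already carries its own book, chapter, and verse, no newline-based splitting is needed.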

JustinhSE commented 3 days ago

@rabelmervin try an alteration of this code so that it factors well into our code

from sentence_transformers import SentenceTransformer, util
import json
import heapq

# Load the pre-trained sentence-embedding model (all-MiniLM-L6-v2)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load the Bible JSON data
with open('bible.json', 'r') as file:
    bible_data = json.load(file)

# Extract all verses and their information
verses = []
verse_info = []
for verse in bible_data['verses']:
    verses.append(verse['text'])
    verse_info.append({
        'book_name': verse['book_name'],
        'chapter': verse['chapter'],
        'verse': verse['verse'],
        'text': verse['text']
    })

# Encode all verses (this step might take some time)
verse_embeddings = model.encode(verses, convert_to_tensor=True)

def find_similar_verses(theme, top_k=20):
    # Encode the input theme
    theme_embedding = model.encode(theme, convert_to_tensor=True)

    # Calculate cosine similarities
    similarities = util.pytorch_cos_sim(theme_embedding, verse_embeddings)[0]

    # Get top-k similar verses
    top_results = heapq.nlargest(top_k, enumerate(similarities), key=lambda x: x[1])

    results = []
    for idx, score in top_results:
        verse_data = verse_info[idx]
        result = {
            'reference': f"{verse_data['book_name']}, {verse_data['chapter']}:{verse_data['verse']}",
            'text': verse_data['text'],
            'score': float(score),
            'book_name': verse_data['book_name'],
            'chapter': verse_data['chapter'],
            'verse': verse_data['verse']
        }
        results.append(result)

    return results

JustinhSE commented 3 days ago

Must do this first though: pip install sentence-transformers
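The ranking step in the snippet above (cosine similarity followed by a top-k selection with `heapq.nlargest`) can be tried without installing anything. This is a standard-library sketch over toy two-dimensional vectors, not the model's real embeddings; `cosine` and `top_k` are hypothetical helper names.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, embeddings, k=2):
    """Return the k best (index, score) pairs, highest score first."""
    scores = [cosine(query, e) for e in embeddings]
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Toy vectors standing in for verse embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k([1.0, 0.0], toy_embeddings, k=2)
print(ranked)
```

The returned indices play the same role as `idx` in the snippet: they point back into the list of verse metadata.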

rabelmervin commented 2 days ago

Thanks @JustinhSE excited to work on this!