Machine Learning/ Data Science -Parse Entire Bible instead of John

JustinhSE commented 3 weeks ago

Overview 🤔

We need to update the project to use the entire Bible for clustering instead of just 1 John. This will reduce the number of 0% similarity results.

Task 🔧

- Update the project to use the entire Bible for clustering to improve accuracy.
- Reorganize the data storage for the Gutenberg Bible so that each book stores its own chapters, which in turn store their own verses.
- Modify the printing of the clusters to include the book name with the verse number (e.g., Matthew 5:38 instead of 5:38).
  - Currently, we manually concatenate "1 John" onto the verse; this needs to be removed.

rabelmervin commented 2 weeks ago

Hi @JustinhSE, I'm interested in solving this issue. Could you please point me to the location of the project code?

JustinhSE commented 2 weeks ago

For sure @rabelmervin !

JustinhSE commented 2 weeks ago

@rabelmervin this is our Python app doing most of the work for this issue

rabelmervin commented 2 weeks ago

Hi sir @JustinhSE, I have separated the book names from the content using indices I researched, but I am not able to separate the chapters. Could you please tell me how to do it?

JustinhSE commented 2 weeks ago

@rabelmervin can you send me your Jupyter notebook?

rabelmervin commented 2 weeks ago

> @rabelmervin can you send me your Jupyter notebook?

Sure, sir @JustinhSE: https://colab.research.google.com/drive/14pA3UIGjorODH_kqJHs3rNAMTGzngMoO?usp=sharing

unclebinary1001 commented 2 weeks ago

Hi @JustinhSE, I am interested in this issue and would like to collaborate with @rabelmervin, seeing that it has several tasks in it. Thank you for your consideration.

JustinhSE commented 2 weeks ago

@rabelmervin would you want to coauthor with @unclebinary1001 on this issue?

I am currently in midterm season so it may be hard for me to work immediately on this, however you 2 can work together and see if you can rework it?

JustinhSE commented 2 weeks ago

> @rabelmervin can you send me your Jupyter notebook?
>
> sure sir @JustinhSE https://colab.research.google.com/drive/14pA3UIGjorODH_kqJHs3rNAMTGzngMoO?usp=sharing

Try starting from the .ipynb notebook we derived the app from, and work from there for a better sandbox.

rabelmervin commented 2 weeks ago

> @rabelmervin would you want to coauthor with @unclebinary1001 on this issue?
>
> I am currently in midterm season so it may be hard for me to work immediately on this, however you 2 can work together and see if you can rework it?

Yes, sir @JustinhSE, I'm really excited to collaborate with @unclebinary1001.

JustinhSE commented 2 weeks ago

Great 🚀🚀🚀

Assigning you now @unclebinary1001

JustinhSE commented 1 week ago


```python
import nltk

nltk.download('gutenberg')
from nltk.corpus import gutenberg

# Load the raw text of the King James Bible from the NLTK Gutenberg corpus
bible = gutenberg.raw('bible-kjv.txt')

# Split the raw text into a list of book-sized chunks
books = bible.split('\n\n\n\n\n')
print(books)
```

This will help you both. It is from my original script for parsing the Bible from NLTK. Try it out and see how the book titles, chapters, and verses are shown. (Hint: within the string, each verse appears as 1:2 followed by the verse text, so try to extract those two numbers, and trim the book titles down to just "Genesis", for example, rather than the full heading provided.) @unclebinary1001 @rabelmervin
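A minimal sketch of that hint, run on a small hand-written sample rather than the real corpus. It assumes every verse starts with a `chapter:verse` marker and that the first line of a book chunk is its heading. `split_verses`, `short_title`, and the `SHORT_TITLES` lookup are hypothetical names for illustration, and the exact headings in `bible-kjv.txt` should be checked before relying on the title trimming.

```python
import re

# Hand-written sample imitating the "1:2 verse text" layout described above.
# The layout of the real corpus text may differ and should be verified.
sample_book = (
    "The First Book of Moses: Called Genesis\n\n"
    "1:1 In the beginning God created the heaven and the earth.\n\n"
    "1:2 And the earth was without form, and void; and darkness was upon\n"
    "the face of the deep.\n\n"
    "2:1 Thus the heavens and the earth were finished.\n"
)

# Matches a "chapter:verse" marker such as "1:2" plus the whitespace after it.
VERSE_MARKER = re.compile(r"(\d+):(\d+)\s+")

# Illustrative lookup only; extend it with the headings actually found in the corpus.
SHORT_TITLES = {"The First Book of Moses: Called Genesis": "Genesis"}


def split_verses(book_text):
    """Return a list of (chapter, verse, text) tuples found in book_text."""
    matches = list(VERSE_MARKER.finditer(book_text))
    verses = []
    for i, match in enumerate(matches):
        # A verse runs from the end of its marker to the start of the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(book_text)
        text = " ".join(book_text[match.end():end].split())
        verses.append((int(match.group(1)), int(match.group(2)), text))
    return verses


def short_title(book_text):
    """Map the first line of a book chunk to a short name, if one is known."""
    heading = " ".join(book_text.strip().splitlines()[0].split())
    return SHORT_TITLES.get(heading, heading)


verses = split_verses(sample_book)
print(short_title(sample_book), verses[1][:2])
```

Splitting on the markers rather than on blank lines keeps a verse together even when its text wraps across several lines.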

unclebinary1001 commented 1 week ago

Thank you for adding me to this task, @JustinhSE. I look forward to collaborating with you, @rabelmervin.

rabelmervin commented 1 week ago

Can we use this to access the entire Bible?

    for i in range(len(books)):
        book = books[i]
        chapter = book.split('\n\n')

@JustinhSE, @unclebinary1001, could you please tell me how I can print the reference like this: Matthew 5:38 instead of 5:38?

JustinhSE commented 1 week ago

So analyze how 5:38 is shown in the text and how the book heading is shown as well. Remember that the books should be divided before you get to the chapters and verses.

So what you could do is print the title from books[i], then the verse output we currently have.
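As a rough illustration of that suggestion, the sketch below joins a book title with a chapter and verse number. `format_reference` is a hypothetical helper, and storing cluster members as `(book, chapter, verse)` tuples is an assumption, not the project's current output format.

```python
def format_reference(book_title, chapter, verse):
    """Build a label such as 'Matthew 5:38' from its three parts."""
    return f"{book_title} {chapter}:{verse}"


# Hypothetical cluster members stored as (book, chapter, verse) tuples.
cluster = [("Matthew", 5, 38), ("1 John", 4, 8)]

labels = [format_reference(*member) for member in cluster]
for label in labels:
    print(label)
```

Because the book name travels with each verse, nothing needs to be concatenated by hand when the cluster is printed.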

JustinhSE commented 1 week ago

But the point of this issue is to reorganize how we are storing the Bible. Instead, maybe use a data structure that maps book -> chapters -> verses.
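One possible shape for such a structure is a nested dictionary, sketched below. `build_bible` and `iter_labelled_verses` are hypothetical names, and the records are a few hand-typed sample verses rather than output from the real parser.

```python
from collections import defaultdict


def build_bible(records):
    """Group (book, chapter, verse, text) records as book -> chapter -> verse -> text."""
    nested = defaultdict(lambda: defaultdict(dict))
    for book, chapter, verse, text in records:
        nested[book][chapter][verse] = text
    # Convert to plain dicts so a missing key raises KeyError instead of creating entries.
    return {book: dict(chapters) for book, chapters in nested.items()}


def iter_labelled_verses(bible):
    """Yield ('Book chapter:verse', text) pairs in insertion order."""
    for book, chapters in bible.items():
        for chapter, verses in chapters.items():
            for verse, text in verses.items():
                yield f"{book} {chapter}:{verse}", text


# Sample records for illustration only.
records = [
    ("Genesis", 1, 1, "In the beginning God created the heaven and the earth."),
    ("Genesis", 1, 2, "And the earth was without form, and void;"),
    ("Matthew", 5, 38, "Ye have heard that it hath been said, An eye for an eye,"),
]

bible = build_bible(records)
print(bible["Matthew"][5][38])
labelled = list(iter_labelled_verses(bible))
```

With this layout, a lookup such as `bible["Matthew"][5][38]` returns the verse text directly, and flattening the structure for clustering still keeps the book name attached to every verse.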

JustinhSE commented 1 week ago

@rabelmervin look into this and use bible-kjv.txt instead

https://www.nltk.org/book/ch02.html

JustinhSE commented 5 days ago

@rabelmervin @unclebinary1001 updates?

rabelmervin commented 4 days ago

My sincere apologies, sir @JustinhSE. I am currently preparing for semester exams, but I'll definitely open a PR within this week.

JustinhSE commented 4 days ago

Alright, thanks @rabelmervin. Any updates, @unclebinary1001?

maskeensingh commented 3 days ago

Hi @JustinhSE, I think I can contribute to this, either alone or together with @unclebinary1001 and @rabelmervin. I have good knowledge of ML and I think I can do the task. I am late because I only just found out about Hacktoberfest. Please assign me as well; if not the whole issue, then whatever portion is left uncovered by the people you assigned earlier.

Namit2111 commented 3 days ago

Hi @maskeensingh, not a lot of progress has been made on this issue. If you can describe your approach to solving it, or how you plan to work towards a solution, we will assign it to you.

maskeensingh commented 3 days ago

OK @Namit2111, I will send you a detailed idea and workflow.

Namit2111 / bible-verse-finder

Machine Learning/ Data Science -Parse Entire Bible instead of John #1
