avendesora / pythonbible

A python library for validating, parsing, normalizing scripture references and retrieving scripture texts (for open source and public domain versions)
https://docs.python.bible
MIT License

Potential issue with duplicated verses in references #169

Open christopherpickering opened 2 months ago

christopherpickering commented 2 months ago

Thanks again for this, I'm getting a lot of use :)

I found another potential issue. When a text contains repeated references, they are grouped by book but apparently not by chapter or verse, so the formatted output contains duplicate verses. Here's an example: notice the duplicate verses in the refs for John and Hebrews, and the duplicate chapter for Genesis. I de-duplicated the verse IDs to get the output I was expecting:

import pythonbible as bible

text = 'Jeremiah 10:11-12;John 1:1;Hebrews 1:8-12;Genesis 1:1,2:4,2:7;Malachi 3:18;John 1:1;Psalms 33:6,9,136:5;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-10,11:3'

references = bible.get_references(text)
formatted = bible.format_scripture_references(references)

print(formatted)

# list(set(...)) removes the duplicate verse IDs
verse_ids = list(set(bible.convert_references_to_verse_ids(references)))

new_references = bible.convert_verse_ids_to_references(verse_ids)
formatted_2 = bible.format_scripture_references(new_references)

print(formatted_2)

output:

# initial output
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1,1,1-3;Colossians 1:16-17;Hebrews 1:8,8-9,9-10,10-12,11:3

# with dupes manually removed list(set(...)) on verse_ids
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-12,11:3
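
A minimal sketch of the de-duplication step on its own, assuming verse IDs are plain integers that sort in canonical order (for example `43001001` for John 1:1, i.e. book, chapter, and verse packed into one number). The helper name `dedupe_verse_ids` is hypothetical and not part of pythonbible. Because `set()` does not preserve order, the sketch sorts the result instead of relying on `list(set(...))`:

```python
def dedupe_verse_ids(verse_ids):
    """Return the unique verse IDs in ascending order.

    Assumes the IDs are integers whose numeric order matches
    book/chapter/verse order (an assumption about the ID scheme).
    """
    return sorted(set(verse_ids))


# Example IDs (assumed scheme): John 1:1, Hebrews 1:8, John 1:1 again, John 1:2
ids = [43001001, 58001008, 43001001, 43001002]
unique_ids = dedupe_verse_ids(ids)
```

In the snippet above, this would take the place of `list(set(...))` around `bible.convert_references_to_verse_ids(references)`.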

avendesora commented 1 month ago

Thanks for reporting this and sorry for the delay. It's been a busy summer. This is something we should fix, and I'll hopefully be able to start on that soon.

christopherpickering commented 1 month ago

Thanks!

Since then I found another possible issue (or preference :) ). You can see it in the second output:

# with dupes manually removed list(set(...)) on verse_ids
Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5;Jeremiah 10:11-12;Malachi 3:18;John 1:1-3;Colossians 1:16-17;Hebrews 1:8-12,11:3

# the first reference should maybe be:
Genesis 1:1,2:4,7

Nested commas get tricky, though. I just ran a regex on my output that swapped the commas for semicolons, then added the comma back between verses in the same chapter.
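
A rough sketch of that kind of post-processing, written with plain string splitting rather than a regex. It is only a guess at the intended format: semicolons between chapters (and between books), commas between verses in the same chapter. The function name `regroup_by_chapter` is hypothetical, and the sketch does not handle ranges that span chapters (such as `1:1-2:3`) followed by bare verse numbers:

```python
def regroup_by_chapter(formatted: str) -> str:
    """Regroup 'Book 1:1,2:4,2:7' style output as 'Book 1:1;2:4,7'."""
    regrouped = []
    for book_ref in formatted.split(";"):
        # The book name is everything before the last space ("1 John", "Psalms").
        book, _, body = book_ref.strip().rpartition(" ")
        if not book:
            # No chapter/verse part (e.g. a whole-book reference); keep as-is.
            regrouped.append(book_ref.strip())
            continue
        chapters: dict[str, list[str]] = {}
        current = ""  # chapter that bare verse numbers belong to
        for part in body.split(","):
            if ":" in part:
                current, _, part = part.partition(":")
            chapters.setdefault(current, []).append(part)
        groups = [
            f"{chapter}:{','.join(verses)}" if chapter else ",".join(verses)
            for chapter, verses in chapters.items()
        ]
        regrouped.append(f"{book} {';'.join(groups)}")
    return ";".join(regrouped)


result = regroup_by_chapter("Genesis 1:1,2:4,2:7;Psalms 33:6,9,136:5")
# result == "Genesis 1:1;2:4,7;Psalms 33:6,9;136:5"
```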