STEPBible / STEPBible-Data

Data created for www.STEPBible.org, available to other projects under CC BY 4.0

Badly formatted data in TTESV #40

Open dlee opened 3 years ago

dlee commented 3 years ago

Some lines in TTESV do not conform to the specified format. I couldn't work out how to fix them, but most share a pattern: the word index contains a +00 and is followed by a long list of Strong's numbers.

Some examples:

$Num 1:43   02=<06485>  05=<04294>  07=<05321>  53+00=<07969>+<02572>+<00505>+<00702>+<03967>   
$Num 2:30   02=<06635>  04=<06485>  53+00=<07969>+<02572>+<00505>+<00702>+<03967>   
$Num 4:44   02=<06485>  04=<04940>  3+00=<07969>+<00505>+<03967>    
$Num 26:47  01=<00428>  04=<04940>  07=<01121>  09=<00836>  13=<06485>  53+00=<07969>+<02572>+<00505>+<00702>+<03967>   
$Num 26:62  03=<06485>  23+00=<07969>+<06242>+<00505>   05=<03605>  06=<02145>  09=<02320>  10=<01121>  12=<04605>  13=<03588>  16=<03808>  17=<06485>  18=<08432>  20=<01121>  22=<03478>  23=<03588>  26=<03808>  27=<05159>  28=<05414>  31=<08432>  33=<01121>  35=<03478>  
$Jdg 15:11  3+00=<07969>+<00505>    02=<00376>  04=<03063>  05+06=<03381>   09=<05585>  12=<05553>  14=<05862>  16=<00559>  18=<08123>  22=<03045>  25=<06430>  27=<04910>  30=<04100>  33=<02088>  37=<06213>  42=<00559>  47=<06213>  53=<06213>  
$Jdg 16:27  03=<01004>  05=<04390>  07=<00582>  09=<00802>  10=<03605>  12=<05633>  15=<06430>  17=<08033>  21=<01406>  3+00=<07969>+<00505>    25=<00376>  27=<00802>  29=<07200>  32=<08123>  33=<07832>  
$1Ki 4:32   03=<01696>  3+00=<07969>+<00505>    04=<04912>  07=<07892>  1+05=<00505>+<02568>    
$1Ki 5:16   01=<00905>  02+03=<08010>   3+00=<07969>+<00505>+<07969>+<03967>    04=<08269>  05=<05324>  06=<00834>  08=<05921>  10=<04399>  13=<07287>  16=<05971>  18+19=<06213>   21=<04399>  
$1Ch 12:27  02=<05057>  03=<03077>  08=<00175>  3+00.   <07969>+<00505>+<07651>+<03967> 
$1Ch 12:29  03=<01121>+<01144>  05=<00251>  07=<07586>  3+00=<07969>+<00505>    11=<04768>  16=<08104>  18=<04931>  21=<01004>  23=<07586>  
$1Ch 29:4   3+00=<07969>+<00505>    01=<03603>  03=<02091>  06=<02091>  08=<00211>  7+00=<07651>+<00505>    10=<03603>  12=<02212>  13=<03701>  15=<02902>  17=<07023>  20=<01004>  
$2Ch 2:2    02=<08010>  03=<05608>  70+00=<07657>+<00505>   04=<00376>  06+07=<05449>   80+00=<08084>+<00505>+<00376>   10=<02672>  13+14=<02022>   3+00=<07969>+<00505>+<08337>+<03967>    17=<05329>  
$2Ch 2:17   02=<08010>  03=<05608>  04=<03605>  06+07=<00582>+<01616>   08=<00834>  12=<00776>  14=<03478>  15=<00310>  17=<05610>  21=<01732>  23=<00001>  25=<05608>  29=<04672>  153+00=<03967>+<02572>+<00505>+<07969>+<00505>+<08337>+<03967>  
$2Ch 2:18   01=<07657>  02=<00505>  06=<06213>  08+09=<05449>   80+00=<08084>+<00505>   11=<02672>  14+15=<02022>   3+00=<07969>+<00505>+<08337>+<03967>    18=<05329>  22=<05971>  23=<05647>  
$2Ch 4:5    02=<05672>  05=<02947>  08=<08193>  10=<04639>  13=<08193>  16=<03563>  19=<06525>  22=<07799>  24=<02388>+<03557>  3+00=<07969>+<00505>    25=<01324>  
$2Ch 25:13  03=<01121>  06=<01416>  08=<00558>  10=<07725>  14=<01980>  18=<04421>  19=<06584>  21=<05892>  23=<03063>  25=<08111>  27+28=<01032>   30+31=<05221>   3+00=<07969>+<00505>    36=<00962>  37=<07227>  38=<00961>  
$2Ch 29:33  03+04=<06944>   600=<08337>+<03967> 06=<01241>  3+00=<07969>+<00505>    08=<06629>  
$2Ch 35:7   02=<02977>  03=<07311>  06=<01121>  07=<05971>  09+10=<06453>   12=<03605>  15=<04672>  16=<03532>  18=<01121>  19=<05795>  20=<04480>  22=<06629>  25=<04557>  30+00=<07970>+<00505>   3+00=<07969>+<00505>    28=<01241>  29=<00428>  31=<04480>  33+34=<04428>   35=<07399>  
$Job 1:3    02=<04735>  7+00=<07651>+<00505>    03=<06629>  3+00=<07969>+<00505>    04=<01581>  500=<02568>+<03967> 05=<06776>  07=<01241>  500=<02568>+<03967> 09+10=<00860>   12=<03966>  13=<07227>  14=<05657>  18=<00376>  21=<01419>  23=<03605>  25=<01121>  28=<06924>  
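For anyone who wants to locate these lines programmatically, here is a minimal Python sketch. It assumes, based only on the examples above, that each entry has the shape `index=strongs` and that index components are joined by `+`. The function name `zero_index_entries` is hypothetical and not part of any STEPBible tooling.

```python
import re

# Assumed entry shape, inferred from the examples above: "<index>=<strongs>".
ENTRY = re.compile(r"(\S+?)=(\S+)")

def zero_index_entries(line):
    """Return (index, strongs) pairs whose word index has a 00 component."""
    flagged = []
    for index, strongs in ENTRY.findall(line):
        # An index such as "3+00" or "53+00" splits into parts that include "00".
        if "00" in index.split("+"):
            flagged.append((index, strongs))
    return flagged

sample = "$Num 4:44   02=<06485>  04=<04940>  3+00=<07969>+<00505>+<03967>"
print(zero_index_entries(sample))
```

This only catches the `+00` pattern. Entries such as `600=<08337>+<03967>` are left alone, and an entry with no `=` at all (as in 1Ch 12:27) is not matched by the pattern.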

There's also this line that has a word index of 601:

$Num 26:51  04=<06485>  07=<01121>  09=<03478>  601+30=<08337>+<03967>+<00505>+<00505>+<07651>+<03967>+<07970>

There's also a line that has an invalid Strong's number (0100419):

$2Sa 15:17  03=<04428>  04+05=<03318>   07=<03605>  09=<05971>  10=<07272>  14=<05975>  17=<04801>  18=<01004>   <0100419+04801>

I think the last entry is supposed to be 19=<01004+04801>
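A stricter check would catch this line and the 601 index as well. The sketch below is an assumption about what a well-formed token looks like, not the official format specification: two-digit word indexes joined by `+`, then `=`, then five-digit Strong's numbers, each in its own angle brackets, joined by `+` (as in `03=<01121>+<01144>` above). The name `malformed_tokens` is hypothetical.

```python
import re

# Assumed well-formed token: two-digit word indexes joined by "+", then "=",
# then five-digit Strong's numbers in angle brackets joined by "+".
TOKEN = re.compile(r"\d{2}(?:\+\d{2})*=<\d{5}>(?:\+<\d{5}>)*")

def malformed_tokens(line):
    """Return the tokens after the verse reference that fail the assumed shape."""
    tokens = line.split()[2:]  # skip "$Book" and "chapter:verse"
    return [t for t in tokens if not TOKEN.fullmatch(t)]

line = "$2Sa 15:17  03=<04428>  04+05=<03318>  18=<01004>   <0100419+04801>"
print(malformed_tokens(line))
```

Because the pattern requires two-digit indexes, it also reports entries such as `3+00=...` and `601+30=...`, so it can serve as a single pass over the whole file if the assumed shape holds.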

DavidIB commented 3 years ago

Thanks for taking the time to point these out. This dataset is due for a complete revamp. In the future I plan to link to the individual tagged words in the ESV, i.e. I'll avoid copyright issues by not including the untagged words in between. The dataset is also being updated to tagging that includes all Hebrew prefixes and suffixes. This means that, in the short term, I won't be fixing these issues. Sorry!

dlee commented 3 years ago

Thank you for the update. Do you have an estimated timeline for the updated dataset?

DavidIB commented 3 years ago

ASAP

David IB

On Sat, Mar 13, 2021 at 11:56 AM David Lee @.***> wrote:

Thank you for the update. Do you have an estimated timeline for the updated dataset?
