Closed omishali closed 6 years ago
@omishali I uploaded a parser and a JSON file to jbs-data, please take a look. Specifically, tell me whether the format of the info on the text elements is the way you want it. If it isn't, I'll change it. Here's an example text element.
{
"jbo:within": [
"jbr:section-tanach-1-49-18"
],
"jbo:book": "jbr:book-ramban",
"uri": "jbr:text-ramban-tanach-1-49-18",
"jbo:position": "1359",
"rdfs:label": "פירוש רמב\"ן בראשית מט יח",
"jbo:interprets": "jbr:text-tanach-1-49-18",
"jbo:name": "רמב\"ן",
"jbo:text": "לישועתך קויתי ה' לא היה בכל שופטי ישראל מי שנפל ביד אויביו זולתי שמשון שהוא הנחש הזה כדכתיב (שופטים ב יח) והיה ה' עם השופט והושיעם מיד אויביהם כל ימי השופט והוא היה האחרון לשופטים כי שמואל נביא היה ולא נלחם להם ובימיו מלכו המלכים וכאשר ראה הנביא תשועת שמשון כי נפסקה אמר לישועתך קויתי ה' לא לישועת נחש ושפיפון כי בך אושע לא בשופט כי תשועתך תשועת עולמים"
}
Thanks
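To make the format question concrete, here is a minimal sketch of a consistency check for one text element. It assumes (from the single example above only) that a verse element's `uri` and `jbo:interprets` end in the same book-perek-pasuk triplet; the function name `is_consistent` and the two regular expressions are hypothetical and not part of the actual parser.

```python
import re

# Assumed URI shapes, inferred only from the example element above.
TEXT_URI = re.compile(r"^jbr:text-ramban-tanach-(\d+)-(\d+)-(\d+)$")
SOURCE_URI = re.compile(r"^jbr:text-tanach-(\d+)-(\d+)-(\d+)$")


def is_consistent(element: dict) -> bool:
    """Return True if uri and jbo:interprets name the same (book, perek, pasuk)."""
    uri_match = TEXT_URI.match(element.get("uri", ""))
    src_match = SOURCE_URI.match(element.get("jbo:interprets", ""))
    if uri_match is None or src_match is None:
        return False
    return uri_match.groups() == src_match.groups()


# Trimmed copy of the example element (label and text omitted for brevity).
example = {
    "jbo:book": "jbr:book-ramban",
    "uri": "jbr:text-ramban-tanach-1-49-18",
    "jbo:position": "1359",
    "jbo:interprets": "jbr:text-tanach-1-49-18",
}

print(is_consistent(example))
```

A check like this only covers verse elements; elements without a pasuk component would need their own rule.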
Comments:
OK, I will remove the "within" property. I assume this applies to all Mefarshim, correct?
About the 3rd bullet, if "פתיחה לפירוש התורה" has URI ramban-tanach-1-1, that clashes with book 1 perek 1, which has the same URI. Or do you think book 1 perek 1 should have URI ramban-tanach-1-2? I think that would make it more confusing. Maybe hakdama elements could have URI ramban-tanach-1-0-X, where 0 means it's a hakdama. In Bereshit this would make the first hakdama 1-0-1 and "פתיחה לפירוש התורה" number 1-0-1. Is this acceptable? I just want to be clear on what you prefer. Thanks
About the 4th bullet, they seem to be parsed fine. In the "במדבר" book, the hakdama simply has the same last line as the previous verse, and only that last line is similar. You can see this in the raw text. In "דברים" I can't see a place where the hakdama is added to the previous element.
My edit about the 4th bullet is new, so I'll write it again in this comment: About the 4th bullet, they seem to be parsed fine. In the "במדבר" book, the hakdama simply has the same last line as the previous verse, and only that last line is similar. You can see this in the raw text. In "דברים" I can't see a place where the hakdama is added to the previous element.
About the clash, I see now what you mean. The other URIs also have the pasuk component, so they are triplets (book, perek, pasuk). The hakdamot don't have a pasuk component, so they will be pairs. So you would prefer that "פתיחה לפירוש התורה" have URI ramban-tanach-1-1?
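The pair-versus-triplet point can be sketched as follows. This is only an illustration: the helper names `verse_uri` and `hakdama_uri` are hypothetical, and the `jbr:text-ramban-tanach-` prefix is taken from the example elements in this thread.

```python
PREFIX = "jbr:text-ramban-tanach-"  # prefix as it appears in the example elements


def verse_uri(book: int, perek: int, pasuk: int) -> str:
    """URI for a commentary on a verse: always three numeric components."""
    return f"{PREFIX}{book}-{perek}-{pasuk}"


def hakdama_uri(book: int, index: int) -> str:
    """URI for a hakdama: only two numeric components (book, running index)."""
    return f"{PREFIX}{book}-{index}"


def component_count(uri: str) -> int:
    """Number of numeric components after the prefix."""
    return len(uri[len(PREFIX):].split("-"))


opening = hakdama_uri(1, 1)      # "פתיחה לפירוש התורה" in Bereshit
first_verse = verse_uri(1, 1, 1)  # Bereshit, perek 1, pasuk 1

print(opening, component_count(opening))
print(first_verse, component_count(first_verse))
```

Because a hakdama URI always has two components and a verse URI always has three, the two kinds cannot collide even when the leading numbers match.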
I updated the parser and JSON file in jbs-data according to the comments. This is currently the "בראשית פתיחה לפירוש התורה" text element. The only thing I wasn't sure about is what I should put in the "interprets" property, since it doesn't interpret Bereshit perek 1 (so 1-1 wouldn't fit).
"jbo:book": "jbr:book-ramban",
"uri": "jbr:text-ramban-tanach-1-1",
"jbo:position": "2",
"rdfs:label": "רמב\"ן בראשית פתיחה לפירוש התורה",
"jbo:interprets": "jbr:text-tanach-1-0",
"jbo:name": "רמב\"ן",
"jbo:text": "משה רבנו כתב הספר הזה עם התורה כולה מפיו של הקב\"ה. והקרוב שכתב זה בהר סיני, כי שם נאמר לו: \"עלה אלי ההרה והיה שם, ואתנה לך את לוחות האבן........."
Didn't put all the text because it's too long. Is this format ok?
Looks OK. For the hakdamot, there is no need for any interprets attribute.
OK so on the hakdamot I will remove the interprets property.
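A minimal sketch of what a hakdama element would look like once the "interprets" property is dropped. The function `hakdama_element` and its parameters are hypothetical; the keys and fixed values are copied from the element shown earlier in this thread, and the text value here is a placeholder rather than the real text.

```python
import json


def hakdama_element(book: int, index: int, position: int, label: str, text: str) -> dict:
    """Build a hakdama text element; deliberately has no jbo:interprets key."""
    return {
        "jbo:book": "jbr:book-ramban",
        "uri": f"jbr:text-ramban-tanach-{book}-{index}",
        "jbo:position": str(position),  # positions are strings in the examples
        "rdfs:label": label,
        "jbo:name": 'רמב"ן',
        "jbo:text": text,
    }


element = hakdama_element(
    book=1,
    index=1,
    position=2,
    label='רמב"ן בראשית פתיחה לפירוש התורה',
    text="(placeholder for the hakdama text)",
)

# ensure_ascii=False keeps the Hebrew readable instead of \u escapes.
print(json.dumps(element, ensure_ascii=False, indent=2))
```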
So I will write a few tests, and then I assume I'm done with RambanParser. I'm waiting on the files for the other mefarshim.
A raw file was added under "ramban". Note that there are some "הקדמה" elements that should be handled as well (think of a proper URI).