Create parser for ramban (mefarshim)

omishali commented 6 years ago

A raw file was added under "ramban". Note there are some "הקדמה" elements that should be treated as well (think of a proper uri).

orelkan commented 6 years ago

@omishali I uploaded a Parser and json file to jbs-data, please take a look. Specifically, tell me if the format of the info on the text elements is the way you want it. If it isn't I'll change it. Here's an example text element.

{
  "jbo:within": [
    "jbr:section-tanach-1-49-18"
  ],
  "jbo:book": "jbr:book-ramban",
  "uri": "jbr:text-ramban-tanach-1-49-18",
  "jbo:position": "1359",
  "rdfs:label": "פירוש רמב\"ן בראשית מט יח",
  "jbo:interprets": "jbr:text-tanach-1-49-18",
  "jbo:name": "רמב\"ן",
  "jbo:text": "לישועתך קויתי ה' לא היה בכל שופטי ישראל מי שנפל ביד אויביו זולתי שמשון שהוא הנחש הזה כדכתיב (שופטים ב יח) והיה ה' עם השופט והושיעם מיד אויביהם כל ימי השופט והוא היה האחרון לשופטים כי שמואל נביא היה ולא נלחם להם ובימיו מלכו המלכים וכאשר ראה הנביא תשועת שמשון כי נפסקה אמר לישועתך קויתי ה' לא לישועת נחש ושפיפון כי בך אושע לא בשופט כי תשועתך תשועת עולמים"
}

Thanks

omishali commented 6 years ago

Comments:

label: remove "פירוש"
bug in within: it should link to the perek URI, not the pasuk URI. BUT there is no need for within here at all, so remove this property (the current book has no sections).
about the hakdamot: at the beginning of the book we have 2 hakdamot, one is "הקדמה" and the other is "פתיחה לפירוש התורה". So each of them should have a URI of its own (ramban-tanach-1-0, ramban-tanach-1-1).
Note that ספר במדבר and ספר דברים have hakdamot as well. They are currently not properly parsed (added to the previous elements).
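
The first two comments above can be sketched as a small post-processing step. This is only an illustration in Python, not the actual parser code: the function name `apply_review_comments` is hypothetical, and the sample element is a shortened copy of the one posted earlier in this thread.

```python
import json


def apply_review_comments(element: dict) -> dict:
    """Return a copy of a text element with the two requested changes applied.

    Hypothetical helper, not part of the real parser:
    - drops the "jbo:within" property (the book has no sections)
    - removes the leading "פירוש" from "rdfs:label"
    """
    cleaned = {key: value for key, value in element.items() if key != "jbo:within"}
    prefix = "פירוש "
    label = cleaned.get("rdfs:label", "")
    if label.startswith(prefix):
        cleaned["rdfs:label"] = label[len(prefix):]
    return cleaned


# Shortened copy of the example element from this thread.
example = {
    "jbo:within": ["jbr:section-tanach-1-49-18"],
    "jbo:book": "jbr:book-ramban",
    "uri": "jbr:text-ramban-tanach-1-49-18",
    "rdfs:label": 'פירוש רמב"ן בראשית מט יח',
}

cleaned = apply_review_comments(example)
print(json.dumps(cleaned, ensure_ascii=False, indent=2))
```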

orelkan commented 6 years ago

OK, I will remove the "within" property. I assume this applies to all mefarshim, correct?

About the 3rd bullet, if "פתיחה לפירוש התורה" has URI ramban-tanach-1-1, that clashes with book 1 perek 1, which has the same URI. Or do you think book 1 perek 1 should have URI ramban-tanach-1-2? I think that would make it more confusing. Maybe the hakdama elements could have URIs of the form ramban-tanach-1-0-X, where 0 means it's a hakdama. In Bereshit this would make the first hakdama 1-0-1 and "פתיחה לפירוש התורה" number 1-0-2. Is this acceptable? I just want to be clear on what you prefer. Thanks

About the 4th bullet, they seem to be parsed fine. In the "במדבר" book, the hakdama simply ends with the same line as the previous verse; only that last line is similar. You can see this in the raw text. In "דברים" I can't see a place where the hakdama is added to the previous element.

omishali commented 6 years ago

Yes, the same for all mefarshim.
Where is the clash? All "normal" URIs have the form "text-ramban-tanach-x-y-z". Only the hakdamot have the form "text-ramban-tanach-x-y".
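
A minimal sketch of why the two forms cannot collide, assuming the "jbr:" prefix seen in the example element. The helper names `ramban_uri` and `is_hakdama_uri` are made up for illustration and are not taken from the parser.

```python
from typing import Optional

# Prefix copied from the example element in this thread.
URI_PREFIX = "jbr:text-ramban-tanach-"


def ramban_uri(book: int, index: int, pasuk: Optional[int] = None) -> str:
    """Build a URI: two numbers for a hakdama, three for a regular pasuk.

    For a hakdama, `index` is its position among the book's hakdamot (0, 1, ...).
    For a regular text, `index` is the perek and `pasuk` is the pasuk number.
    """
    parts = [str(book), str(index)]
    if pasuk is not None:
        parts.append(str(pasuk))
    return URI_PREFIX + "-".join(parts)


def is_hakdama_uri(uri: str) -> bool:
    """A URI with exactly two numeric parts after the prefix is a hakdama."""
    return len(uri[len(URI_PREFIX):].split("-")) == 2


second_hakdama = ramban_uri(1, 1)       # "פתיחה לפירוש התורה"
first_pasuk = ramban_uri(1, 1, 1)       # book 1, perek 1, pasuk 1
print(second_hakdama, is_hakdama_uri(second_hakdama))
print(first_pasuk, is_hakdama_uri(first_pasuk))
```

Because every regular text carries a pasuk number, a two-part URI can only ever be a hakdama.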

orelkan commented 6 years ago

My edit about the 4th bullet is new, so I'll write it again in this comment: About the 4th bullet, they seem to be parsed fine. In the "במדבר" book, the hakdama simply ends with the same line as the previous verse; only that last line is similar. You can see this in the raw text. In "דברים" I can't see a place where the hakdama is added to the previous element.

About the clash, I see now what you mean. The other URIs also have the pasuk property, so they are triplets (book, perek, pasuk). The hakdamot don't have a pasuk property, so they will be pairs. So you would prefer that "פתיחה לפירוש התורה" have URI ramban-tanach-1-1?

orelkan commented 6 years ago

Updated the Parser and json file in jbs-data according to the comments. This is currently the "בראשית פתיחה לפירוש התורה" text element. The only thing I wasn't sure about is what I should put in the "interprets" property, since it doesn't interpret Bereshit perek 1 (so 1-1 wouldn't fit).

      "jbo:book": "jbr:book-ramban",
      "uri": "jbr:text-ramban-tanach-1-1",
      "jbo:position": "2",
      "rdfs:label": "רמב\"ן בראשית פתיחה לפירוש התורה",
      "jbo:interprets": "jbr:text-tanach-1-0",
      "jbo:name": "רמב\"ן",
      "jbo:text": "משה רבנו כתב הספר הזה עם התורה כולה מפיו של הקב\"ה. והקרוב שכתב זה בהר סיני, כי שם נאמר לו: \"עלה אלי ההרה והיה שם, ואתנה לך את לוחות האבן........."

Didn't put all the text because it's too long. Is this format ok?

omishali commented 6 years ago

Looks OK. For the hakdamot there is no need for any interprets attribute.

orelkan commented 6 years ago

OK so on the hakdamot I will remove the interprets property.

So I will write a few tests and I assume I'm done with RambanParser. I'm waiting on the files of the other mefarshim
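
The element shape agreed on in this thread could be checked along these lines. This is a sketch only: `make_ramban_element` is a hypothetical helper, the "..." text values are placeholders, and the real RambanParser may build its elements differently.

```python
from typing import Optional


def make_ramban_element(uri_suffix: str, position: int, label: str, text: str,
                        interprets_suffix: Optional[str] = None) -> dict:
    """Build a text element; "jbo:interprets" is added only when a target is given.

    Hypothetical helper: hakdamot pass no interprets_suffix, and no element
    gets a "jbo:within" property.
    """
    element = {
        "jbo:book": "jbr:book-ramban",
        "uri": "jbr:text-ramban-tanach-" + uri_suffix,
        "jbo:position": str(position),
        "rdfs:label": label,
        "jbo:name": 'רמב"ן',
        "jbo:text": text,
    }
    if interprets_suffix is not None:
        element["jbo:interprets"] = "jbr:text-tanach-" + interprets_suffix
    return element


# "..." stands in for the full commentary text (placeholder, not real data).
hakdama = make_ramban_element("1-1", 2, 'רמב"ן בראשית פתיחה לפירוש התורה', "...")
pasuk = make_ramban_element("1-49-18", 1359, 'רמב"ן בראשית מט יח', "...",
                            interprets_suffix="1-49-18")

print(sorted(hakdama))
print(pasuk["jbo:interprets"])
```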

TechnionTDK / jbs-text2json

Create parser for ramban (mefarshim) #50