Open MDjavaheri opened 4 years ago
Thanks for the feedback.
We can certainly reduce the number of false matches and catch more citations that we don't yet catch at all. I don't agree with that lookbehind strategy, though. We have many (most?) of the books cited this way, and all of the ones mentioned as examples. We should aim to capture as many as we can. The work of maintaining a "don't match" list seems like giving up on those.
Many of these come down to us being over-reliant on commas. That's the low-hanging fruit.
For example, and relating to your example: "Kaf HaChaim, Orach Chaim 47:34" does match, but "Kaf HaChaim Orach Chaim 47:34" does not.
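A minimal sketch of what "not relying on the comma" could look like, assuming a regex-based matcher. The title lists, the `CITATION` pattern and `find_citations` are hypothetical names for illustration and are not taken from the actual linker. The idea is to capture the commentary title as part of the match, with the comma optional, rather than excluding it:

```python
import re

# Hypothetical title lists for illustration only; the real linker's data differs.
COMMENTARIES = ["Kaf HaChaim", "Kitzur Shulchan Aruch", "Aruch HaShulchan"]
SECTIONS = ["Orach Chaim", "Yoreh Deah", "Even HaEzer", "Choshen Mishpat"]


def alternation(names):
    # Longest names first, so a longer title is tried before a shorter one.
    return "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))


CITATION = re.compile(
    rf"(?P<title>{alternation(COMMENTARIES)})"
    rf",?\s+"  # the comma is optional; only whitespace is required
    rf"(?P<section>{alternation(SECTIONS)})\s+"
    rf"(?P<ref>\d+(?::\d+)+)"
)


def find_citations(text):
    """Return (title, section, ref) tuples for every citation found in text."""
    return [(m["title"], m["section"], m["ref"]) for m in CITATION.finditer(text)]
```

With this pattern, both "Kaf HaChaim, Orach Chaim 47:34" and "Kaf HaChaim Orach Chaim 47:34" yield the same title, section and reference.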
I've added these examples to a test document that we use: https://github.com/Sefaria/Sefaria-Project/blob/master/data/linker_test.html
Thanks for the response.
I hear your point about the lookbehind.
Wow, what a difference a comma can make! ישועת ה' כהרף עין (salvation can come in the blink of an eye)! At the same time, two issues arise. Commas can get numerous, so parentheses like "Shulchan Aruch (Orach Chaim 47:1)" can be preferred, and, while the comma captures the Kaf HaChaim part, one still has to add a ":1" to the end ("Kaf HaChaim, Orach Chaim 47:34:1") to get it to show properly.
Also, @ikesultan notes how "Kaf HaChaim 46:49:1" is live on https://halachipedia.com/index.php?title=Birchot_HaShachar#cite_note-48, but I would point out this is ambiguous. Kaf HaChaim covers the first 119 Simanim of Yoreh Deah also.
And stam, aliases for different spellings of a sefer name will be helpful for things like "Mishna Brurah" vs. "Mishna Berura" vs. "Mishnah Berurah". Same for "HaChayim" and "HaChaim".
Regarding aliases, the linker already takes into account aliases. The only issue is finding all of the aliases for a given book. In your example above, we have Mishna Brurah as an alias but not Mishna Berura.
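To illustrate the alias idea, here is a small sketch of an alias table that maps each known spelling to one canonical title. This is not the linker's actual data model: `ALIASES`, `build_lookup` and `canonical_title` are hypothetical names, the choice of canonical spelling is an assumption, and "Kaf HaChayim" is just the HaChayim/HaChaim variation applied to one title.

```python
# Hypothetical alias table; spellings are the ones discussed in this thread.
ALIASES = {
    "Mishnah Berurah": ["Mishna Brurah", "Mishna Berura", "Mishnah Berurah"],
    "Kaf HaChaim": ["Kaf HaChaim", "Kaf HaChayim"],
}


def build_lookup(aliases):
    """Invert the table: every spelling (case-folded) points at its canonical title."""
    lookup = {}
    for canonical, spellings in aliases.items():
        for spelling in spellings:
            lookup[spelling.casefold()] = canonical
    return lookup


LOOKUP = build_lookup(ALIASES)


def canonical_title(name):
    """Return the canonical title for a spelling, or None if it is not a known alias."""
    key = " ".join(name.split()).casefold()  # collapse whitespace, ignore case
    return LOOKUP.get(key)
```

A missing spelling such as "Mishna Berura" then becomes a one-line addition to the table rather than a change to the matching logic.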
Regarding your original point, as @EliezerIsrael mentioned, the linker is currently a bit rigid. It misses certain obvious deviations from the format we're expecting that humans easily recognize. This is something we hope to fix in the future, although the exact solution isn't obvious.
Ok, thank you, Noah. תזכו למצוות (may you merit to do more mitzvot)!
I'm happy to post more suggestions here if you guys are open to it.
We're happy to have more suggestions! Feel free to open other issues as relevant.
Mishneh Torah sections could use aliases, for example Hilchos Tefillin (4:10) - instead of Tefillin, Mezuzot, and the Torah Scroll. Maaser instead of Tithes, etc. Just transliterate the Hebrew ones.
Same with parentheses: Mishneh Torah (Hilchos Tefillin 4:10)
Obviously Hilchos vs. Hilchot
Ben Ish Hai, Halachot 1st Year, Bereshit, Chapter 2 = Ben Ish Chai (Shanah Rishonah, Bereshit 2), Ben Ish Chai (I Bereshit 2)
Drisha, Prisha = Derisha, Perisha
Darchei Moshe = Darkei Moshe
Ba'er Hetev = Be'er Heitev
Me'irat Einayim = Sma
Pithei Teshuva = Pischei Teshuvah, Pitchei Teshuvah
Siftei Kohen = Shach
Turei Zahav = Taz
Dagul MeRavava (is that even correct?) = Dagul Mervava
Nekudat HaKesef = Nekudot HaKasef (that's the correct spelling)
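For the parenthesized Mishneh Torah form suggested above, a rough sketch of how such a citation could be split into work, section and reference, treating "Hilchos" and "Hilchot" as the same optional prefix. `PAREN_CITATION` and `parse_paren_citation` are made-up names, and mapping the transliterated section (e.g. "Tefillin") to the English section title would still need an alias table:

```python
import re

# "Hilchos" / "Hilchot" is an optional prefix; both spellings are accepted.
PAREN_CITATION = re.compile(
    r"(?P<work>Mishneh Torah)\s*\(\s*"
    r"(?:Hilcho[st]\s+)?"
    r"(?P<section>[A-Z][A-Za-z' ]*?)\s+"  # section name, matched lazily
    r"(?P<ref>\d+:\d+)\s*\)"
)


def parse_paren_citation(text):
    """Return (work, section, ref) for the first parenthesized citation, else None."""
    m = PAREN_CITATION.search(text)
    if m is None:
        return None
    return m["work"], m["section"], m["ref"]
```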
Basically, the linker could be optimized with aliases and more accurate targeting to more correctly link sources to Sefaria. Many times, Sefarim on Shas and Shulchan Aruch are not linked correctly, because the linker does not target anything more than the Masechet and Daf Number/Chelek of Shulchan Aruch and Siman:Seif. Plus, there are some sefarim that just don't get linked.
When citing a sefer that follows the order of Shulchan Aruch, such as "Kitzur Shulchan Aruch Orach Chaim 7:1", the linker always assumes the sefer is Shulchan Aruch, and only includes "Orach Chaim 7:1" in the link. Similarly, "Kaf HaChaim Orach Chaim 47:34" only links from "Orach Chaim" onward, leading to an incorrect link.
Could someone rewrite the citation regexes to account for situations like this? Meaning, include some lookbehinds to ignore matches that have certain sefer names before them. For example:
(?<!Kitzur|Aruch HaShulchan|Kaf HaChaim|Taz|Be'er Heitev)\s?(Shulchan Aruch )?(Orach Chaim|Yoreh Deah|Even HaEzer|Choshen Mishpat) \d+(:\d+)+
Cases to consider would be:
The same is true for Mefarshei HaShas, such as Tosafot Berachot 11a.
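The lookbehind idea can be sketched in Python roughly as follows. This is only an illustration under a few assumptions, not the linker's code. Python's `re` module requires fixed-width lookbehinds, so the sketch emits one lookbehind per excluded name instead of a single alternation. The separator (space, or comma plus space) is placed inside each lookbehind, since a lookbehind checked before an optional `\s?` would see the trailing space rather than the sefer name. "Kitzur Shulchan Aruch" is added to the list so that a match starting at "Orach Chaim" is also rejected. `EXCLUDED`, `SEPARATORS` and `shulchan_aruch_refs` are hypothetical names.

```python
import re

# Names from the proposed pattern, plus the full "Kitzur Shulchan Aruch" title.
EXCLUDED = [
    "Kitzur", "Kitzur Shulchan Aruch", "Aruch HaShulchan",
    "Kaf HaChaim", "Taz", "Be'er Heitev",
]
SEPARATORS = [" ", ", "]

# One fixed-width negative lookbehind per (name, separator) pair.
lookbehinds = "".join(
    f"(?<!{re.escape(name + sep)})" for name in EXCLUDED for sep in SEPARATORS
)

PATTERN = re.compile(
    lookbehinds
    + r"(?:Shulchan Aruch )?"
    + r"(?:Orach Chaim|Yoreh Deah|Even HaEzer|Choshen Mishpat) \d+(?::\d+)+"
)


def shulchan_aruch_refs(text):
    """Return citations that are not preceded by one of the excluded sefer names."""
    return [m.group(0) for m in PATTERN.finditer(text)]
```

Note that this only suppresses the incorrect Shulchan Aruch link; it does not by itself produce a link to the commentary being cited.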
It would also be great if the linker could be updated to support the following citation formats.
Take a look at https://halachipedia.com/index.php?title=Birchot_HaTorah for plenty of examples.