Pituchey-Hotam / Genizah

Automatically scan documents for Shemot and prevent printing Genizah
Smarter matching using Syntactic Analysis #3

Open michael-3-141 opened 1 year ago

michael-3-141 commented 1 year ago

Some ordinary Hebrew words are spelled identically to Shemot, and simple text-based matching can't tell them apart. For example:

אתפללה אל אל ערבית ושחרית. דני הלך אל הגן שדי חמד

The first and third אל (the preposition "to") would be wrongly matched, as would the word שְׂדֵי ("fields of"). Distinguishing these words from actual names requires a syntactic understanding of Hebrew.
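To make the failure concrete, here is a minimal sketch of context-free matching run on the sentence above. The two-entry `NAMES` list and the `naive_matches` helper are made up for illustration and are not the plugin's actual code:

```python
import re

# Illustrative subset only; the plugin's real name list is not shown in this issue.
NAMES = {"אל", "שדי"}

# A run of Hebrew letters (alef through tav), without niqqud.
HEBREW_WORD = re.compile(r"[\u05D0-\u05EA]+")


def naive_matches(text, names=NAMES):
    """Return every whole word that is spelled like a name, ignoring context."""
    return [m.group() for m in HEBREW_WORD.finditer(text) if m.group() in names]


sample = "אתפללה אל אל ערבית ושחרית. דני הלך אל הגן שדי חמד"
hits = naive_matches(sample)
# All three occurrences of אל and the word שדי are flagged,
# although only the second אל is actually a name.
```

Spelling alone gives four hits for one real name, which is the gap syntactic analysis would have to close.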

Dicta is an organization specializing in Hebrew linguistic tools. Their tools have this kind of understanding, and they were open to providing us with an API. The API is private (we have a dedicated API key), so we would need our own server to act as a proxy between clients and the Dicta API, keeping the key out of the client.
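A rough sketch of what such a proxy could look like, using only the Python standard library. Everything specific here is an assumption: the upstream URL is a placeholder, the `Authorization: Bearer` header format is a guess, and the environment variable names are invented. The real Dicta endpoint and auth scheme are not described in this issue.

```python
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholders: the real endpoint and key handling are not given in this issue.
UPSTREAM_URL = os.environ.get("UPSTREAM_URL", "https://upstream.example/analyze")
API_KEY = os.environ.get("UPSTREAM_API_KEY", "")


def build_upstream_request(body, api_key=API_KEY, url=UPSTREAM_URL):
    """Wrap the client's body in a request that carries the server-side key."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed header format
        },
    )


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the client's request body and forward it unchanged.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            request = build_upstream_request(body)
            with urllib.request.urlopen(request, timeout=10) as resp:
                payload, status = resp.read(), resp.status
        except OSError:
            # Covers network errors, timeouts, and HTTP error responses.
            payload, status = b'{"error": "upstream unavailable"}', 502
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main():
    # Local address and port are arbitrary choices for this sketch.
    HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```

Calling `main()` starts the proxy locally. The point of the design is that the key lives only in the server's environment, so clients never see it. A real deployment would also need rate limiting and some form of client authentication, which this sketch leaves out.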

Integrating this technology would give the plugin a real advantage over manual searching, especially for long documents.

michael-3-141 commented 11 months ago

Many thanks to @NoamShveber, who made a good start in pull request #9. These are the next steps we agreed on in the meeting: