UCL / frrant

Proximity search with tilde character #299

Closed tcouch closed 2 weeks ago

tcouch commented 2 years ago

From #297: "Another useful feature would be the ability to run proximity searches (i.e. find occurrences of two or more words within a certain number of words / characters of one another). I think PHI Latin do this using the ~ symbol, e.g. sed~et finds instances of these words within a set number of characters [and in that order]."

tcouch commented 2 years ago

We could go for something like ~m for a specific number of intervening words or ~m:n to set an upper and lower bound. So "Astra inclinant, sed non obligant" could be matched by:

@rmamarshall would that scheme work for you?
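
For illustration, the ~m / ~m:n scheme could be parsed roughly as below. This is only a sketch: the names ProximityQuery and parse_proximity, and the exact spacing rules around the tilde, are assumptions and not part of the frrant codebase.

```python
import re
from typing import NamedTuple, Optional


class ProximityQuery(NamedTuple):
    """Hypothetical container for a parsed proximity search."""
    first: str
    second: str
    min_words: int
    max_words: int


# Assumed syntax: "keyword1 ~m keyword2" or "keyword1 ~m:n keyword2".
_PROXIMITY = re.compile(r"^\s*(\w+)\s*~(\d+)(?::(\d+))?\s+(\w+)\s*$")


def parse_proximity(query: str) -> Optional[ProximityQuery]:
    """Return the parsed query, or None if it is not a proximity search."""
    match = _PROXIMITY.match(query)
    if match is None:
        return None
    first, lower, upper, second = match.groups()
    min_words = int(lower)
    # "~m" alone means exactly m intervening words.
    max_words = int(upper) if upper is not None else min_words
    if max_words < min_words:
        return None
    return ProximityQuery(first, second, min_words, max_words)
```

Under these assumptions, "astra ~3 obligant" would ask for exactly three intervening words and "astra ~1:3 obligant" for between one and three.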

tcouch commented 2 years ago

Something like the following regular expression could do this: (keyword1)\s*(?:\w+\s+){m,n}(keyword2). The following Django query matches Fragment 188: OriginalText.objects.filter(content__iregex=r"Homeri\s*(?:\w+\s+){1,2}maxime")
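
A minimal sketch of building that pattern from the two keywords and the bounds, checked here with Python's re module, not against the database. The function name proximity_pattern is hypothetical, and escaping the keywords with re.escape is an added assumption.

```python
import re


def proximity_pattern(first: str, second: str, min_words: int, max_words: int) -> str:
    """Build (first)\\s*(?:\\w+\\s+){m,n}(second) as proposed in this thread."""
    return (
        f"({re.escape(first)})"
        r"\s*"
        rf"(?:\w+\s+){{{min_words},{max_words}}}"
        f"({re.escape(second)})"
    )


pattern = proximity_pattern("Homeri", "maxime", 1, 2)

# One intervening word: matches.
assert re.search(pattern, "Homeri poetae maxime", re.IGNORECASE)

# No intervening word: does not match with a lower bound of 1.
assert re.search(pattern, "Homeri maxime", re.IGNORECASE) is None

# \w+\s+ does not step over punctuation, so the comma blocks this match.
astra = proximity_pattern("astra", "obligant", 3, 3)
assert re.search(astra, "Astra inclinant, sed non obligant", re.IGNORECASE) is None
```

The resulting string could then be passed to content__iregex as in the query above. The last check suggests the separator may need to allow punctuation as well as whitespace if lines like "Astra inclinant, sed non obligant" are meant to match.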

rmamarshall commented 2 years ago

@tcouch This looks very workable, provided it also finds incomplete words, i.e. Astra ~: obligant will also return astram ... obligantur.
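
One way the incomplete-word behaviour might look, as a sketch only. It assumes each keyword is treated as a prefix, that a bare ~: means no limit on intervening words, and that words are separated by whitespace. The name prefix_proximity_pattern is hypothetical, and the sample sentence is made up for the check.

```python
import re
from typing import Optional


def prefix_proximity_pattern(
    first: str, second: str, min_words: int = 0, max_words: Optional[int] = None
) -> str:
    """Match words starting with `first` and `second`, in that order."""
    # An empty upper bound gives an open-ended quantifier such as {0,}.
    upper = "" if max_words is None else str(max_words)
    return (
        rf"\b({re.escape(first)}\w*)\s+"
        rf"(?:\w+\s+){{{min_words},{upper}}}"
        rf"({re.escape(second)}\w*)"
    )


text = "Astram inclinant sed non obligantur"  # made-up sample

# Open bounds: both inflected forms are found.
match = re.search(prefix_proximity_pattern("astra", "obligant"), text, re.IGNORECASE)
assert match is not None and match.groups() == ("Astram", "obligantur")

# At most one intervening word: too far apart, so no match.
assert re.search(prefix_proximity_pattern("astra", "obligant", 0, 1), text, re.IGNORECASE) is None
```

The \b word boundary is Python re syntax. Regex dialects differ on word-boundary escapes, so it may need translating before being used in a database query.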

tcouch commented 2 years ago

Following the introduction of wildcard search #65, this should be a relatively straightforward extension of the regex-based search we've introduced.