Closed (tcouch closed this issue 2 weeks ago)
We could go for something like ~m for a specific number of intervening words or ~m:n to set an upper and lower bound. So "Astra inclinant, sed non obligant" could be matched by:
@rmamarshall would that scheme work for you?
Something like the following regular expression could do this:
(keyword1)\s*(?:\w+\s+){m,n}(keyword2)
The following Django query matches Fragment 188:
OriginalText.objects.filter(content__iregex=r"Homeri\s*(?:\w+\s+){1,2}maxime")
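The translation from the proposed ~m / ~m:n notation to the regular expression above could look roughly like this. It is only a minimal sketch: the helper name `proximity_to_regex` and the assumed query grammar (exactly two keywords either side of the tilde operator) are hypothetical and not taken from the frrant codebase. It is exercised here with Python's `re` module, whereas the `iregex` lookup hands the pattern to the database.

```python
import re

# Hypothetical query grammar: "word1 ~m word2" or "word1 ~m:n word2".
PROXIMITY = re.compile(r"^(\w+)\s*~(\d+)(?::(\d+))?\s*(\w+)$")


def proximity_to_regex(query: str) -> str:
    """Translate a proximity query into a regex of the form shown above."""
    match = PROXIMITY.match(query.strip())
    if match is None:
        raise ValueError(f"not a proximity query: {query!r}")
    first, lower, upper, second = match.groups()
    if upper is None:
        upper = lower  # "~m" means exactly m intervening words
    if int(lower) > int(upper):
        raise ValueError("lower bound exceeds upper bound")
    # (?:\w+\s+){m,n} consumes between m and n whole intervening words.
    return (
        rf"{re.escape(first)}\s*(?:\w+\s+){{{lower},{upper}}}{re.escape(second)}"
    )


# Builds the same pattern as the Django query above.
homeri_pattern = proximity_to_regex("Homeri ~1:2 maxime")

# Three intervening words: inclinant, sed, non (sample text without the comma).
hit = re.search(
    proximity_to_regex("Astra ~3 obligant"),
    "Astra inclinant sed non obligant",
    re.IGNORECASE,
)
```

One thing to watch: as written, `\w+\s+` does not span punctuation, so the comma in "inclinant, sed" stops the pattern matching the sentence as originally punctuated. Treating `\W+` rather than `\s+` as the word separator might be one way around that.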
@tcouch This looks very workable, provided it also finds incomplete words, i.e. Astra ~: obligant will also return astram ... obligantur.
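A sketch of how incomplete words might be handled is below. It rests on two assumptions that the thread does not settle: that a bare "~:" means no limit on the number of intervening words, and that each keyword should match as a prefix of a longer word form. The function name `prefix_proximity_regex` is hypothetical.

```python
import re
from typing import Optional


def prefix_proximity_regex(
    first: str, second: str, lower: int = 0, upper: Optional[int] = None
) -> str:
    """Build a proximity regex in which both keywords may be word prefixes."""
    # Assumption: an omitted upper bound (as in "~:") means "any number of words".
    upper_text = "" if upper is None else str(upper)
    bounds = "{" + f"{lower},{upper_text}" + "}"
    # \w* lets each keyword match the start of a longer word form; requiring
    # \s+ after the first word keeps its ending from counting as a word.
    return rf"{re.escape(first)}\w*\s+(?:\w+\s+){bounds}{re.escape(second)}\w*"


pattern = prefix_proximity_regex("Astra", "obligant")
match = re.search(pattern, "astram inclinant sed non obligantur", re.IGNORECASE)
```

Here `match` covers the whole sample phrase, from "astram" through "obligantur". Passing `upper=1` would reject it, since three words separate the two keywords.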
In reply to tcouch's message of 25/11/2021 11:01 (GMT+00:00), Re: [UCL/frrant] Proximity search with tilde character (Issue #299)
Following the introduction of wildcard search (#65), this should be a relatively straightforward extension of the regex-based search we now have.
From #297: "Another useful feature would be the ability to run proximity searches (i.e. find occurrences of two or more words within a certain number of words / characters of one another; I think PHI Latin do this using the ~ symbol, e.g. sed~et finds instances of these words within a set number of characters [and in that order])."