Closed valencik closed 7 months ago
Hi @valencik, I am a beginner to the world of open-source contribution and wanted to get started by contributing to this issue. I found out about protosearch through Google Summer of Code and found the scaladoc search project interest. I hope you can guide me through the PR process, as I have implemented what I believe is Regex query support in the forked branch referenced above.
Hey @VigneshSK17, thanks so much for your interest and contribution!
Heads up: I have possibly just created some merge conflicts for you by merging a new PR that touches some of the same areas you've changed. Hopefully you can merge latest main
into your work without too much trouble. But let me know if you need a hand.
Some thoughts on your approach here:
regex creation. There's no validation in Lucille on the correctness of the regex pattern, so it's possible to have the regex creation (calling .r
on the string) fail by throwing a PatternSyntaxException
. We should wrap the creation of the Regex
in a try
/catch
and return a Left
with a descriptive error message if it does fail. This can all still happen in your regexSearch
method in IndexerSearcher
.
TermDictionary. Once we have a valid Regex
, I think we could put the term matching inside TermDictionary
similar to what we do with termsForPrefix
. Perhaps in a termsForRegex(regex: Regex): List[String]
method.
tests. Can we add a test with a string that fails regex compilation and assert we get a Left
. "[a"
is an example of a failing string.
Let me know if you have any questions. And feel free to open a PR now and we can continue discussion there. :)
I created a PR #188 and will work on fixing the merge conflicts and your suggestions!
Add in Lucille https://github.com/cozydev-pink/lucille/pull/21
I think we can take the regex string value, build a regex and iterate through the terms array collecting matches. This obviously won't be particularly efficient, but it's probably fast enough for now. (And storing an FSA in the index like Lucene does is just too complicated for now)