andresmrm closed this issue 4 days ago
Just for context, you can use wildcard and regex queries in Aleph using the Elasticsearch query string syntax.
As you already noticed, both wildcard and regex queries are computationally expensive at search time, which makes them slow. While there are options to speed up such queries, these require indexing contents differently (e.g. using ngrams), which usually comes at a significantly higher cost for ingesting and storing the data. This makes it a difficult trade-off.
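To illustrate the trade-off, here is a minimal sketch of what character n-grams look like. It is plain Python for illustration only, not how Aleph or Elasticsearch actually tokenizes; `char_ngrams` and its parameters are hypothetical names.

```python
def char_ngrams(token: str, min_n: int = 2, max_n: int = 3) -> list[str]:
    """Return every character n-gram of `token` with min_n <= n <= max_n."""
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the token.
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


# With 3-grams, "123" becomes an indexed term of its own, so an exact-term
# lookup for "123" can match a document that only contains "nº123".
trigrams = char_ngrams("nº123", 3, 3)

# The cost: a single 5-character token turns into several index entries.
all_grams = char_ngrams("nº123", 2, 3)
```

The lookup becomes cheap because it is an exact match on a pre-computed term, but every token is stored many times over, which is where the extra ingest and storage cost comes from.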
Yes, I understand it's hard to make it faster... =/
I knew about the "abc?" query, but not the "abc*". Maybe it should be added to the docs? https://docs.aleph.occrp.org/users/search/advanced/
Regex search "abc.*" doesn't seem to work for me from the Aleph search page. Only when accessing ES directly.
Edit: Oops, I see now why. It should be "/abc.*/". Sorry for the confusion.
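For anyone else tripping over this, a small sketch of how the two query forms discussed here differ as strings. The helper names are hypothetical, and this only builds the text you would type into the search box; it does not call Aleph or Elasticsearch.

```python
def wildcard_query(term: str, leading: bool = False, trailing: bool = False) -> str:
    """Build a wildcard query: `*` matches any run of characters."""
    return ("*" if leading else "") + term + ("*" if trailing else "")


def regex_query(pattern: str) -> str:
    """Wrap a regular expression in forward slashes, as the query string syntax expects."""
    if "/" in pattern:
        # Assumption: keep the sketch simple and refuse patterns that would
        # need their own slash escaping.
        raise ValueError("pattern must not contain '/'")
    return f"/{pattern}/"


print(wildcard_query("abc", trailing=True))  # abc*
print(regex_query("abc.*"))                  # /abc.*/
print(regex_query(".*123"))                  # /.*123/
```

As far as I know the regex has to match the whole indexed term, so finding `nº123` from `123` needs something like `/.*123/` rather than `/123/`.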
Hi @andresmrm, sorry for the late reply. Thanks for your suggestion, I have added a section to the docs that links to the full ES query syntax reference.
Is your feature request related to a problem? Please describe.
Sometimes the searched term appears without a space separating it from another word (like `nº123` instead of `nº 123`), so the query doesn't find anything if I just use `123`; I need to search for `nº123`.

Describe the solution you'd like
I would like to search for `123` and find `nº123`.

Describe alternatives you've considered
Sometimes using `??123` can help, but not if the number of chars varies.

As discussed in Slack, I've managed to make queries directly to ElasticSearch to use regex queries. But they were too slow (~3s each) and I needed to query a huge list of terms. So I ended up doing regular queries for the most common patterns (~30ms each). For example, in my case the terms generally appear like `0123456789` or `012.345.678-9`, so I queried each version of the term for each term (2x30ms=60ms << 3s). But I gave up on less common cases, like `nº123`.

It may be good to allow regex queries, even if slow, for when you just need to search for a few terms. And, if possible, make regex faster or offer another type of partial match.
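The workaround described above (querying each known spelling of a term instead of using a regex) can be sketched roughly like this. The function names are hypothetical, the 3-3-3-1 grouping is taken from the `012.345.678-9` example, and combining the variants with `OR` into one query is an assumption rather than what was actually run (that used one query per variant).

```python
def term_variants(digits: str) -> list[str]:
    """Return the plain and the punctuated spelling of a 10-digit term."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected exactly 10 digits")
    # Group as 3.3.3-1, following the 012.345.678-9 example.
    dotted = f"{digits[0:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9]}"
    return [digits, dotted]


def any_variant_query(digits: str) -> str:
    """Join all variants into one query string, quoting each as an exact phrase."""
    return " OR ".join(f'"{variant}"' for variant in term_variants(digits))


print(term_variants("0123456789"))
print(any_variant_query("0123456789"))
```

This only covers spellings you can enumerate in advance, which is why cases like `nº123` still fall through.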