Ellerhold / fs2es-indexer

This tool indexes your directories into an elastic search index and prepares them for searching via Mac OS Spotlight search in a samba file server.

Which type of queries does this indexer enable? #5

Closed lifepillar closed 2 years ago

lifepillar commented 2 years ago

Thanks for this nifty tool. I was able to index a small bunch of files, but I am a bit confused about the results, or lack thereof, of some of my searches. Would it be possible to add some documentation about that?

I have run fs2es-indexer index, and it successfully indexed all the items in my share (the count of the items corresponds to the actual number of files and directories).

Now, there are six PDF files in my share. Say I want to search for all of them. Then, mdfind -onlyin /Volumes/MyShare pdf returns four of them; mdfind -onlyin /Volumes/Battista kind:pdf returns none. In the Finder, pdf returns nothing, no matter whether I just type pdf, search by name or search by kind. In general, searching for files with a given suffix (e.g., txt or .txt) does not return any result, neither in the Finder nor with mdfind.

Searching is wobbly in other ways. If a directory's name is xyz, searching for xyz returns all the files in that directory (there was one case in my tests in which a search returned just the directory, but I had to search for it by name). Is that intentional and/or configurable? I find such behaviour extremely confusing.

There is a file named different-etc. I start typing in the Finder: d (the file appears along with others), di (now it is the only match), dif (the match is still there), diff (the match disappears!?), diffe (the match reappears!). I can reproduce this consistently.

Search by name in the Finder is case sensitive. Is that configurable?

Also, it seems that matches are found at “word boundaries”. For instance, if a file is named “abc-def_gh-123Hello”, I can find it by typing abc, de or 12, but not if I search for gh or Hello. Is this the way Spotlight works, or can it be configured server-side?

It would be nice if there were some documentation/pointers about how queries work, what one should expect from the different ways of searching (just typing, by name, by kind, etc.), and possibly some tips on how to configure fs2es-indexer and/or Elasticsearch (if relevant) to get the best “macOS” experience.

MatthiasKuehneEllerhold commented 2 years ago

A lot of in-depth discussions happened on https://github.com/dadoonet/fscrawler/issues/1164 that prompted me to write this tool for ourselves.

You can enable query logging on your Elasticsearch instance and see the queries for yourself. To debug them properly, install Kibana and recreate the searches there. Kibana itself indexes strings weirdly sometimes. As you described, we have also noticed that some partial matches are found and some are not (this was on an ELK stack that parses error log files). So this may be an ES bug or fault and not a problem of the indexer.
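One way to get that query logging is the Elasticsearch search slow log with its threshold set to zero, so that every query is written out. The snippet below is only a sketch: the URL `http://localhost:9200` and the index name `files` are assumptions and need to be replaced with your own values, and where the slow log ends up depends on your Elasticsearch logging configuration.

```python
import json
import urllib.request


def build_slowlog_request(es_url, index):
    """Build a PUT request that sets the search slow log threshold to 0s.

    es_url and index are placeholders (assumptions); use your own values.
    """
    settings = {"index.search.slowlog.threshold.query.trace": "0s"}
    return urllib.request.Request(
        url=f"{es_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed values: a local, unauthenticated Elasticsearch and an index named "files".
request = build_slowlog_request("http://localhost:9200", "files")

# Uncomment to actually send the request to a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

After that, run a Spotlight search from the Mac and check the slow log for the query that Samba sent.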

MatthiasKuehneEllerhold commented 2 years ago

I'm going to close this issue. If you feel anything is missing, please comment and/or reopen it.

MatthiasKuehneEllerhold commented 2 years ago

In 0.3.2 I've added a "search" command so you can test the search locally.

Looking at the Samba source, it appears that you're correct. It'll only find files if the search term matches the start of a word, i.e. begins at a word boundary.
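To illustrate what "start of a word" means for the example from the original report, here is a rough sketch. It is not the Samba or Elasticsearch code: it assumes that file names are split into words at every character that is not a letter, digit or underscore, and that a search term matches if some word starts with it. The helper names `words` and `matches` are made up for this illustration.

```python
import re


def words(name):
    # Assumption: a "word" is a run of letters, digits and underscores,
    # so "-" ends a word but "_" does not.
    return re.findall(r"\w+", name)


def matches(name, term):
    # Case-sensitive prefix match against each word.
    return any(word.startswith(term) for word in words(name))


name = "abc-def_gh-123Hello"
print(words(name))  # ['abc', 'def_gh', '123Hello']
for term in ["abc", "de", "12", "gh", "Hello"]:
    print(term, matches(name, term))
# abc True
# de True
# 12 True
# gh False
# Hello False
```

Under these assumptions the sketch reproduces the abc / de / 12 versus gh / Hello results from the report. It does not explain the "diff" case, where the match disappears and reappears.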

As far as I'm aware, this is not changeable in Samba. I don't know if the Samba team would be open to letting the user configure this. That shouldn't be too bad to implement (I think? IDK).

lifepillar commented 2 years ago

Thanks for looking into it! Good to know where that behaviour comes from.