cosmocode / docsearch

Search through uploaded documents in DokuWiki
http://www.dokuwiki.org/plugin:docsearch

Searching for multiple words fails #3

Closed benzolo closed 14 years ago

benzolo commented 14 years ago

I uploaded the document http://www.easa.europa.eu/ws_prod/g/doc/Agency_Mesures/Certification_Spec/decision_ED_2003_02_RM.pdf to a fresh wiki install and searched for "European" and "Agency". Searching for either word on its own gives good results, but if I search for "European Agency" (without the quotation marks) I do not get any results. Does this plugin not allow this kind of search, or is a different syntax needed to search for two or more words?

Andreas

dom-mel commented 14 years ago

The plugin uses DokuWiki functions to build the index and run searches, so it only supports DokuWiki search queries. See http://www.dokuwiki.org/search.

I tried a similar search on a DokuWiki page and it seems that it does not work there either. If you think the DokuWiki search is not working correctly, please file a bug report at http://bugs.dokuwiki.org/

benzolo commented 14 years ago

I ran another test. The PDF linked above has the headline "European Aviation Safety Agency" near the beginning, and I copied that line of text to the playground page. Searching for "European Agency" correctly returns the playground page but returns nothing from the PDF. If I search only for "European", I get both the page and the document. The same applies if I search for "Agency" alone. I looked at the txt file generated by pdftotxt, and the line "European Aviation Safety Agency" is extracted correctly there. If this is a bug in the DokuWiki search, why does it work when searching in pages? The representation inside the text files should be the same. Of course, the index is probably much bigger for the PDF than for the page, because the PDF contains far more text than my wiki pages. Could this be the problem?

benzolo commented 14 years ago

I ran another test. I copied the whole text generated by pdftotxt into a wiki page. (I had to increase the memory limit to 64MB and the script execution time to 180 seconds for lexer.php to run through.) When I searched for "European Agency", the result was the page I had copied the text into, but not the document. So it does not look like a problem with the DokuWiki search, because the search works when the text is inside a wiki page.
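
To make the expected behaviour concrete, here is a rough toy sketch in Python. It is not DokuWiki's or the plugin's actual code: `build_index` and `search_all` are hypothetical names, the sample texts are made up apart from the headline, and it assumes that a multi-word query without quotation marks should match every text that contains all of the words somewhere.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the matches for the first word, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: a wiki page and the text extracted from a PDF.
docs = {
    "playground": "European Aviation Safety Agency",
    "decision.pdf": "European Aviation Safety Agency certification specifications",
}
index = build_index(docs)

print(sorted(search_all(index, "European Agency")))
# prints ['decision.pdf', 'playground']
```

Under that assumption, a wiki page and an uploaded document containing the same line should both be returned for "European Agency", which is what I see for the page but not for the PDF.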

dom-mel commented 14 years ago

I guess I found the bug. It seems there is a better search function in DokuWiki. I'll try it tomorrow :)

dom-mel commented 14 years ago

The search should work now, and it is also much quicker ;-)

commit: 5e71f534904caeb344e55282d6e5a6592f4a16fd

benzolo commented 14 years ago

Works like a charm. Thank you!

dom-mel commented 14 years ago

nice :)