cosmocode / docsearch

Search through uploaded documents in DokuWiki
http://www.dokuwiki.org/plugin:docsearch

Memory problem #2

Closed benzolo closed 14 years ago

benzolo commented 14 years ago

I tested the plugin successfully on a fresh wiki. After that I tried it on our main wiki, using only the PDF converter given in the example config. After calling cron.php manually, it crashes after a second and says:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 8 bytes) in /var/www//inc/indexer.php on line 224

It did completely convert one PDF to txt, which was stored in the data section. The txt file has a size of 1597616 bytes and contains 221296 words in 9792 text lines. Currently I cannot tell whether it crashed while indexing this file or while trying to convert the next one. Is the converter itself also affected by the PHP memory limit when it is called from inside a PHP script?

Andreas

dom-mel commented 14 years ago

I guess it is caused by the indexing process, so a higher PHP memory limit could fix it.

"Is the converter itself also affected by the PHP memory limit?" No.

benzolo commented 14 years ago

I increased the memory limit to 64MB and it worked fine. However, it took 22 minutes to index 500 PDF documents.

I encountered a lot of errors generated by pdftotext. Spot checks showed that a txt file was still generated in those cases, though probably not a complete one. It might be worth dumping the output (if there is any) of each conversion to a logfile, including which file caused the error, so that an admin knows when something is going wrong. This should be easy to implement: check whether the system call returns anything and, if so, write that to a file together with the system call itself.

After having 500 documents indexed, a search sometimes took about 10 to 20 seconds. But that is probably a problem with the search implementation of DokuWiki in such a huge file-based database.
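The logging idea suggested above could look roughly like the following. This is only a sketch in Python to illustrate the approach, not the plugin's actual (PHP) code; the function name `run_converter`, the log format, and the log path are all made up for the example.

```python
import subprocess
from datetime import datetime, timezone


def run_converter(cmd, source_file, log_path):
    """Run one conversion command and log whatever it prints.

    Hypothetical helper: cmd is the converter call as a list of
    arguments, source_file is the document being converted.
    Returns True when the command exits with status 0.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()

    # Only write a log entry when the converter printed something
    # or reported failure through its exit status.
    if output or result.returncode != 0:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"[{stamp}] file: {source_file}\n")
            log.write(f"  command: {' '.join(cmd)}\n")
            log.write(f"  exit status: {result.returncode}\n")
            for line in output.splitlines():
                log.write(f"  | {line}\n")

    return result.returncode == 0
```

A call might look like `run_converter(["pdftotext", "in.pdf", "out.txt"], "in.pdf", "convert.log")`, where the file names are placeholders. Recording the exit status as well as the output would help an admin tell a noisy but successful conversion from one that actually failed.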

dom-mel commented 14 years ago

With the patch 5e71f534904caeb344e55282d6e5a6592f4a16fd the search should be much faster :)