featdd / dpn_glossary

Glossary extension for TYPO3
http://typo3.org/extensions/repository/view/dpn_glossary
GNU General Public License v2.0

Timeout error when parsing is activated #219

Closed: wmarie-joseph closed this issue 3 months ago

wmarie-joseph commented 3 months ago

TYPO3 12.4.16 PHP 8.3 MYSQL 8.0 dpn_glossary 5.3.0

Hello,

My website has about 912 terms and 10 HTML tags, but when the parser is activated I get a timeout error on all my pages (except those that are exempt from parsing, of course). If I don't deactivate the parser, I cannot access my website pages. Furthermore, I have no error logs whatsoever. Is there a configuration or something extra that has to be done to avoid this issue?

Thanks for your feedback

featdd commented 3 months ago

Hi @wmarie-joseph,

Uncached, the parser takes some time to parse the page, but I've never heard of a timeout before. Then again, I've never had this many terms, so I need some time to build a test case and reproduce this.

Can you say how big the HTML is and how many levels of nesting there are?

Greetings Daniel

wmarie-joseph commented 3 months ago

Hello @featdd

What do you mean by how big the HTML is? The website has about 90 active pages in total. If you want, I can give you an export of the 900 terms so you can easily import it into your TYPO3 instance.

featdd commented 3 months ago

Hi @wmarie-joseph,

An export of this many terms would help me; otherwise I would have to write a script or something to generate them 😅

By size I meant how many lines, characters or KB of HTML the pages have. The parser iterates over each term, searching through the DOM built from the HTML. If the HTML is megabytes in size and has dozens of levels of nested elements, this may be the problem, but I'm just grasping at straws here.
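One way to answer the size question is to measure a saved copy of a rendered page. The following is a minimal Python sketch, not part of dpn_glossary (which is PHP); `html_stats` and `DepthCounter` are hypothetical names. The depth is only approximate, because unclosed tags such as `<p>` or `<li>` are counted as still open.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not add a nesting level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthCounter(HTMLParser):
    """Tracks the deepest element nesting seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Never go below zero on stray closing tags.
        self.depth = max(0, self.depth - 1)


def html_stats(html: str) -> dict:
    """Return size in bytes, line count and approximate maximum nesting depth."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return {
        "bytes": len(html.encode("utf-8")),
        "lines": html.count("\n") + 1,
        "max_depth": counter.max_depth,
    }


if __name__ == "__main__":
    sample = "<html><body><div><p>Hi<br>there</p></div></body></html>"
    print(html_stats(sample))
```

To check a real page, save its rendered source from the browser, read the file into a string and pass it to `html_stats`.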

There is also the option settings.limitParsingId, which lets you restrict the parser to one area when searching for terms, like the body or a div with the main content. You could try it to see if it helps.

Greetings Daniel

wmarie-joseph commented 3 months ago

Hi @featdd ,

Here is the export of all my terms: tx_dpnglossary_domain_model_term_010724-0824.csv

I tried settings.limitParsingId, limiting the parsing to the main content div, and it doesn't help; I still get a timeout error.

featdd commented 3 months ago

Hi @wmarie-joseph,

Can you send me an SQL dump of your glossary tables? (I don't know how to import this format without writing a script.)

Greetings Daniel

wmarie-joseph commented 3 months ago

Hi @featdd ,

Here is the SQL dump of my dpn_glossary tables, hope this helps:

dpn_glossary_tables.sql.zip

wmarie-joseph commented 3 months ago

I also want to point out that my website has 2 languages, so some words will appear twice (in French and English).

featdd commented 3 months ago

Hi @wmarie-joseph,

I just figured out the issue: it wasn't about performance, but about your term "P02", which contains two synonyms that are both empty. This shouldn't happen, because synonyms normally have a validation for not being empty, but maybe there was a backend glitch or something!?

The empty synonyms created an endless loop, because the parser went crazy looking for empty contents. Anyway, I just pushed an update, 5.3.1, that checks whether a synonym is empty to prevent this.

You should also delete the empty synonyms in your data.
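The extension's PHP code is not shown in this thread, so the following is only a Python sketch of the general failure mode, assuming a naive search loop that advances by the length of the match. `find_occurrences` and `usable_search_terms` are hypothetical names, and `max_steps` exists only so the demonstration terminates instead of hanging.

```python
def find_occurrences(text: str, needle: str, max_steps: int = 1000) -> list[int]:
    """Naive search loop that advances by len(needle) after each hit.

    With an empty needle every position matches and the cursor advances
    by zero, so the loop would never end without the max_steps guard.
    """
    positions = []
    cursor = 0
    steps = 0
    while True:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("no progress: search loop did not terminate")
        hit = text.find(needle, cursor)
        if hit == -1:
            return positions
        positions.append(hit)
        cursor = hit + len(needle)


def usable_search_terms(name: str, synonyms: list[str]) -> list[str]:
    """Keep the term name and its synonyms, dropping empty or whitespace-only ones."""
    return [s for s in [name, *synonyms] if s.strip()]


if __name__ == "__main__":
    print(find_occurrences("a P02 b P02", "P02"))
    print(usable_search_terms("P02", ["", "  "]))
    try:
        find_occurrences("some page text", "")
    except RuntimeError as error:
        print(error)
```

Filtering the search terms before the loop runs mirrors the idea of the 5.3.1 check described above, though the actual implementation may differ.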

Greetings Daniel

wmarie-joseph commented 3 months ago

Hi @featdd ,

Great! Thank you so much.