Closed Church- closed 4 years ago
Hi, @Church- ! Use getSimpleHTMLDomCached
@em92 Hey there. :smile:
So how would getSimpleHTMLDomCached help exactly? I thought that just cached the result of a page for reuse when scraping again later.
I'm having trouble even generating the initial feed without it timing out with a 504.
Even if I raise the execution time with ini_set('max_execution_time', '3000');
in the bridge, it's still liable to time out, or my RSS reader times out trying to fetch the feed.
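To make the caching question concrete, here is a minimal, self-contained sketch of what a per-URL page cache does. It is not RSS-Bridge's implementation of getSimpleHTMLDomCached; the helper `fetchCached`, the temporary cache directory, the fake fetcher, and the `example.invalid` URLs are all hypothetical and exist only for illustration. A cache like this does not make a cold first run faster, but every chapter page that was fetched before a timeout is already stored, so a retry or a later refresh only pays for the pages it has not seen yet.

```php
<?php
// Hypothetical helper (not part of RSS-Bridge): caches raw page bodies on disk, keyed by URL.
function fetchCached(string $url, callable $fetch, string $cacheDir, int $ttl = 86400): string
{
    $path = $cacheDir . '/' . sha1($url) . '.html';

    // Serve from the cache while the stored copy is younger than $ttl seconds.
    if (is_file($path) && (time() - filemtime($path)) < $ttl) {
        return file_get_contents($path);
    }

    // Cache miss: do the slow fetch once and store the result immediately.
    $body = $fetch($url);
    file_put_contents($path, $body);
    return $body;
}

// Fake fetcher standing in for the network; it counts how often it is called.
$calls = 0;
$fakeFetch = function (string $url) use (&$calls): string {
    $calls++;
    return "<div class=\"userstuff module\">chapter text for $url</div>";
};

// Fresh temporary cache directory for the demo.
$cacheDir = sys_get_temp_dir() . '/ao3-cache-demo-' . uniqid();
mkdir($cacheDir);

// Placeholder chapter URLs (assumption, not real AO3 addresses).
$chapters = ['https://example.invalid/chapter/1', 'https://example.invalid/chapter/2'];

// Simulate two feed refreshes over the same chapter list.
foreach ([1, 2] as $run) {
    foreach ($chapters as $chapterUrl) {
        fetchCached($chapterUrl, $fakeFetch, $cacheDir);
    }
}

echo $calls, "\n"; // prints 2: the second refresh is served entirely from the cache
```

In this sketch the second pass over the chapter list never touches the fetcher, which is the effect the cached variant is meant to give a bridge that loops over many chapter pages.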
Closing this out; I found a workaround using the feediron plugin in tt-rss.
Using this config in feediron:
"archiveofourown.org": {
    "type": "xpath",
    "xpath": [
        "div[@class='userstuff module']"
    ]
}
Works wonderfully when scraping chapters of works.
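For anyone who wants to see what that XPath expression selects, here is a rough sketch using PHP's built-in DOM extension. The sample markup is invented for illustration and the real AO3 page structure may differ; the leading `//` is also an assumption on my part, added so the expression is searched anywhere in the document.

```php
<?php
// Invented sample markup standing in for a chapter page (assumption).
$html = <<<HTML
<html><body>
  <div id="header">Site navigation</div>
  <div class="userstuff module"><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div class="footer">Footer links</div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Same predicate as the config above, searched from anywhere in the document.
$nodes = $xpath->query("//div[@class='userstuff module']");

// Concatenate the matched elements' HTML; everything else on the page is dropped.
$chapterHtml = '';
foreach ($nodes as $node) {
    $chapterHtml .= $doc->saveHTML($node);
}

echo $chapterHtml, "\n";
```

Note that `@class='userstuff module'` is an exact string comparison, so it only matches when the class attribute is exactly `userstuff module`; an element with extra classes or a different class order would not be selected.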
Describe the bug
I've made a modification to the AO3 bridge to grab the full text of each chapter of a work so I can do all my reading in my RSS reader. A pared-down repro script using simpleHTMLDom takes around 20s to generate a list of chapters, whereas my modification, which does effectively the same thing, can take up to 3-5 minutes to grab the text and generate the feed.
So I'm curious whether there are any obvious ways to speed this up?