Closed by volvofixthis 7 years ago
Thanks for the details. It seems it's blocked somewhere in the scheduler; I will investigate that.
Can I help you somehow? I am very excited about Splash support and the other fixes in the new version, but I can't move on because of this. I can even give you root access to the whole setup.
I don't have anything in mind currently; I need to review the scheduler changes first.
Maybe this will help you: the old Docker image is laki9/pyspider:python3, the new one is laki9/pyspider:git_python3. The old image was created six months ago.
I checked which commit was the last one in the old version; here it is: https://github.com/binux/pyspider/commit/0742654a7f9fd4606e946a2c4717f733b2707dd3
How hard do you think it would be to backport the commits about the phantomjs timeout and debug with lazy config to this version? https://github.com/binux/pyspider/commit/360f8698b59f68455252847ef318d8685dcf1146
The lazy config is not easy to backport; it is tied to deep changes across multiple components.
I found a potential bug that can block the scheduler: when one of the projects' on_finished is triggered while newtask_queue is full, the scheduler stalls. That could explain your case. I will fix it, but I can't tell whether it's the true issue here; you can try again later.
It would also explain why there are no problems with a low number of projects.
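The deadlock pattern described above can be sketched with Python's stdlib `queue` module. This is a minimal illustration under assumptions, not pyspider's actual code: the queue size and task names are made up, and the real scheduler uses its own message-queue wrappers.

```python
import queue

# Simulated bounded newtask_queue (the size is an assumption for
# illustration; the real queue size differs).
newtask_queue = queue.Queue(maxsize=2)
newtask_queue.put("task-1")
newtask_queue.put("task-2")  # queue is now full

# An on_finished hook running inside the scheduler loop tries to
# enqueue one more task.  A plain blocking put() would hang forever,
# because the only thread that could drain the queue is the one
# stuck here.  Using a timeout makes the would-be deadlock visible:
try:
    newtask_queue.put("on_finished-task", timeout=0.1)
    blocked = False
except queue.Full:
    blocked = True

print("scheduler would deadlock:", blocked)
```

With a small number of projects the queue rarely fills, so the blocking `put()` usually succeeds immediately, which matches the observation that only large setups stall.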
Nice job, mate, it is working :+1: I will monitor it a little and close the bug if everything is OK.
I mentioned earlier that I have some sort of problem with pyspider stalling at some point. Two versions of pyspider: one from June, I think, and one fresh from git. 0.3.8 git: https://puu.sh/tiRnt/c5b3b4825c.png 0.3.9 git: https://puu.sh/tiRpf/800c11e61c.png Another moment of a total stop after a restart: https://puu.sh/tjO62/73d1c639fe.png
The logs from the fresh version contain no errors at all; they just stop being written at some point in time. scheduler:
fetcher:
processor:
result worker:
configuration:
typical project:
Maybe you have an idea why the same projects work fine under the old version but not the new one? The number of projects is around 500. I tried a fresh install with a few projects and it works fine. How can I debug this situation? Is there a way to actually check what is filling my queue?
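On the last question: one generic way to see what is sitting in the queues is to read their lengths directly from the message-queue backend. The sketch below assumes a Redis-backed deployment (pyspider also supports other backends); the queue names and the stub client are hypothetical, standing in for a real `redis.Redis()` connection.

```python
def queue_depths(client, queue_names):
    """Return the current length of each named message queue.

    `client` is anything exposing llen(name) -> int, e.g. a
    redis.Redis() connection when the queues are Redis lists
    (an assumption -- other backends need different inspection).
    """
    return {name: client.llen(name) for name in queue_names}


# Hypothetical stub standing in for a live connection, so the helper
# can be demonstrated without a running Redis server:
class StubRedis:
    def __init__(self, data):
        self._data = data

    def llen(self, name):
        return self._data.get(name, 0)


client = StubRedis({"demo:newtask_queue": 120, "demo:status_queue": 0})
print(queue_depths(client, ["demo:newtask_queue", "demo:status_queue"]))
```

Sampling these depths periodically would show which queue grows without bound right before the stall.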
Maybe this adds some clarity: I notice that processing is not instant; I see it slow down, and at some point everything stops working entirely.
For testing purposes I tried increasing the number of result worker, processor, scheduler, and fetcher containers when pyspider stopped doing anything. It is alive again, but for how long...
Strange, with the increased number of containers everything works fine...