[ ] Question. This issue tracker is not the best place for questions. If you want to ask how to do
something, or to understand why something isn't working the way you expect it to, use StackOverflow
instead with the label 'stormcrawler': https://stackoverflow.com/questions/tagged/stormcrawler
[x] Bug report. If you’ve found a bug, please include a test if you can, it makes it a lot easier to fix things. Use the label 'bug' on the issue.
[ ] Feature request. Please use the label 'wish' on the issue.
Reproduce steps
To reproduce the issue, run the HttpProtocol main function against many URLs with MultiProxyManager configured as the proxy manager.
The crawler.conf:
config:
  http.agent.name: test
  http.proxy.manager: org.apache.stormcrawler.proxy.MultiProxyManager
  http.proxy.file: proxies
  http.robots.file.skip: true
the proxies file
Root cause
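The HttpProtocol implementations (both okhttp and apache) are not thread-safe: when several threads fetch concurrently through different proxies, a request can end up with another request's proxy or proxy credentials, as in the two examples below.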
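To illustrate the kind of race this refers to, here is a minimal, self-contained sketch. It is not StormCrawler code: `ClientBuilder`, `fetchUnsafe`, `fetchSafe` and `runInterleaved` are hypothetical names, and the sketch assumes the problem comes from per-request proxy settings being written into state shared by all threads. Latches force the unlucky interleaving so the result is deterministic.

```java
import java.util.concurrent.CountDownLatch;

class SharedBuilderRace {

    /** Hypothetical stand-in for a client builder kept in a protocol field. */
    static final class ClientBuilder {
        private String proxy;
        void proxy(String p) { this.proxy = p; }
        String build() { return proxy; } // "builds" a client bound to the current proxy
    }

    /** Shared by all threads: this is the unsafe part. */
    static final ClientBuilder shared = new ClientBuilder();

    /** Unsafe: sets the proxy on the shared builder, then builds from it. */
    static String fetchUnsafe(String proxy, Runnable betweenSetAndBuild) {
        shared.proxy(proxy);
        betweenSetAndBuild.run(); // window in which another thread can overwrite the proxy
        return shared.build();
    }

    /** Safe variant: each request configures its own builder. */
    static String fetchSafe(String proxy, Runnable betweenSetAndBuild) {
        ClientBuilder local = new ClientBuilder();
        local.proxy(proxy);
        betweenSetAndBuild.run();
        return local.build();
    }

    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Runs threads A and B so that B sets its proxy between A's set and A's build. */
    static String runInterleaved(boolean safe) {
        CountDownLatch aHasSet = new CountDownLatch(1);
        CountDownLatch bIsDone = new CountDownLatch(1);
        String[] proxySeenByA = new String[1];

        Thread a = new Thread(() -> {
            Runnable pause = () -> { aHasSet.countDown(); await(bIsDone); };
            proxySeenByA[0] = safe ? fetchSafe("proxy-A", pause) : fetchUnsafe("proxy-A", pause);
        });
        Thread b = new Thread(() -> {
            await(aHasSet);
            Runnable noPause = () -> { };
            if (safe) { fetchSafe("proxy-B", noPause); } else { fetchUnsafe("proxy-B", noPause); }
            bIsDone.countDown();
        });

        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return proxySeenByA[0];
    }

    public static void main(String[] args) {
        System.out.println("unsafe: request A built with " + runInterleaved(false));
        System.out.println("safe:   request A built with " + runInterleaved(true));
    }
}
```

In the unsafe variant, request A ends up using proxy-B, which matches the "wrong proxy used" symptom. A per-request builder (or any other per-request copy of the proxy settings) avoids the shared write. Whether this is the right fix for the actual protocol classes would need to be checked against their code.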
Example 1 (wrong proxy auth)
Example 2 (wrong proxy used)