Using the store as a registry means every lookup goes through a single process mailbox, which can fill up and become a bottleneck.
The advantage of having a registry here is that it can be distributed. Deploying the crawler in a distributed fashion raises additional questions, though, so I have opted out of that path. (Instead of ETS it would be possible to use Mnesia, which is slower but distributed.)
I added a `reset` function to the store for periodic crawling tasks (e.g. when I want to check the content of the same pages every day).
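A minimal sketch of what I mean, assuming an ETS-backed store (the module and function names here are illustrative, not the actual implementation): lookups hit the ETS table directly instead of going through one process's mailbox, and `reset` clears the table so the same pages can be re-crawled.

```elixir
defmodule Crawler.Store do
  # Hypothetical sketch of an ETS-backed store/registry.
  @table :crawler_store

  # Create a named, public table; read_concurrency lets many
  # processes look up URLs without serializing on one mailbox.
  def init do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true])
  end

  # Returns true only if the URL was not already present.
  def add(url), do: :ets.insert_new(@table, {url})

  # Check whether a URL has been seen.
  def member?(url), do: :ets.member(@table, url)

  # Clear all entries so a periodic job can re-crawl the same pages.
  def reset, do: :ets.delete_all_objects(@table)
end
```

`insert_new/2` doubles as the "have we crawled this yet?" check, since it returns `false` for a URL that is already in the table.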
Coverage decreased (-0.5%) to 98.883% when pulling d3ed3f2647806a25454ffa0d72f894d01af85cd4 on happysalada:master into f2e0e93e7385e7acee657cd02d934c1edb595c52 on fredwu:master.
Here is my reasoning.
This is actually what my original issue ran into.
To improve performance further, the connection pool has to be managed differently; I'm currently researching how.
Let me know what you think.