Using the store as a registry means every lookup goes through a single process mailbox, which can fill up and become a bottleneck.
The advantage of having a registry here is that it can be distributed. Deploying the crawler in a distributed fashion raises additional questions, though, so I have opted out of that path. (Instead of ETS it would be possible to use Mnesia, which is slower but distributed.)
I added a `reset` function to the store for periodic crawling tasks (e.g. when I want to check the content of the same pages every day).
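A minimal sketch of what I mean, assuming an ETS-backed store (the module and function names here are illustrative, not the actual implementation): lookups hit the ETS table directly instead of going through one process's mailbox, and `reset` clears the table so the same pages can be re-crawled.

```elixir
defmodule Crawler.Store do
  # Hypothetical sketch of an ETS-backed store/registry.
  @table :crawler_store

  # Create a named, public table; read_concurrency lets many
  # processes look up URLs without serializing on one mailbox.
  def init do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true])
  end

  # Returns true only if the URL was not already present.
  def add(url), do: :ets.insert_new(@table, {url})

  # Check whether a URL has been seen.
  def member?(url), do: :ets.member(@table, url)

  # Clear all entries so a periodic job can re-crawl the same pages.
  def reset, do: :ets.delete_all_objects(@table)
end
```

`insert_new/2` doubles as the "have we crawled this yet?" check, since it returns `false` for a URL that is already in the table.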
Coverage decreased (-0.5%) to 98.883% when pulling d3ed3f2647806a25454ffa0d72f894d01af85cd4 on happysalada:master into f2e0e93e7385e7acee657cd02d934c1edb595c52 on fredwu:master.
Here is my reasoning.
This is actually what my original issue ran into.
To improve performance further, the connection pool has to be managed differently; I'm currently researching how.
Let me know what you think.