dungtruongtien / kewe-crawler


Insufficient way to process the uploaded keyword list #21

Open longnd opened 7 months ago

longnd commented 7 months ago

Issue

The right decision was made to process the uploaded keyword list asynchronously by enqueuing each keyword to a queue. However, the keyword list is only transformed into messages that are pushed to the queue, where the subscriber (implemented in the crawler service) picks them up and handles them. This has some limitations.

A better approach would be:

By doing so, the user can see the entire keyword list on the dashboard right after it is uploaded and know which keywords are being processed or have been completed. There is also no risk of losing unprocessed keywords.
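The approach described above (persist the whole list first, then process it) can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: `KeywordStore` is a hypothetical in-memory stand-in for a database table, and the status names `pending`, `processing`, `completed` and `failed` are assumed. Because every keyword is stored before any crawling starts, the dashboard can list all of them, and a worker that crashes or a lost queue message leaves the keyword in the table instead of losing it.

```typescript
// Hypothetical status values for an uploaded keyword.
type KeywordStatus = "pending" | "processing" | "completed" | "failed";

interface KeywordRecord {
  id: number;
  userId: number;
  keyword: string;
  status: KeywordStatus;
}

// In-memory stand-in for a persistent `keywords` table (hypothetical).
class KeywordStore {
  private rows: KeywordRecord[] = [];
  private nextId = 1;

  // Persist the whole uploaded list up front so every keyword is visible on the dashboard.
  insertMany(userId: number, keywords: string[]): KeywordRecord[] {
    const inserted = keywords.map((keyword) => ({
      id: this.nextId++,
      userId,
      keyword,
      status: "pending" as KeywordStatus,
    }));
    this.rows.push(...inserted);
    return inserted;
  }

  // Claim the oldest pending keyword. A real database would do this atomically,
  // e.g. with SELECT ... FOR UPDATE SKIP LOCKED in PostgreSQL.
  claimNext(): KeywordRecord | undefined {
    const row = this.rows.find((r) => r.status === "pending");
    if (row) row.status = "processing";
    return row;
  }

  setStatus(id: number, status: KeywordStatus): void {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.status = status;
  }

  // What the dashboard would query: all keywords of a user with their current status.
  listByUser(userId: number): KeywordRecord[] {
    return this.rows.filter((r) => r.userId === userId);
  }
}

// Worker loop: drains pending keywords and records the outcome of each one.
// Returns the number of keywords handled (completed or failed).
async function processPending(
  store: KeywordStore,
  crawl: (keyword: string) => Promise<void>,
): Promise<number> {
  let processed = 0;
  for (;;) {
    const row = store.claimNext();
    if (!row) break;
    try {
      await crawl(row.keyword);
      store.setStatus(row.id, "completed");
    } catch {
      store.setStatus(row.id, "failed");
    }
    processed++;
  }
  return processed;
}
```

With this design the queue (if one is kept) only needs to carry a keyword id or a "new work available" signal; the table remains the source of truth, so failed keywords can simply be reset to `pending` and retried.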

Also, why not use Axios to send HTTP requests to Google and parse the responses? It should be much faster than Puppeteer.
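To illustrate the HTTP-request alternative, here is a rough sketch of the parsing half: the raw HTML (which an HTTP client such as Axios would return in `response.data`) is scanned for statistics without launching a headless browser. The fetching step is left out so the block has no dependencies. The `result-stats` element id, the `SearchStats` shape and the regex-based extraction are all assumptions for illustration; Google's markup can change, and a response fetched without executing JavaScript may differ from what a browser renders.

```typescript
// Hypothetical shape of the statistics stored per keyword.
interface SearchStats {
  totalResults: string | null; // text of the result-count element, if present
  linkCount: number;           // number of <a ... href=...> tags on the page
}

// Extract statistics from raw HTML. ASSUMPTION: the result count lives in an
// element with id="result-stats"; this is not guaranteed by Google.
function parseSearchStats(html: string): SearchStats {
  const statsMatch = /<div[^>]*\bid="result-stats"[^>]*>([\s\S]*?)<\/div>/i.exec(html);
  const totalResults = statsMatch
    ? statsMatch[1].replace(/<[^>]+>/g, "").trim() // strip nested tags such as <nobr>
    : null;
  const links = html.match(/<a\s[^>]*href=/gi);
  return { totalResults, linkCount: links ? links.length : 0 };
}
```

Regular expressions are used here only to keep the sketch self-contained; a real implementation would more likely use an HTML parser such as cheerio, which tolerates markup variations much better.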

dungtruongtien commented 7 months ago

Thanks for your advice. I've implemented a solution that lets users track their crawling progress, but it still has some limitations, as you mentioned in https://github.com/dungtruongtien/kewe-crawler/issues/22.

In the future, the progress should be synced from the cache to persistent storage so that users can still track it after refreshing the page.
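One possible way to do that sync is a write-behind cache, sketched below under assumptions: `ProgressTracker`, `ProgressStore` and `InMemoryProgressStore` are hypothetical names, and the in-memory store stands in for a real database. Progress updates go to the fast cache immediately, changed entries are flushed to durable storage in batches (for example from a `setInterval` timer), and reads fall back to storage when the cache has no entry, such as after a restart.

```typescript
// Hypothetical progress entry for one crawling job.
interface Progress {
  total: number;
  done: number;
}

// Minimal interface for the durable store (a database table in practice).
interface ProgressStore {
  save(jobId: string, progress: Progress): Promise<void>;
  load(jobId: string): Promise<Progress | undefined>;
}

// In-memory stand-in for persistent storage; stores copies, not live references.
class InMemoryProgressStore implements ProgressStore {
  private data = new Map<string, Progress>();
  async save(jobId: string, progress: Progress): Promise<void> {
    this.data.set(jobId, { ...progress });
  }
  async load(jobId: string): Promise<Progress | undefined> {
    return this.data.get(jobId);
  }
}

// Write-behind tracker: updates hit the cache, flush() persists changed entries.
class ProgressTracker {
  private cache = new Map<string, Progress>();
  private dirty = new Set<string>();
  private readonly store: ProgressStore;

  constructor(store: ProgressStore) {
    this.store = store;
  }

  start(jobId: string, total: number): void {
    this.cache.set(jobId, { total, done: 0 });
    this.dirty.add(jobId);
  }

  increment(jobId: string): void {
    const p = this.cache.get(jobId);
    if (!p) return;
    p.done = Math.min(p.done + 1, p.total);
    this.dirty.add(jobId);
  }

  // Persist only the entries that changed since the last flush.
  async flush(): Promise<void> {
    const ids = Array.from(this.dirty);
    this.dirty.clear();
    for (const id of ids) {
      const p = this.cache.get(id);
      if (p) await this.store.save(id, p);
    }
  }

  // Cache first, then durable storage (e.g. after a restart with an empty cache).
  async get(jobId: string): Promise<Progress | undefined> {
    return this.cache.get(jobId) ?? (await this.store.load(jobId));
  }
}
```

The trade-off of write-behind is that updates made after the last flush can be lost if the process dies; flushing on every update (write-through) avoids that at the cost of more writes to storage.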

> Also, why not use Axios to send HTTP requests to Google and parse the responses, it should be much faster than Puppeteer?

As mentioned in https://github.com/dungtruongtien/kewe-crawler/issues/20, I chose Puppeteer because of the statistics information. However, I'll look further into the HTTP-request approach.

longnd commented 7 months ago

I replied to all of your comments in issue #20.