Open glenacota opened 1 year ago
It seems the issue is connected to the reuse of the colly instance (https://github.com/co0p/x-scrap/blob/master/cmd/xscrap/main.go#L15).
In fact, resetting the `c.found` field in https://github.com/co0p/x-scrap/blob/master/infra/scraper/colly.go#L24 stops the count from carrying over between subsequent urls. Another problem remains, though: the html content of the 2nd url is fetched twice, doubling the number of found tags; the html content of the 3rd url is fetched three times; and so on...
By completely re-initialising the `Colly.collector` field for every url, the html content of every url is fetched only once.
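
To illustrate the pattern, here is a minimal stdlib-only sketch. It rests on an assumption I have not verified against the repo: that a new HTML handler gets registered on the same collector for every url, so handlers pile up and each page is counted once per registered handler (which would produce the same counts as fetching the page repeatedly). `fakeCollector`, `scraper`, `scrapeReusing`, `scrapeFresh` and `countAll` are hypothetical names for this example, not colly's API and not the x-scrap code.

```go
package main

import "fmt"

// fakeCollector is a hypothetical stand-in for a callback-based collector.
// It is NOT colly's API; it only mimics "OnHTML appends a handler".
type fakeCollector struct {
	handlers []func(tag string)
}

// OnHTML registers one more handler; earlier handlers are kept.
func (c *fakeCollector) OnHTML(h func(tag string)) {
	c.handlers = append(c.handlers, h)
}

// Visit pretends every page contains exactly one matching tag and
// passes it to every registered handler.
func (c *fakeCollector) Visit(url string) {
	for _, h := range c.handlers {
		h("a")
	}
}

// scraper mirrors the shape discussed above: a collector plus a counter.
type scraper struct {
	collector *fakeCollector
	found     int
}

// scrapeReusing resets the counter but keeps the same collector, so the
// handler registered here is added on top of the ones from earlier urls.
func (s *scraper) scrapeReusing(url string) int {
	s.found = 0
	s.collector.OnHTML(func(string) { s.found++ })
	s.collector.Visit(url)
	return s.found
}

// scrapeFresh builds a new collector for every url, so exactly one
// handler is registered when the page is visited.
func (s *scraper) scrapeFresh(url string) int {
	s.found = 0
	s.collector = &fakeCollector{}
	s.collector.OnHTML(func(string) { s.found++ })
	s.collector.Visit(url)
	return s.found
}

// countAll runs the given scrape function over all urls in order.
func countAll(scrape func(string) int, urls []string) []int {
	counts := make([]int, 0, len(urls))
	for _, u := range urls {
		counts = append(counts, scrape(u))
	}
	return counts
}

func main() {
	urls := []string{"url#1", "url#2", "url#3"}

	reused := &scraper{collector: &fakeCollector{}}
	fmt.Println(countAll(reused.scrapeReusing, urls)) // [1 2 3]

	fresh := &scraper{}
	fmt.Println(countAll(fresh.scrapeFresh, urls)) // [1 1 1]
}
```

With the shared collector the count grows with the position of the url even though the counter is reset each time; with a fresh collector per url every page is counted once. If the assumption holds, registering the handler only once at construction time would be an alternative to re-creating the collector.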
single url#1
single url#2
single url#3
but url#1+url#2+url#3