gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0
23.2k stars, 1.76k forks

robotsMap memory continues to grow #815

Open ZigHuang opened 5 months ago

ZigHuang commented 5 months ago

Hello, I am using colly to visit some websites with c.IgnoreRobotsTxt = false.

As it runs, memory usage keeps growing over a relatively long period of time.

This growth is difficult to observe with pprof.

As a control experiment, I ran a second collector with the same configuration, the only difference being c.IgnoreRobotsTxt = true.

After running for a while, the memory of the control stays stable within 1 GB, while the memory of the first collector keeps increasing.

[Screenshot: 2024-05-13 16 24 46]

With c.IgnoreRobotsTxt = false, I can't find any way to reset robotsMap other than reinitializing the collector via colly.NewCollector().
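
For illustration, here is a minimal sketch of that reinitialization workaround, written against the standard library only so it runs on its own. The names `visitor`, `recyclingVisitor`, and `fakeCollector` are hypothetical and not part of colly. In real use, the `newVisitor` factory would presumably call colly.NewCollector(), set c.IgnoreRobotsTxt = false, and re-register any callbacks, since a fresh collector starts without them.

```go
package main

import "fmt"

// visitor is the single method this sketch needs. colly's *Collector has a
// Visit(URL string) error method, so it should satisfy this interface.
type visitor interface {
	Visit(url string) error
}

// recyclingVisitor is a hypothetical helper (not part of colly). It replaces
// the underlying visitor after maxVisits calls, so any per-collector state,
// such as a robots.txt cache, is dropped and can be garbage collected.
// It is not safe for concurrent use.
type recyclingVisitor struct {
	newVisitor func() visitor // factory that builds a fresh collector
	maxVisits  int            // visits allowed before the collector is replaced
	current    visitor
	visits     int
}

// Visit forwards to the current visitor, creating a new one first when the
// visit budget of the old one is used up.
func (r *recyclingVisitor) Visit(url string) error {
	if r.current == nil || r.visits >= r.maxVisits {
		r.current = r.newVisitor()
		r.visits = 0
	}
	r.visits++
	return r.current.Visit(url)
}

// fakeCollector stands in for a real collector so the sketch needs no
// network access or third-party modules.
type fakeCollector struct{}

func (f *fakeCollector) Visit(url string) error { return nil }

func main() {
	created := 0
	r := &recyclingVisitor{
		maxVisits: 2,
		newVisitor: func() visitor {
			created++
			return &fakeCollector{}
		},
	}
	urls := []string{
		"https://a.example/", "https://b.example/", "https://c.example/",
		"https://d.example/", "https://e.example/",
	}
	for _, u := range urls {
		if err := r.Visit(u); err != nil {
			fmt.Println("visit failed:", err)
		}
	}
	fmt.Println("collectors created:", created)
}
```

This trades memory growth for the cost of refetching robots.txt after every swap, and it assumes no requests are still in flight on the old collector when it is replaced.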

ZigHuang commented 5 months ago

I can open a PR that adds a size limit (or some other bound) to robotsMap so that memory does not keep increasing.
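
As a rough idea of what such a limit could look like, here is a minimal sketch of a size-bounded LRU cache keyed by host. It is not colly's actual code: `boundedRobotsCache`, `robotsRules`, and `maxEntries` are hypothetical names, and the sketch assumes robotsMap entries are keyed by host.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// robotsRules is a hypothetical stand-in for the parsed robots.txt data
// that would be stored per host.
type robotsRules struct {
	body string
}

// entry is what each list element holds, so eviction can find the map key.
type entry struct {
	host  string
	rules *robotsRules
}

// boundedRobotsCache is a hypothetical LRU cache (not part of colly) that
// holds at most maxEntries hosts and evicts the least recently used one.
type boundedRobotsCache struct {
	mu         sync.Mutex
	maxEntries int
	order      *list.List               // front = most recently used
	items      map[string]*list.Element // host -> element in order
}

func newBoundedRobotsCache(maxEntries int) *boundedRobotsCache {
	return &boundedRobotsCache{
		maxEntries: maxEntries,
		order:      list.New(),
		items:      make(map[string]*list.Element),
	}
}

// Get returns the cached rules for host and marks them as recently used.
func (c *boundedRobotsCache) Get(host string) (*robotsRules, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[host]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).rules, true
}

// Put stores rules for host, evicting the oldest entry if the cache is full.
func (c *boundedRobotsCache) Put(host string, rules *robotsRules) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[host]; ok {
		el.Value.(*entry).rules = rules
		c.order.MoveToFront(el)
		return
	}
	c.items[host] = c.order.PushFront(&entry{host: host, rules: rules})
	if c.order.Len() > c.maxEntries {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).host)
	}
}

// Len reports how many hosts are currently cached.
func (c *boundedRobotsCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.order.Len()
}

func main() {
	cache := newBoundedRobotsCache(2)
	cache.Put("a.example", &robotsRules{body: "User-agent: *"})
	cache.Put("b.example", &robotsRules{body: "User-agent: *"})
	cache.Get("a.example") // touch a.example so b.example becomes the oldest
	cache.Put("c.example", &robotsRules{body: "User-agent: *"})

	_, hasB := cache.Get("b.example")
	fmt.Println("entries:", cache.Len(), "b.example cached:", hasB)
}
```

An evicted host simply has its robots.txt fetched again on the next visit, so the limit costs extra requests rather than correctness. A time-based expiry would be another option.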