Closed popthink closed 8 years ago
Either reduce the number of threads or increase the memory allocated to the Java virtual machine by adding the -Xmx
flag to the java command in the collector-http.sh (or .bat) file. Like this (put your own value):
java -Xmx2048m ...
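To confirm that the -Xmx value is actually picked up, a small check like the one below can be run with the same flag. This is only a sketch: the class name HeapCheck is made up for illustration, and the reported value is usually slightly below the configured one.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    // Runtime.maxMemory() reflects the -Xmx setting (approximately).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as java -Xmx2048m HeapCheck should print a value close to 2048.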
I do not know the code you wrote, but is the code that runs the crawlers periodically a long-running Java program? If so, it is recommended that you launch each collector instance as an external process. Running the collector in its own JVM instance each time ensures memory is released after every run. Using the OS-native scheduler (e.g., cron jobs or Windows Task Scheduler) is a good way to achieve this.
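If the scheduling has to stay inside a Java program, one way to still get a fresh JVM per run is to start the collector script as a child process. The sketch below assumes that approach. The class name ExternalCollectorLauncher and the method runOnce are hypothetical, and the command to run (script path and its arguments) is passed in by the caller rather than assumed here.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ExternalCollectorLauncher {

    // Starts the given command as a separate OS process and waits for it to exit.
    // All memory used by that process is returned to the OS when it ends.
    static int runOnce(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO(); // show the child's output in this process's console
        Process process = pb.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("Usage: ExternalCollectorLauncher <command> [args...]");
            System.exit(2);
        }
        // The command line (e.g. the path to collector-http.sh plus your own
        // arguments) is supplied by the caller; nothing is hard-coded here.
        int exitCode = runOnce(Arrays.asList(args));
        System.out.println("Collector process exited with code " + exitCode);
    }
}
```

With cron or Windows Task Scheduler this wrapper is not needed, since the scheduler starts the script directly.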
I tested it for 7 days.
And it seems that I found a solution. :)
--Env--
Crawler Thread Count : 25 Instance * 5 Thread
Delay : 100ms
OS : Debian 7
Java : java7, -Xmx10G
Cleanup code, run after each cycle finishes:
// idOfCrawlerInstance is the id of the crawler whose cycle just finished
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.unregisterMBean(new ObjectName("com.norconex.collector.http.crawler:type=" + idOfCrawlerInstance));
This unregisters the crawler descriptor from the MBeanServer so that it can be garbage-collected (maybe?).
With this in place, the OOM no longer occurs.
I'm not sure it is a proper solution, but it works in my case.
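For anyone applying the same workaround, here is a slightly more defensive sketch of that cleanup. It uses the same ObjectName pattern as the snippet above, but checks whether the MBean is registered first, so a repeated call does not throw. The class name CrawlerMBeanCleanup and the method names are made up for illustration and are not part of the collector API.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CrawlerMBeanCleanup {

    // Builds the same name as in the snippet above: domain + ":type=" + crawler id.
    static ObjectName crawlerName(String crawlerId) throws JMException {
        return new ObjectName("com.norconex.collector.http.crawler:type=" + crawlerId);
    }

    // Returns true if an MBean was unregistered, false if none was registered.
    static boolean unregisterCrawler(MBeanServer mbs, String crawlerId) throws JMException {
        ObjectName name = crawlerName(crawlerId);
        if (!mbs.isRegistered(name)) {
            return false;
        }
        try {
            mbs.unregisterMBean(name);
            return true;
        } catch (InstanceNotFoundException e) {
            // Removed by someone else between the check and the call.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        for (String crawlerId : args) {
            System.out.println(crawlerId + " unregistered: " + unregisterCrawler(mbs, crawlerId));
        }
    }
}
```

Call unregisterCrawler once per crawler id after each cycle finishes.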
Thank you :)
Thanks for providing this valuable feedback. Since JMX support does not benefit the vast majority of users, it is now disabled by default in the latest snapshot release. It can be enabled by adding the JVM argument -DenableJMX=true. Having it disabled may bring a slight performance improvement too.
I wrote code that runs crawlers periodically.
The code starts 50 new crawler threads (50 websites) every hour.
But this error occurs after it has been running for 3 to 4 days.
Thank you.