Norconex / jef

Job Execution Framework.
Apache License 2.0

feature request - add pid to collected information #5

Open danizen opened 7 years ago

danizen commented 7 years ago

I'm adding a collector life-cycle listener to my crawler (the skeleton of which still appears in https://github.com/danizen/trynorconex) that records the PID of the process in a file. The filesystem filled up, not through any fault in the crawler, but because of mismatched NetApp NFS snapshot policies and similar storage issues.
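A minimal sketch of the PID-writing half, assuming Java 9+ (for `ProcessHandle`). The class and method names (`PidFileWriter`, `writePid`, `deletePid`) are hypothetical, and the wiring into a Norconex collector life-cycle listener is left out because that interface isn't shown here. The idea is to call `writePid` when the collector starts and `deletePid` when it stops cleanly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of the Norconex or JEF API): records the
 * current JVM's process ID in a file so an external monitor can read it.
 * Requires Java 9+ for ProcessHandle.
 */
public final class PidFileWriter {

    private PidFileWriter() {}

    /** Writes the current process ID to pidFile (overwriting it) and returns the PID. */
    public static long writePid(Path pidFile) throws IOException {
        long pid = ProcessHandle.current().pid();
        Path target = pidFile.toAbsolutePath();
        // Make sure the directory that will hold the PID file exists.
        Files.createDirectories(target.getParent());
        // Store just the decimal PID, e.g. "12345", with no trailing newline.
        Files.write(target, Long.toString(pid).getBytes(StandardCharsets.US_ASCII));
        return pid;
    }

    /** Removes the PID file, e.g. when the collector shuts down cleanly. */
    public static void deletePid(Path pidFile) throws IOException {
        Files.deleteIfExists(pidFile);
    }
}
```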

I think this would be a great addition to jef and jef-monitor because, as far as I can see, there is no way for jef-monitor to positively determine whether the process is still running.

I like that JEF relies only on local system resources (just storage, at this point). Some corporate IT departments go nuts when you tell them you need ZooKeeper, Redis, an AMQP-compliant message queue, MongoDB, etc. So it is good to be able to say: start this process here, and it will write its progress to the filesystem. Often, sharing a filesystem is easier than opening ports.

Writing the PID to a file, and then having jef-monitor optionally check whether that PID is still alive (`kill -0`, or some more direct way to do this from Java), would be a great addition.
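On the monitor side, here is a rough sketch of a "more direct way from Java", assuming Java 9+: `ProcessHandle.of(pid)` looks up a native process by PID, which avoids shelling out to `kill -0`. `PidChecker` and its methods are hypothetical and not part of jef-monitor. The PID file is assumed to hold only the decimal PID.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/**
 * Hypothetical monitor-side check (not an existing jef-monitor API):
 * reads a PID file and asks the OS whether that process is still alive.
 * Uses Java 9+ ProcessHandle instead of running `kill -0`.
 */
public final class PidChecker {

    private PidChecker() {}

    /** Returns the PID stored in pidFile, or empty if the file is missing or not a number. */
    public static Optional<Long> readPid(Path pidFile) {
        try {
            String text = new String(Files.readAllBytes(pidFile), StandardCharsets.US_ASCII).trim();
            return Optional.of(Long.parseLong(text));
        } catch (IOException | NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** True if a process with this PID exists on this host and is alive. */
    public static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /** True only if the PID file exists, parses, and names a live process. */
    public static boolean isRunning(Path pidFile) {
        return readPid(pidFile).map(PidChecker::isAlive).orElse(false);
    }
}
```

One caveat with this design: a PID only identifies a process on the host that issued it, so the check is meaningful only when jef-monitor runs on the same machine as the crawler, even if the progress files live on a shared filesystem. PIDs can also be reused after a process exits, so a live PID is strong evidence, not proof, that the crawler is still running.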

NOTE: This is only an idea and a minor request. I'm now committed to custom dashboards for progress monitoring, because my post-processing is written in Python. If I end up writing a Python client for the JEF API, I'll let you know.