CASM-Consulting / springcrawler

Apache License 2.0

Decouple Scheduler and implement handling functions #12

Closed Punchwes closed 4 years ago

Punchwes commented 4 years ago
  1. Created 7 new files (Job, JQMJob, JobProvider, JQMJobProvider, JobRunner, JQMJobRunner, SchedulerRunner) that decouple the previous scheduler file and implement some handling functions:
     - JQMJob: the job object that is passed through the scheduling pipeline. It has three attributes (JobInstance, JobRequest and Source).
     - JQMJobRunner: 1) gets the registered jobInstance from the JQM client; 2) runs a new job.
     - JQMJobProvider: 1) obtains JQMJob objects from sources; 2) converts a source to a JQMJob object.
     - SchedulerRunner: manages the scheduling pipeline.

  2. Added new scheduler-related event classes (Event.java).

  3. In Util.java, added CRAWL_SCHEDULE to the source when creating fake-net.

  4. Updated pom.xml to include the spring-boot shell package.

  5. Some questions:
     1) Currently, to rerun an existing job, we take its corresponding source, create a job request and run it. Done this way, the new request might not carry the same parameters as the previous jobInstance had. I am not sure what the desired mechanism is; if we want exactly the same parameters, we could easily read them from the jobInstance and assign them to the newly created jobRequest as well.
     2) For all fake-net testing jobs, CRAWL_ACTIVE always returns false, so globalActiveSources is empty. During my testing I deleted this check and tested on all sources. For this pull request I have added it back, since that is how it is designed to work.
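
To make the split in item 1 and the rerun question in 5.1 easier to discuss, here is a minimal, self-contained Java sketch. It is not the code in this pull request and it does not use the JQM client API: `Source`, `Job`, `ActiveSourceJobProvider`, `RecordingJobRunner` and `schedule` are hypothetical stand-ins, and the `crawlActive` flag only imitates the CRAWL_ACTIVE check. It shows one possible shape for a provider/runner split, plus a `rerun` that copies the previous job's parameters instead of rebuilding the request from the source alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchedulerSketch {

    /** Hypothetical stand-in for a crawl source; not the project's Source class. */
    static final class Source {
        final String name;
        final boolean crawlActive; // imitates the CRAWL_ACTIVE flag

        Source(String name, boolean crawlActive) {
            this.name = name;
            this.crawlActive = crawlActive;
        }
    }

    /** Hypothetical job: a source plus the parameters submitted with it. */
    static final class Job {
        final Source source;
        final Map<String, String> parameters;

        Job(Source source, Map<String, String> parameters) {
            this.source = source;
            // Defensive copy so later changes to the caller's map do not leak in.
            this.parameters = new HashMap<>(parameters);
        }
    }

    /** Turns sources into jobs (the role described for JQMJobProvider). */
    interface JobProvider {
        List<Job> provide(List<Source> sources);
    }

    /** Submits jobs (the role described for JQMJobRunner). */
    interface JobRunner {
        Job run(Job job);
        Job rerun(Job previous);
    }

    /** Only active sources become jobs; inactive ones are skipped. */
    static final class ActiveSourceJobProvider implements JobProvider {
        @Override
        public List<Job> provide(List<Source> sources) {
            List<Job> jobs = new ArrayList<>();
            for (Source source : sources) {
                if (source.crawlActive) {
                    jobs.add(new Job(source, new HashMap<>()));
                }
            }
            return jobs;
        }
    }

    /** Records submissions in memory instead of calling a real job queue. */
    static final class RecordingJobRunner implements JobRunner {
        final List<Job> submitted = new ArrayList<>();

        @Override
        public Job run(Job job) {
            submitted.add(job);
            return job;
        }

        @Override
        public Job rerun(Job previous) {
            // Copy the previous parameters so the rerun matches the earlier submission.
            return run(new Job(previous.source, previous.parameters));
        }
    }

    /** Pipeline glue (the role described for SchedulerRunner). */
    static List<Job> schedule(JobProvider provider, JobRunner runner, List<Source> sources) {
        List<Job> ran = new ArrayList<>();
        for (Job job : provider.provide(sources)) {
            ran.add(runner.run(job));
        }
        return ran;
    }

    public static void main(String[] args) {
        List<Source> sources = Arrays.asList(new Source("a", true), new Source("b", false));
        RecordingJobRunner runner = new RecordingJobRunner();
        List<Job> ran = schedule(new ActiveSourceJobProvider(), runner, sources);
        System.out.println("jobs run: " + ran.size());
    }
}
```

With this shape, the behaviour asked about in 5.1 is a one-line decision inside `rerun`: either copy `previous.parameters` as above, or pass an empty map to rebuild the request from the source only.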