WebCuratorTool / webcurator

The root of the webcurator tool project, containing all modules needed to run a fully functional webcurator tool.
Apache License 2.0

Better handling of invalid H3 profiles needed #26

Open obrienben opened 3 years ago

obrienben commented 3 years ago

When an H3 profile is syntactically valid but contains invalid logic (e.g. a bad regex filter), a running Target Instance can show no harvest data and appear never to have started.
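One way to catch this earlier would be to compile every regex in the profile before the job is handed to Heritrix. The following is only a minimal sketch of that idea: the class `RegexListValidator` and the method `findInvalid` are hypothetical names, not existing WCT or Heritrix code, and it assumes the regex strings have already been extracted from the profile. It relies only on `java.util.regex.Pattern.compile`, which is what raises the `PatternSyntaxException` seen in the Heritrix log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper for illustration; not part of WCT or Heritrix. */
final class RegexListValidator {

    /**
     * Tries to compile each regex and returns one message per pattern
     * that java.util.regex rejects. An empty list means all compiled.
     */
    static List<String> findInvalid(List<String> regexes) {
        List<String> problems = new ArrayList<>();
        for (String regex : regexes) {
            try {
                Pattern.compile(regex);
            } catch (PatternSyntaxException e) {
                // getDescription() is the error text without the pattern and index.
                problems.add("Invalid regex '" + regex + "': " + e.getDescription()
                        + " (index " + e.getIndex() + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // The second entry is the filter from the log in this issue.
        List<String> fromProfile = Arrays.asList(".*\\.example\\.org/.*", "*/Katoa.se.*");
        for (String problem : findInvalid(fromProfile)) {
            System.out.println(problem);
        }
    }
}
```

If a check along these lines ran when the profile is saved or when the harvest is initiated, the Target Instance could be failed with a readable message instead of appearing not to have started.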


On the Harvest Agent, the following errors can be seen:

wct-agent-h3.log

2020-11-19 19:25:25.641 +1300 INFO  [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:161) - Getting Heritrix3Wrapper using hostname=localhost, port=8443, keyStoreFile=null, userName=admin
2020-11-19 19:25:25.647 +1300 DEBUG [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:126) - Created new harvester 128084481
2020-11-19 19:25:26.805 +1300 INFO  [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:595) - Launched harvester=128084481, jobStatus shortName=128084481, statusDescription=Unbuilt
2020-11-19 19:25:26.821 +1300 INFO  [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:608) - Building H3 job=128084481.....
2020-11-19 19:25:27.050 +1300 DEBUG [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:435) - Getting the harvest root directory for 128084481
2020-11-19 19:25:27.051 +1300 ERROR [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:637) - Failed to start harvester 128084481: Could not launch H3 job.
2020-11-19 19:25:27.103 +1300 ERROR [http-nio-8080-exec-9] o.w.c.h.a.HarvesterH3 (HarvesterH3.java:694) - Failed to start harvester 128084481: Failed to start harvester 128084481: Could not launch H3 job.
org.webcurator.core.harvester.agent.exception.HarvesterException: Failed to start harvester 128084481: Could not launch H3 job.
        at org.webcurator.core.harvester.agent.HarvesterH3.start(HarvesterH3.java:638)
        at org.webcurator.core.harvester.agent.HarvestAgentH3.initiateHarvest(HarvestAgentH3.java:147)
        at org.webcurator.core.harvester.agent.HarvestAgentH3Controller.initiateHarvest(HarvestAgentH3Controller.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797)
        at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1039)

heritrix_out.log

2020-11-19 06:25:26.421 INFO thread-38645 org.archive.crawler.framework.Engine.addJobDirectory() added crawl job: 128084481
2020-11-19 06:25:26.422 WARNING thread-38645 org.archive.crawler.framework.Engine.findJobConfigs() invalid job directory: /mnt/wct-harvester/kaiwae-z4/heritrix3/dia_test where job expected from: /mnt/wct-harvester/kaiwae-z4/heritrix3/dia_test
2020-11-19 06:25:27.020 SEVERE thread-38645 org.archive.crawler.framework.CrawlJob.beansException() Failed to convert property value of type 'java.util.ArrayList' to required type 'java.util.List' for property 'regexList'; nested exception is java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0 */Katoa.se.* ^; Can't create bean 'org.archive.modules.deciderules.MatchesListRegexDecideRule#5b09067d'; Can't create bean 'scope'; ; Can't create bean 'frontier'; ; Can't create bean 'seeds'
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'seeds': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire method: public void org.archive.modules.seeds.SeedModule.setSeedListeners(java.util.Set); nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'frontier': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire method: public void org.archive.crawler.frontier.AbstractFrontier.setScope(org.archive.modules.deciderules.DecideRule); nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'scope' defined in URL [file:/mnt/wct-harvester/kaiwae-z4/heritrix3/128084481/crawler-beans.cxml]: Cannot create inner bean 'org.archive.modules.deciderules.MatchesListRegexDecideRule#5b09067d' of type [org.archive.modules.deciderules.MatchesListRegexDecideRule] while setting bean property 'rules' with key [5]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.archive.modules.deciderules.MatchesListRegexDecideRule#5b09067d' defined in URL [file:/mnt/wct-harvester/kaiwae-z4/heritrix3/128084481/crawler-beans.cxml]: Initialization of bean failed; nested exception is org.springframework.beans.TypeMismatchException: Failed to convert property value of type 'java.util.ArrayList' to required type 'java.util.List' for property 'regexList'; nested exception is java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/Katoa.se.*
^
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:285)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1074)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:517)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:291)