Closed vitezg closed 2 years ago
Hi @vitezg, you don't need to explicitly define the OPERATOR_CONTACT_URL placeholder in the User Agent Prefix field. It is automatically appended to the user agent string.
WCT seems to set up crawler-beans.cxml properly (first screenshot); however, Heritrix seems to get a different configuration (second screenshot). I could set the variables as you described, thanks a lot for that, but the issue seems to lie somewhere else.
Hi @vitezg, which version are you running where you see this issue?
It's 3.0.3, wct-binary-3.0.3.tar.gz as downloaded from https://github.com/WebCuratorTool/webcurator/releases. Our Heritrix is heritrix-3.4.0-20200518; I will try to update it and get back to you.
It's the same with Heritrix 3.4.0-20210621
Hi @vitezg, my guess is that the profile as stored by WCT is not being sent to Heritrix at all at job creation time. Can you check the logging from webcurator-webapp and webcurator-harvest-agent-h3 right after a WCT target instance is run? Maybe that will give us a clue as to what's going on.
Hello @hannakoppelaar, thanks a lot for asking for the logs, you have guided me to the solution :) The problem was that WCT was running under a different user ID than Heritrix, so it could not write to the Heritrix job directory. Running both programs under Heritrix's account solved all the issues, and the first harvest job just finished!
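For anyone hitting the same symptom, here is a minimal sketch of how one might check for this kind of ownership mismatch before digging through the logs. It is not part of WCT or Heritrix; the function name `check_job_dir_access` is hypothetical, and the script assumes a Unix-like system and that you run it as the same user that runs the WCT harvest agent.

```python
import os


def check_job_dir_access(job_dir: str) -> dict:
    """Report whether the current user can write to job_dir.

    Hypothetical helper for illustration only: it compares the directory
    owner with the user running this script and checks write access.
    """
    st = os.stat(job_dir)
    current_uid = os.getuid()  # real UID, which is what os.access() checks
    return {
        "path": job_dir,
        "owner_uid": st.st_uid,
        "current_uid": current_uid,
        "same_user": st.st_uid == current_uid,
        # Creating files in a directory needs both write and search (execute) permission.
        "writable": os.access(job_dir, os.W_OK | os.X_OK),
    }
```

Call it with the path to your Heritrix jobs directory (the location depends on your installation). If `writable` is false, the harvest agent will most likely be unable to place the job configuration there, which would match the behaviour described above.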
That's great news @vitezg! :)
We've set up WCT and set the operator contact URL in the profile; however, this data does not seem to propagate to the Heritrix job configuration. I've attached four screenshots. Any idea what the problem could be?