Esri / geoportal-server-harvester

Metadata Harvester for Esri Geoportal Server
http://esri.github.io/geoportal-server/
Apache License 2.0

Harvester not removing content from geoportal that has been removed from source WAF #188

Open: MikeRoyer-NOAA opened this issue 1 year ago

MikeRoyer-NOAA commented 1 year ago

Harvester 2.6.4: A harvester task is set up to pull from a WAF, and some XML files that have been removed from the source WAF are not being removed from the geoportal. The harvester history for the task reports that it acted upon 14537 XML files, while the geoportal reports 14852 items for that source of origin (i.e. the harvester task), a difference of 315 items. The task is not run incrementally. Isn't the harvester supposed to remove from the geoportal anything that is no longer in the source WAF when the task runs?
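One way to find out which of the extra items are stale is to compare the XML files the WAF currently serves against what the geoportal holds. Below is a minimal sketch, assuming the WAF exposes a plain HTML directory index (Apache- or nginx-style) with one `<a href>` per file. `list_waf_xml` is a hypothetical helper, not part of the Harvester, and it reads only a single directory level.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _XmlLinkCollector(HTMLParser):
    """Collects href targets ending in .xml from an HTML directory index."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the file links in a directory listing.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".xml"):
                self.hrefs.add(value)


def list_waf_xml(index_html: str, base_url: str) -> set[str]:
    """Return absolute URLs of the .xml files linked from one WAF index page.

    Hypothetical helper: assumes a simple HTML directory listing and does not
    descend into subdirectories.
    """
    collector = _XmlLinkCollector()
    collector.feed(index_html)
    collector.close()
    # Resolve relative links against the directory URL.
    return {urljoin(base_url, href) for href in collector.hrefs}
```

In practice the page text could be fetched with `urllib.request.urlopen(base_url).read().decode("utf-8")` and passed in; a WAF with nested folders would need the subdirectory links followed recursively.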

mhogeweg commented 1 year ago

Have you set the Geoportal output broker to 'perform cleanup'? That setting determines whether the harvester will attempt to remove existing items from the Geoportal.

MikeRoyer-NOAA commented 1 year ago

Yes, the Harvester output broker has the "Perform cleanup" option checked. Does it perform cleanup every time a Harvester task runs, or on some other schedule?

mhogeweg commented 1 year ago

It should do so every time it runs a task. Is your WAF public? If so, I can do some testing on my end.
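The cleanup behaviour described here can be modelled as a set difference: items the task harvested before, minus items it saw in the current run. This is a hypothetical sketch for reasoning about the counts in this issue, not the Harvester's actual implementation; `plan_cleanup` and its parameters are illustrative names.

```python
def plan_cleanup(
    previously_harvested: set[str],
    seen_this_run: set[str],
    perform_cleanup: bool,
) -> set[str]:
    """Return the IDs a cleanup pass would delete (hypothetical model).

    An item is treated as stale when the task harvested it in an earlier run
    but did not encounter it in the current run.
    """
    if not perform_cleanup:
        # With the "Perform cleanup" option off, nothing is removed.
        return set()
    return previously_harvested - seen_this_run


if __name__ == "__main__":
    # Synthetic IDs sized to match the counts reported in this issue.
    stored = {f"rec-{i}" for i in range(14852)}
    seen = {f"rec-{i}" for i in range(14537)}
    print(len(plan_cleanup(stored, seen, perform_cleanup=True)))  # prints 315
```

Under this model, a full (non-incremental) run with cleanup enabled should leave the item count equal to the number of files seen; if the geoportal still reports 14852 items afterwards, the real behaviour is diverging from the expectation raised in this issue.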

MikeRoyer-NOAA commented 1 year ago

I'm checking with my user base on whether the WAF is public.

In the meantime, can you tell me what the Failed (in/out) column on the history page means? Does "Failed in" mean that the XML is not well-formed, and "Failed out" mean that there is an issue with the content within the XML, with neither being loaded into the output broker?

mhogeweg commented 1 year ago

I added some info on the history page in the wiki: https://github.com/Esri/geoportal-server-harvester/wiki/Tasks#history