ckan / ckanext-harvest

Remote harvesting extension for CKAN

Huge harvester hangs #549

Open ziorick opened 2 months ago

ziorick commented 2 months ago

Hi all! I have installed CKAN 2.10.3 and am trying to harvest a very large remote CKAN portal (data.gov, about 296k datasets) using the ckan-harvester plugin. I don't need to import "remote_orgs", so my configuration only sets "clear_tags" to true.

The gather process starts successfully and requests the correct API path and row number from the remote portal, so at first everything works well. After the first read stage, the gather process starts logging "Creating HarvestObject for ..." for each dataset, but it never writes the line "xxxxxx datasets sent to fetch queue" (or similar) that I see in other harvest processes.

This instance runs on 32 GB of DDR4 RAM and a 40-core/40-thread Xeon CPU. The result is a CKAN process that uses about 10% CPU and 35% RAM, and the harvest_object table in Postgres keeps growing, but no fetch is ever started. Can you help me?
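
To give a sense of the scale of the gather stage, here is a minimal sketch of the paged requests needed to list about 296k datasets through CKAN's package_search API action. The host name, the page size of 1000, and the helper name search_page_urls are assumptions made for illustration. They are not taken from my setup or from the harvester's source, which may page differently.

```python
from urllib.parse import urlencode


def search_page_urls(base_url, total, rows=1000):
    """Yield one package_search URL per page of `rows` datasets.

    `rows` and `start` are standard package_search parameters;
    the page size used here is an assumed value.
    """
    for start in range(0, total, rows):
        query = urlencode({"rows": rows, "start": start})
        yield f"{base_url.rstrip('/')}/api/action/package_search?{query}"


# Hypothetical remote portal and the dataset count mentioned in the post.
urls = list(search_page_urls("https://remote-ckan.example", 296_000))

print(len(urls))   # number of paged requests needed
print(urls[0])     # first page
print(urls[-1])    # last page
```

Whatever the real page size is, each listed dataset then becomes one row in harvest_object, which matches the table growth I see during the gather stage.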