GSA / data.gov

Main repository for the data.gov service
https://data.gov

Dataset deleted from harvest source does not get deleted on catalog #3239

Open FuhuXia opened 3 years ago

FuhuXia commented 3 years ago

A user reports that a dataset stays on the catalog even though it is absent from the harvest source.

How to reproduce

https://admin-catalog-next.data.gov/dataset/south-carolina-marine-bird-density

Expected behavior

404

Actual behavior

The admin site does not return 404.

The dataset was manually deleted at the user's request, so the public URL, https://catalog.data.gov/dataset/south-carolina-marine-bird-density, now returns 404.

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

FuhuXia commented 3 years ago

Issue #2981, which so far appears specific to data.json harvest sources, can create the same kind of sticky dataset, but on the backend this is a different issue. This dataset has a harvest object (HO) link in the DB, even though it does not show in the UI. The harvest_object table shows that the dataset was harvested from a harvest source with a weekly schedule. Three months later it got an error during another harvest job, and it stayed that way for a few months until it was manually deleted today.

-[ RECORD 1 ]-----+-------------------------------------
id                | cb2166d8-1a45-418f-b924-66f64fb78a57
guid              | gov.noaa.nmfs.inport:56825
gathered          | 2021-02-02 17:23:36.074109
fetch_finished    |
state             | ERROR
harvest_job_id    | 6687b6a6-7d5b-46f8-8029-380a4c170652
harvest_source_id | c0121fd9-df15-4168-ac04-42f6e36a794d
-[ RECORD 2 ]-----+-------------------------------------
id                | f34cdba2-3fb6-4a04-8270-4f908bf1c162
guid              | gov.noaa.nmfs.inport:56825
gathered          | 2020-11-11 17:17:06.989651
fetch_finished    | 2020-11-12 10:25:38.551609
state             | COMPLETE
harvest_job_id    | f1ab16e6-6a18-4159-b38c-0442ef8793e3
harvest_source_id | c0121fd9-df15-4168-ac04-42f6e36a794d
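
One way to look for other datasets in the same state might be to query for guids whose most recent harvest object is in ERROR. The following is a minimal sketch, not the catalog's actual tooling: it assumes only the columns shown in the records above, uses an in-memory SQLite table as a stand-in for the real database, and the function name `guids_with_latest_error` is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the catalog DB (assumption: the real
# harvest_object table lives elsewhere and has more columns than these).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE harvest_object (
        id TEXT PRIMARY KEY,
        guid TEXT,
        gathered TEXT,
        fetch_finished TEXT,
        state TEXT,
        harvest_job_id TEXT,
        harvest_source_id TEXT
    )
    """
)

# The two records quoted in this comment.
rows = [
    ("cb2166d8-1a45-418f-b924-66f64fb78a57", "gov.noaa.nmfs.inport:56825",
     "2021-02-02 17:23:36.074109", None, "ERROR",
     "6687b6a6-7d5b-46f8-8029-380a4c170652",
     "c0121fd9-df15-4168-ac04-42f6e36a794d"),
    ("f34cdba2-3fb6-4a04-8270-4f908bf1c162", "gov.noaa.nmfs.inport:56825",
     "2020-11-11 17:17:06.989651", "2020-11-12 10:25:38.551609", "COMPLETE",
     "f1ab16e6-6a18-4159-b38c-0442ef8793e3",
     "c0121fd9-df15-4168-ac04-42f6e36a794d"),
]
conn.executemany("INSERT INTO harvest_object VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def guids_with_latest_error(conn, harvest_source_id):
    """Return (guid, gathered) pairs whose newest harvest object is in ERROR."""
    query = """
        SELECT ho.guid, ho.gathered
        FROM harvest_object AS ho
        WHERE ho.harvest_source_id = ?
          AND ho.state = 'ERROR'
          AND ho.gathered = (
              -- newest harvest object for the same guid and source
              SELECT MAX(h2.gathered)
              FROM harvest_object AS h2
              WHERE h2.guid = ho.guid
                AND h2.harvest_source_id = ho.harvest_source_id
          )
        ORDER BY ho.guid
    """
    return conn.execute(query, (harvest_source_id,)).fetchall()


SOURCE_ID = "c0121fd9-df15-4168-ac04-42f6e36a794d"
stuck = guids_with_latest_error(conn, SOURCE_ID)
print(stuck)
```

The timestamps are compared as strings here, which works only because they share one fixed-width format; against the real database the same subquery would compare timestamp values directly.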

FuhuXia commented 3 years ago

The dataset south-carolina-marine-bird-density is no longer a good demo since it was manually deleted, but I think we can find another dataset in the same harvest source that can be used to demo the issue.