GSA / data.gov

Main repository for the data.gov service
https://data.gov

Metadata for specific dataset not always updated #1542

Open hkdctol opened 4 years ago

hkdctol commented 4 years ago

A dataset in the catalog does not show its updated metadata: the harvest objects are still linked to the old versions rather than the updated ones, and a Solr reindex was required to fix it.

How to reproduce

  1. Came up most recently with these two datasets:

https://catalog.data.gov/dataset/real-estate-across-the-united-states-rexus-inventory-building
https://catalog.data.gov/dataset/real-estate-across-the-united-states-rexus-lease

Both datasets had received an April update.

Expected behavior

When you click "download metadata" on the catalog page of either dataset, the downloaded metadata should be the April version.

Actual behavior

The "download metadata" link returned the March version of the metadata, which prevented some users' scripts from automatically accessing the correct version. (See screenshot.)

This particular instance was resolved by @FuhuXia with a Solr reindex, but I'm creating this issue to track the known bug.

REXUS screenshot

adborden commented 4 years ago

@FuhuXia @jbrown-xentity could this be related? https://github.com/GSA/datagov-deploy/issues/526

FuhuXia commented 4 years ago

@adborden They show similar behavior in the UI, but the fact that the fixes are different makes me believe the root causes are different. The datasets in this issue can be fixed by a Solr re-index alone. In the other issues, where the link to the metadata returns 404, the fix has to be done on the DB side first, followed by a re-index.

adborden commented 4 years ago

Ah, good point. Do you have a sense of why Solr is out of sync? At what point in the harvest process does Solr get updated?

FuhuXia commented 4 years ago

I think the Solr index is handled by CKAN core when a package is updated; individual CKAN extensions do not handle it. I'm not exactly sure why Solr is out of sync in this issue. We can spend some time to see whether it is reproducible in a local environment; if so, then it is code related. But I would rather believe it is due to some Solr glitch. Solr logs from the same timestamp might be able to tell us something.
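
A rough way to check for this kind of drift, as a sketch rather than anything we run today: compare what the CKAN action API returns from `package_show` with what `package_search` returns for the same dataset. This assumes `package_show` reflects the database and `package_search` reflects the Solr index, that the catalog exposes the standard `/api/3/action/` endpoints, and that the harvest object is recorded in an extra named `harvest_object_id`. The helper names (`ckan_action`, `stale_fields`, `check_dataset`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the catalog exposes the standard CKAN action API at this base URL.
CATALOG = "https://catalog.data.gov"


def ckan_action(action, params, base_url=CATALOG):
    """Call a CKAN action API endpoint and return its 'result' payload."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}/api/3/action/{action}?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def extras_as_dict(record):
    """Flatten CKAN's list of {'key': ..., 'value': ...} extras into a dict."""
    return {e["key"]: e["value"] for e in record.get("extras", [])}


def stale_fields(db_record, index_record, extra_keys=("harvest_object_id",)):
    """Return {field: (db_value, index_value)} for each field that differs.

    Assumption: 'harvest_object_id' is the extra that ties a dataset to its
    harvest object; pass other keys via extra_keys if that is not the case.
    """
    diffs = {}
    db_mod = db_record.get("metadata_modified")
    idx_mod = index_record.get("metadata_modified")
    if db_mod != idx_mod:
        diffs["metadata_modified"] = (db_mod, idx_mod)
    db_extras = extras_as_dict(db_record)
    idx_extras = extras_as_dict(index_record)
    for key in extra_keys:
        if db_extras.get(key) != idx_extras.get(key):
            diffs[key] = (db_extras.get(key), idx_extras.get(key))
    return diffs


def check_dataset(name, base_url=CATALOG):
    """Fetch both views of one dataset and report any differences."""
    db_record = ckan_action("package_show", {"id": name}, base_url)
    found = ckan_action(
        "package_search", {"q": f'name:"{name}"', "rows": 1}, base_url
    )
    if not found["results"]:
        return {"missing_from_index": (name, None)}
    return stale_fields(db_record, found["results"][0])
```

Calling `check_dataset("real-estate-across-the-united-states-rexus-lease")` would return an empty dict when both views agree; a non-empty result would point at datasets that likely need a re-index.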

adborden commented 4 years ago

Do you know if there have been improvements upstream, for example in CKAN 2.8, that might address some of the Solr issues?

FuhuXia commented 4 years ago

I am not aware of any similar issue being raised elsewhere, or any improvement done on this kind of issue in 2.8.