bcgov / ckanext-bcgov

BC Data Catalogue source code, main ckan extension
http://catalogue.data.gov.bc.ca
GNU Affero General Public License v3.0

resource_update for larger files results in duplicate rows to datastore resources #102

Closed gjlawran closed 8 years ago

gjlawran commented 8 years ago

Using resource_update of the FileStore API to replace larger files results in duplicate rows in DataStore resources after DataPusher completes the update.
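One way to confirm the symptom described above is to look for repeated rows among the records returned from the DataStore. The following is a minimal sketch, not the method used by the project: the function name and the sample records are hypothetical, and it assumes that rows are compared on every column except the DataStore's auto-generated `_id`.

```python
from collections import Counter


def find_duplicate_rows(records, ignore=("_id",)):
    """Return {row: count} for every row that appears more than once.

    `records` is a list of dicts, one per DataStore row. Columns named in
    `ignore` are skipped, because `_id` differs even between identical rows.
    """
    def key(record):
        # Sort by column name so that dict ordering does not affect the comparison.
        return tuple(sorted((k, v) for k, v in record.items() if k not in ignore))

    counts = Counter(key(record) for record in records)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical sample data: rows 1 and 3 are the same apart from `_id`.
records = [
    {"_id": 1, "name": "a", "value": "1"},
    {"_id": 2, "name": "b", "value": "2"},
    {"_id": 3, "name": "a", "value": "1"},
]
duplicates = find_duplicate_rows(records)
print(len(duplicates), "distinct row(s) appear more than once")
```

If every row of the source file shows up with a count of 2, the whole file was most likely loaded twice rather than the file itself containing repeats.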

See CITZEDC-819

jrods commented 8 years ago

@gjlawran @ll911 @Mbrownshoes I need more information on what the statstrack job and the exporter script are doing. What is the process? Which exporter script?

gjlawran commented 8 years ago

This ticket should not be initiated until 1.4.0 is delivered to production.

kfishwick commented 8 years ago

Can we please get an answer now to @jrods's question above?

gjlawran commented 8 years ago

The DataStore version of this CSV resource - https://catalogue.data.gov.bc.ca/dataset/bc-data-catalogue-content/resource/4b721abc-46e0-4010-b366-5830c000eb56 - showed double the number of records that the CSV file held. A manual update of the DataStore was initiated on 2016/08/15, and the resource now has 1468 entries, matching the number of records in the FileStore. @Mbrownshoes will confirm whether this is also resolved for other resources.
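The comparison described above (records in the CSV file versus entries in the DataStore) can be sketched as follows. This is an illustrative check under stated assumptions, not the procedure that was actually run: the helper names are hypothetical, the CSV is assumed to have a single header row, and the DataStore count is assumed to come from the `total` field of CKAN's `datastore_search` action.

```python
import csv
import io
import json
import urllib.parse
import urllib.request


def count_csv_records(csv_text):
    """Count data rows in CSV text, assuming the first non-blank row is a header."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - 1, 0)


def datastore_total(base_url, resource_id):
    """Ask datastore_search for the row count only (limit=0 returns no records)."""
    query = urllib.parse.urlencode({"resource_id": resource_id, "limit": 0})
    url = f"{base_url}/api/3/action/datastore_search?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["result"]["total"]


def counts_match(csv_count, datastore_count):
    """True when the DataStore holds exactly as many rows as the file."""
    return csv_count == datastore_count


# Hypothetical sample: a two-record file against a DataStore reporting four rows.
sample_csv = "name,value\na,1\nb,2\n"
csv_count = count_csv_records(sample_csv)
print(csv_count, counts_match(csv_count, 4))
```

In practice the second argument to `counts_match` would be the value returned by `datastore_total` for the resource in question. A DataStore count of exactly twice the file count matches the doubling reported here.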

Mbrownshoes commented 8 years ago

@jrods the exporter script updates the content of the https://catalogue.data.gov.bc.ca/dataset/bc-data-catalogue-content resources. The statstrack job uses Google Analytics data to update the site-usage pages (https://catalogue.data.gov.bc.ca/data/site-usage/dataset). It also runs the index rebuild command. Here are the three paster commands it uses:

paster --plugin=ckan tracking update
paster --plugin=ckanext-ga-report loadanalytics latest
paster --plugin=ckan search-index rebuild_fast
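For readers who want to reproduce the statstrack sequence, the three commands quoted in this comment could be run in order with a small wrapper like the one below. This is only a sketch: the wrapper function and its `runner` parameter are hypothetical, the command strings are copied from the comment as written, and any extra options a given installation needs (such as a config file path) are not shown in the thread and are left out.

```python
import shlex
import subprocess

# Commands exactly as quoted in the thread.
PASTER_COMMANDS = [
    "paster --plugin=ckan tracking update",
    "paster --plugin=ckanext-ga-report loadanalytics latest",
    "paster --plugin=ckan search-index rebuild_fast",
]


def run_in_order(commands, runner=subprocess.check_call):
    """Run each command in turn and return the ones that completed.

    With the default runner, a non-zero exit raises CalledProcessError,
    so later commands do not run after a failure.
    """
    completed = []
    for command in commands:
        runner(shlex.split(command))  # split into an argument list, no shell involved
        completed.append(command)
    return completed


if __name__ == "__main__":
    # Dry run: print the argument lists instead of executing paster.
    run_in_order(PASTER_COMMANDS, runner=print)
```

Passing a different `runner` keeps the sketch testable without a CKAN installation. Dropping the argument runs the real commands.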

Mbrownshoes commented 8 years ago

This doesn't seem to be an issue any more. Closing.