artsy / fusion-deprecated

Experimental orchestration layer for Artsy web clients (deprecated)
MIT License

Scraper is not revisiting old artworks? #10

Closed dblock closed 7 years ago

dblock commented 7 years ago

I made a change in artwork image URLs (long explanation in https://github.com/artsy/barium-ion), then eventually ran heroku run npm run scrape --app=fusion-production. It quit silently at the 237-day marker, without any errors. I manually ran heroku run npm run scrape --app=fusion-production 237 to continue, but I suspect that we never automatically revisit very old artworks, which would explain the unpublished artworks I have been noticing in sitemaps.

Most updates are reflected in updated_at. I would start by changing the scraper to key off that field.
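
A rough sketch of what keying off updated_at could look like, in case it helps the discussion. This is not the scraper's actual code: the Artwork shape, the published flag, and the planSync helper are all hypothetical names made up for illustration. The idea is to keep a high-water mark from the previous run and revisit every artwork whose updated_at is newer than it, regardless of how old the artwork is.

```typescript
// Hypothetical record shape; only updated_at is taken from the discussion above.
interface Artwork {
  id: string;
  published: boolean;
  updated_at: string; // ISO 8601 timestamp
}

interface SyncPlan {
  upsert: string[]; // ids to re-scrape and keep in the sitemap
  remove: string[]; // ids to drop because they are no longer published
  cursor: string;   // newest updated_at seen, to persist for the next run
}

// Select everything changed since the last run, no matter when it was created.
function planSync(artworks: Artwork[], since: string): SyncPlan {
  const sinceMs = Date.parse(since);
  let cursorMs = sinceMs;
  const upsert: string[] = [];
  const remove: string[] = [];

  for (const artwork of artworks) {
    const updatedMs = Date.parse(artwork.updated_at);
    if (updatedMs <= sinceMs) {
      continue; // unchanged since the last run, nothing to do
    }
    if (artwork.published) {
      upsert.push(artwork.id);
    } else {
      remove.push(artwork.id);
    }
    if (updatedMs > cursorMs) {
      cursorMs = updatedMs;
    }
  }

  return { upsert, remove, cursor: new Date(cursorMs).toISOString() };
}
```

With this approach an old artwork that gets unpublished shows up in remove on the next run, because unpublishing bumps its updated_at (assuming that is how the API behaves), while untouched artworks are skipped entirely.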

dblock commented 7 years ago

This is fixed in cinder sitemaps.