brawer / wikidata-qrank

Ranking signals for Wikidata
https://qrank.wmcloud.org
MIT License

page_entities should be cached across pipeline runs #33

Closed brawer closed 6 months ago

brawer commented 6 months ago

The page_entities files should be cached across pipeline runs. Currently, they seem to be rebuilt from scratch on every run, even when they are already present in S3 storage. Uploading to storage seems to work fine. What seems to be missing is a correct check for whether a cached version already exists before we rebuild.
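
To illustrate the kind of check described above, here is a minimal sketch in Python. It is not the project's code: the `Storage` interface, the `InMemoryStorage` stand-in, the `ensure_page_entities` helper, and the object key used in the example are all hypothetical names, chosen only to show the "look in storage first, build only on a miss" pattern.

```python
from typing import Callable, Protocol


class Storage(Protocol):
    """Hypothetical minimal view of an S3-like object store."""

    def exists(self, key: str) -> bool: ...

    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStorage:
    """Stand-in for S3 so the sketch runs without network access."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self.objects

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def ensure_page_entities(
    storage: Storage, key: str, build: Callable[[], bytes]
) -> bool:
    """Build and upload the file only if it is not already stored.

    Returns True if a build happened, False if the cached copy was reused.
    """
    if storage.exists(key):
        # Cache hit: skip the expensive rebuild entirely.
        return False
    storage.put(key, build())
    return True


if __name__ == "__main__":
    store = InMemoryStorage()
    key = "page_entities/example-page_entities.zst"  # hypothetical key

    def build() -> bytes:
        print("building", key)
        return b"placeholder contents"

    ensure_page_entities(store, key, build)  # first run: builds and uploads
    ensure_page_entities(store, key, build)  # second run: reuses the cache
```

The important part is that the existence check uses exactly the same key that the upload step writes. If the two differ, for example in prefix or date format, the check never matches and every run rebuilds.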