frictionlessdata / data-quality-dashboard

Data Quality Dashboards display statistics on a collection of published data.

Using github+rawgit for data repo is not optimal for deployment #54

Closed trickvi closed 8 years ago

trickvi commented 8 years ago

The spend publishing dashboard is configured to read its data files from a git repo via rawgit. To be fast, it uses rawgit's CDN. However, files fetched through rawgit's CDN are cached permanently, so we have to pin each URL to an exact commit:

> It's best to use a specific tag or commit hash in the URL (not a branch). Files are cached permanently after the first request.

This obviously causes problems when scraping. You can't just scrape, push to the repo, and expect everything to work: the site will still run off the old data. After each scrape and commit to the data repo, the commit hash must be extracted, the config file in the code updated, and the site redeployed. It's either that or use the non-CDN version in the hope that our site never becomes popular:

> Excessive traffic may lead to throttling and blacklisting.
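
To make the per-scrape step concrete (extract the commit, then update the URLs in the config), here is a minimal sketch. It assumes rawgit's CDN URL layout of host, user, repo, commit, and file path. The function names, the shape of the config dict, and the example org and repo names are all hypothetical and are not taken from the dashboard's code.

```python
import subprocess

# rawgit's CDN host; files served from here are cached permanently.
CDN_HOST = "https://cdn.rawgit.com"


def head_commit(repo_dir):
    """Return the hash of the commit currently checked out in repo_dir."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return out.decode("ascii").strip()


def pinned_url(user, repo, commit, path):
    """Build a CDN URL that points at one exact commit of a file."""
    return "/".join([CDN_HOST, user, repo, commit, path.lstrip("/")])


def repin_config(config, user, repo, commit):
    """Return a copy of config with every data path turned into a pinned URL.

    config is a hypothetical mapping of setting name to a file path
    inside the data repo.
    """
    return {
        key: pinned_url(user, repo, commit, path)
        for key, path in config.items()
    }


if __name__ == "__main__":
    # Example with a made-up commit hash; after a real scrape you would
    # use head_commit("path/to/data-repo") instead.
    paths = {"results": "data/results.csv"}
    print(repin_config(paths, "example-org", "example-data", "abc123"))
```

Even with this automated, the site still has to be redeployed after every scrape so that it picks up the new URLs.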

pwalsh commented 8 years ago

We just need to tag our data releases if we stick with rawgit as a backend. More likely, though, we'll stay on their non-CDN version and cache the data on the server, as a better (faster) alternative.
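
As a rough illustration of the server-side caching idea, here is a minimal time-based cache sketch. The `TTLCache` class, the five-minute default, and the injected `fetch` callable are all assumptions made for the example; this is not how the dashboard actually implements it.

```python
import time


class TTLCache:
    """Cache fetched bodies per URL and refetch once an entry expires."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch        # callable: url -> body
        self._ttl = ttl_seconds    # how long an entry stays fresh
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # url -> (expiry time, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            # Still fresh: serve from memory, no upstream request.
            return entry[1]
        # Missing or expired: fetch upstream and store with a new expiry.
        body = self._fetch(url)
        self._entries[url] = (now + self._ttl, body)
        return body
```

With something like this in front of the non-CDN URLs, upstream traffic is bounded by roughly one request per file per TTL window regardless of how many visitors the site gets, and newly pushed data shows up once the entry expires, without a redeploy.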

pwalsh commented 8 years ago

Flow has changed. WONTFIX.