climate-mirror / datasets

For tracking data mirroring progress

Billion-Dollar Weather and Climate Disasters #14

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

Name: Billion-Dollar Weather and Climate Disasters
Organization: NOAA
Description:
URL: https://www.ncdc.noaa.gov/billions/
Download URL:
File Types:
Size:
Status:

detrout commented 7 years ago

I tried to archive this site with brozzler, which didn't work (the JavaScript widgets didn't render), and then with wget --mirror, followed by testing in a VM; that copy did render with the default filter settings.
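For reference, a wget mirror pass of this kind could look something like the sketch below. The exact flags used are not recorded in this thread, so these are assumptions: `--convert-links` rewrites URLs for offline browsing and `--page-requisites` pulls in CSS/JS/images. The command is echoed rather than executed, to keep the sketch offline.

```shell
#!/bin/sh
# Sketch of a wget mirror pass (assumed flags, not the exact invocation
# used in this thread). Printed as a dry run instead of executed.
CMD='wget --mirror --convert-links --page-requisites --adjust-extension --no-parent https://www.ncdc.noaa.gov/billions/'
echo "$CMD"
```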

However, I discovered at least three URLs that are backed by server-side filtering code:

https://www.ncdc.noaa.gov/billions/annual-summary/1985-2000?disasters[]=drought&disasters[]=flooding&disasters[]=freeze&disasters[]=severe-storm&disasters[]=tropical-cyclone&disasters[]=wildfire&disasters[]=winter-storm&begYear=1985&endYear=2000&cpi=false

https://www.ncdc.noaa.gov/billions/state-freq-geochart-data-1980-1990.json?disasters[]=drought&disasters[]=flooding&disasters[]=freeze&disasters[]=severe-storm&disasters[]=tropical-cyclone&disasters[]=wildfire&disasters[]=winter-storm&begYear=1980&endYear=1990&cpi=true

https://www.ncdc.noaa.gov/billions/disaster-mugl.xml?disasters[]=drought&disasters[]=flooding&disasters[]=freeze&disasters[]=severe-storm&disasters[]=tropical-cyclone&disasters[]=wildfire&disasters[]=winter-storm&cpi=true&cost=true&state=US&running-mean=true
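Since these endpoints are parameterized by year range, one way to capture them in a static mirror would be to enumerate the query-string combinations and fetch each response. The sketch below builds the geochart JSON URLs per decade using the disaster list from the URLs above; the loop bounds and the assumption that only decade-sized ranges matter are mine, and the fetch itself is left as an echoed command so the sketch stays offline.

```shell
#!/bin/sh
# Sketch: enumerate the server-side filter endpoints per decade so each
# JSON response can be captured into a static mirror. The filter list is
# copied from the URLs above; the decade loop is an assumption.
BASE='https://www.ncdc.noaa.gov/billions'
FILTERS='disasters[]=drought&disasters[]=flooding&disasters[]=freeze&disasters[]=severe-storm&disasters[]=tropical-cyclone&disasters[]=wildfire&disasters[]=winter-storm'
for beg in 1980 1990 2000; do
    end=$((beg + 10))
    url="${BASE}/state-freq-geochart-data-${beg}-${end}.json?${FILTERS}&begYear=${beg}&endYear=${end}&cpi=true"
    echo "$url"   # to actually download: wget -x "$url"
done
```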

I could probably implement a simple django site that fairly faithfully reproduces the site, but would it be better to go look for something else to archive?

bkirkbri commented 7 years ago

> I could probably implement a simple django site that fairly faithfully reproduces the site, but would it be better to go look for something else to archive?

Is there a bulk/raw data download available anywhere on the www.ncdc.noaa.gov/billions site? If so, that would be great to have. If not, it's probably best to move on to the next issue. These sorts of dynamic sites are too difficult to archive in the short time that we have. Thanks!

detrout commented 7 years ago

Yes, there were some raw data links, and I grabbed them. I extracted my mirror into an Apache document root and browsed it. With the default settings the pages load the same as the official site, but obviously the filters don't work because it's just a static mirror.

I wrote a quick README describing the extra steps I took above and beyond running wget, copied in the brozzler-produced warc.gz, and zipped the whole thing up. Even with the duplication, the zip file is only 15 MB.
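The packaging step could be sketched as below. The file and directory names are placeholders (the thread doesn't record the actual paths), and gpg symmetric encryption is an assumption based on the `.zip.gpg` artifact linked below; the commands are echoed as a dry run rather than executed.

```shell
#!/bin/sh
# Sketch of the packaging step: bundle the mirror, README, and WARC,
# then encrypt a copy with gpg. Paths are placeholders; symmetric
# encryption is an assumption inferred from the .zip.gpg artifact.
echo 'zip -r billions-mirror.zip mirror/ README brozzler.warc.gz'
echo 'gpg --symmetric --cipher-algo AES256 billions-mirror.zip'
```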

- https://drive.google.com/open?id=0B76qh7pWLKB3TF9xTXBuTktBOU0 (zip file)
- https://drive.google.com/open?id=0B76qh7pWLKB3UWNtUGJYMnhUTnc (zip.gpg file)

junosuarez commented 7 years ago

I made some manual librarian-ing progress constructing some metadata for this over at https://github.com/daniellecrobinson/Data-Rescue-PDX/issues/20