climate-mirror / datasets

For tracking data mirroring progress

NASS Cropland Data Layer #39

Open · nickrsan opened this issue 7 years ago

nickrsan commented 7 years ago

Name: NASS Cropland Data Layer
Organization: USDA-NASS
Description URL: https://nassgeodata.gmu.edu/CropScape/
Download URL:
File Types:
Size:
Status:

ghost commented 7 years ago

The link is not working, according to both my ISP and Is It Down Right Now: http://www.isitdownrightnow.com/nassgeodata.gmu.edu.html

nickrsan commented 7 years ago

It seems to be up now if you'd like to try again. Thank you!

alex-kazda commented 7 years ago

I will try to get this data, but it might be a tricky dataset to get. So far, I could only find the data as TIFF images downloaded from a web interface, one image per year, plus some dBase database of crops.

I'm new to archiving, and it will probably take me several days to get the data. I estimate the size of the data (uncompressed) to be about 6 GB per year, so about 120 GB for the 20-year dataset.

Metadata (reasonably small) should be here https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/meta.php

EDITED: @mxplusb is there a mirror of this somewhere, so that I don't actually need to download the dataset myself?

gabefair commented 7 years ago

@alex-kazda were you able to download a copy, or does this dataset need help?

alex-kazda commented 7 years ago

@gabefair I have not downloaded it yet. I had a preliminary look at the database interface and downloaded some small samples. I'm in Austria and need to go to sleep now, so if you think this needs doing more quickly, you can start downloading the data. I recommend clicking on the little red-white-blue outline of the US and selecting the regions to download on a state-by-state basis (that is the best partition of the data I could find so far; the web interface did not let me select the whole USA, and states seem like reasonable units).

rustyguts commented 7 years ago

You can download the entire data set per year; each year should be about 1.5 GB. Is that all the data?

https://nassgeodata.gmu.edu/nass_data_cache/tar/2016_cdls.tar.gz (replace 2016 with the year you want to download)
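
If that pattern holds, generating the full URL list is a one-liner; a minimal sketch (the 1996-2016 range is my assumption based on the 20-year estimate above):

```python
# Assumed URL pattern and year range; adjust to whatever years actually exist.
BASE = "https://nassgeodata.gmu.edu/nass_data_cache/tar/{year}_cdls.tar.gz"
urls = [BASE.format(year=y) for y in range(1996, 2017)]
```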

RoboDonut commented 7 years ago

Can I mirror from a public s3 bucket?

nickrsan commented 7 years ago

Hi @RoboDonut - nice to see you here! Yes, a public S3 bucket is totally fine - whatever tech makes the most sense for you - lots of people are using S3 and posting the URLs back here. Trying to line up additional storage now too.
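
Not from the thread, just a minimal sketch of the upload side with boto3; the bucket name and key prefix are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
# upload_file streams the local file to S3 in parts; the ACL makes the
# object publicly readable (the bucket policy must also allow public reads).
s3.upload_file(
    "2016_cdls.tar.gz",               # local tarball from the download step
    "my-nass-mirror",                 # hypothetical bucket name
    "nass_cdl/2016_cdls.tar.gz",      # object key
    ExtraArgs={"ACL": "public-read"},
)
```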

rustyguts commented 7 years ago

If you post S3 URLs, I will mirror to Google Drive.

RoboDonut commented 7 years ago

Right on @nickrsan, glad to be here and hope to contribute. I'll start with this data tonight.

RoboDonut commented 7 years ago

Weeeee!

```python
import requests
from os.path import join
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# The server's certificate fails verification, so silence the warning
# and fetch with verify=False below.
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def download_file(url, dl_dir):
    local_filename = join(dl_dir, url.split('/')[-1])
    # NOTE the stream=True parameter: stream the tarball to disk in
    # chunks instead of holding it all in memory.
    r = requests.get(url, stream=True, verify=False)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return local_filename

# One tarball per year, 1996 through 2015, following the URL pattern above.
files = ['http://nassgeodata.gmu.edu/nass_data_cache/tar/%d_cdls.tar.gz' % year
         for year in range(1996, 2016)]

download_directory = r"C:\NASS"
for f in files:
    print("Downloading: {0}".format(f))
    download_file(f, download_directory)
```

RoboDonut commented 7 years ago

uploading slowly to

http://s3-external-1.amazonaws.com/nass-mirror
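
A quick way to see what has landed so far: a public bucket returns an XML listing (ListBucketResult) at its root URL. A minimal sketch:

```python
import requests
import xml.etree.ElementTree as ET

MIRROR = "http://s3-external-1.amazonaws.com/nass-mirror"
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Fetch the bucket listing and print each object's key and size in bytes.
root = ET.fromstring(requests.get(MIRROR).content)
for obj in root.findall("s3:Contents", NS):
    print(obj.find("s3:Key", NS).text, obj.find("s3:Size", NS).text)
```

(S3 truncates listings at 1,000 keys, which is plenty for 20 tarballs.)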

alex-kazda commented 7 years ago

@rustyguts thanks for the URL and @RoboDonut thanks for the backup. Since you have the Data Layer in hand, I will at least download the metadata at https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/meta.php

alex-kazda commented 7 years ago

```sh
#!/bin/sh

# The 1997-1999 metadata come bundled in a single zip.
wget https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/XMLs_1997-1999.zip

# Yearly bundles XMLs_2000.zip through XMLs_2015.zip; seq -w zero-pads
# the counter, and the sleep keeps the requests polite.
for i in `seq -w 0 15`; do
   sleep 5
   wget https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/XMLs_20$i.zip
done

wget https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/2015_cultivated_layer_metadata.php
sleep 5
wget https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/crop_frequency_2015_metadata.php
```

alex-kazda commented 7 years ago

The collected metadata (about 6 MB) are at http://atrey.karlin.mff.cuni.cz/~alexak/dokumenty/USDA_Cropland_Data_Layer_Metadata.zip for now (this sharing method does not scale well, so please attach them to whatever you have mirrored).
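
To make the hand-off verifiable (my suggestion, not something the thread did), you could publish a SHA-256 next to each archive so whoever attaches it downstream can check the bytes. A sketch:

```python
import hashlib

def sha256sum(path, chunk_size=1 << 20):
    # Hash in 1 MB chunks so large archives never need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("USDA_Cropland_Data_Layer_Metadata.zip"))
```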