climate-mirror / datasets

For tracking data mirroring progress

US Forest Service FIA Dataset #318

Open acostanza opened 7 years ago

acostanza commented 7 years ago

Dataset is mirrored and online

From readme.rtf

ymarcus93 commented 7 years ago

Can you provide a torrent for this?

ymarcus93 commented 7 years ago

OK. I clicked the link you provided and downloaded the ZIP called ENTIRE.zip. You said above that the total data is 65.52GB. Where are you getting this number? I'm only getting 3.67GB.

ymarcus93 commented 7 years ago

Also, it looks like this data is updated every day, so someone might have to mirror it each day (or weekly).

ghost commented 7 years ago

Keeping downloaded datasets up to date is a tricky problem. Discussions at The Azimuth Backup Project go in a few different directions on this, but they suggest that a haphazard approach is likely to set us up for a greater mess later.

For the most part, we do not have guidance or input from the architects and producers of these datasets, so we are assuming we got things right, which may not be the case.

Also, an updating strategy needs to be robust both against reorganization of parts of sites, which happens even in the best of times, and against malicious updates or alterations of data, given the context in which these projects were started.

As always, others at Azimuth Backup may have different views.
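One way to make the "robust against alterations" point concrete is to keep a checksum manifest for each snapshot and compare manifests before accepting an update. The following is only a minimal Python sketch under the assumption that each snapshot is a plain local directory; `build_manifest` and `diff_manifests` are hypothetical names for illustration, not part of any Azimuth or climate-mirror tooling.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large archives do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(old, new):
    """Classify paths as added, removed, or changed between two snapshots."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in shared if old[p] != new[p]),
    }
```

A diff like this cannot say whether a change is legitimate. It only surfaces removed or changed files so that a person can review them before an older snapshot is overwritten.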

On Sun, Feb 12, 2017, at 10:46, Yuval Marcus wrote:

Also, it looks like this data is updated every day, so someone might have to mirror it each day (or weekly).

ymarcus93 commented 7 years ago

OK, update: I looked at your server and now I know why you have 65.52GB. I think your wget --mirror downloaded duplicates. The website offers the data both as .zip archives and as individual .csv files, and each .zip contains the corresponding .csv file. So I think you have duplicates. In total, it should only be 3.67GB.

I'm creating a torrent for this now...
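The duplicate theory can be checked rather than guessed at. Here is a rough Python sketch that lists loose .csv files whose name, size, and CRC-32 all match a member of some .zip under the same mirror directory. The function names are made up for illustration, and the match on lowercase `.csv` and `.zip` extensions is an assumption about how the mirror is laid out.

```python
import zipfile
import zlib
from pathlib import Path


def crc32_of_file(path):
    """Compute the CRC-32 of a file, reading it in 1 MiB chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicated_csvs(root):
    """Return loose .csv files that also exist, byte for byte by CRC, in a .zip."""
    root = Path(root)
    # Index every archive member by (base name, uncompressed size, CRC-32).
    archived = set()
    for zip_path in root.rglob("*.zip"):
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    name = Path(info.filename).name
                    archived.add((name, info.file_size, info.CRC))
    duplicates = []
    for csv_path in sorted(root.rglob("*.csv")):
        key = (csv_path.name, csv_path.stat().st_size, crc32_of_file(csv_path))
        if key in archived:
            duplicates.append(csv_path)
    return duplicates
```

This reads the CRC values already stored in each archive's directory, so the zips are never extracted. A CRC-32 match is strong evidence of identical content but not proof, so a full hash comparison would be safer before deleting anything.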

ghost commented 7 years ago

That's entirely possible. However, without the participation of the original developers of the dataset we cannot know. We are not likely to get that. Accordingly, I'm willing to risk duplicates.

On Sun, Feb 12, 2017, at 18:55, Yuval Marcus wrote:

Ok. Update..I looked at your server and now I know why you have 65.52GB. I think your wget --mirror downloaded duplicates. The website organizes the data into .zip and .csvs. The .zip contains the .csv file. So I think you have duplicates. In total, it should only be 3.67GB.

ghost commented 7 years ago

I cannot access the mirror.

ymarcus93 commented 7 years ago

OK, I made a torrent for this. Please let me know if the magnet link is working.

Description: U.S. Forest Service's FIADB - Forest plot data. Last updated: Sat 2/11/17. Size: 3.4 GB

Magnet: magnet:?xt=urn:btih:e3c0b7ee13e02ddb532ea11cfe87bc51df315159&dn=FIADB%5FCSV.tar.gz&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969

SHA256 Hash: 6696daac2d9bd39f23ee960a961cc6893e4307c4b88cbbf8a4bdb04223ba9e47
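For anyone who wants to check a finished download against the hash above, a short Python sketch follows. It assumes the hash refers to the FIADB_CSV.tar.gz archive named in the magnet link, which the post does not state explicitly. The function names are hypothetical.

```python
import hashlib

# Hash as posted above; assumed (not confirmed) to be for FIADB_CSV.tar.gz.
EXPECTED_SHA256 = "6696daac2d9bd39f23ee960a961cc6893e4307c4b88cbbf8a4bdb04223ba9e47"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_expected("FIADB_CSV.tar.gz")` on the downloaded archive should return True if the torrent delivered the same bytes that were hashed.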

gabefair commented 7 years ago

Thanks for adding this dataset. I don't know the best approach for backing up http:// sites, but I used the web archive (web.archive.org/save/<URL>). Hope that is OK.

http://web.archive.org/web/20170225200809/https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html

I had to click every link on the page by hand to actually get it included in the archive. I only focused on the .csv links and the zips containing entire-state data.

You can view all the backed up files here: http://web.archive.org/web/*/https://apps.fs.usda.gov/fia/datamart/CSV/ <-include the at the end

Here is a copy of the "ENTIRE.zip" (3.41 GB) data that was linked to at the bottom of the page: https://www.dropbox.com/s/4ngobsrroumwqbq/ENTIRE.zip?dl=0
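The hand-clicking step could probably be scripted. As a hedged Python sketch: given the HTML of the index page, collect the .csv and .zip links and turn each into a web.archive.org/save/<URL> address. This only builds the URLs; actually requesting them, and doing so at a polite rate, is left out. `LinkCollector` and `save_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SAVE_PREFIX = "https://web.archive.org/save/"


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def save_urls(page_url, html, suffixes=(".csv", ".zip")):
    """Return save-request URLs for each distinct .csv/.zip link on the page."""
    collector = LinkCollector()
    collector.feed(html)
    targets = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if absolute.lower().endswith(suffixes) and absolute not in targets:
            targets.append(absolute)
    return [SAVE_PREFIX + url for url in targets]
```

The page HTML would come from https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html, fetched however is convenient.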

gabefair commented 7 years ago

1 vote to close

We have it mirrored four times in this ticket. This is also a low-priority item, as forest inventory data is duplicated on land-grant university systems nationwide.

@nickrsan

gabefair commented 7 years ago

Thanks for all your hard work everyone. Thank you for adding this dataset @acostanza

ymarcus93 commented 7 years ago

We shouldn't close this issue. New data is added weekly.

ymarcus93 commented 7 years ago

My saved copy only has data up to 2/11/17. The data set's website now has data up to 2/25/17. Someone should set up an automatic sync with it to keep the mirror up to date.

gabefair commented 7 years ago

OK, I'll reopen it. Thanks for pointing out that data is being added weekly.