Open acostanza opened 7 years ago
Can you provide a torrent for this?
Ok. I clicked on the link you provided and downloaded the ZIP called ENTIRE.zip. You said above that the total data is 65.52 GB; where are you getting this number? I'm only getting 3.67 GB.
Also, it looks like this data is updated every day, so someone might have to mirror it each day (or weekly).
Keeping downloaded datasets up to date is a tricky problem. Discussions at The Azimuth Backup Project go in a few different directions on this, but they suggest a haphazard approach is likely to create a greater mess later.
For the most part, we do not have guidance or input from architects and producers of these datasets so we are assuming we got things right, which may not be the case.
Also, an updating strategy needs to be robust against both reorganization of parts of sites, which happens even in the best of times, and against malicious updates or alterations of data, given the context in which these projects were started.
As always, others at Azimuth Backup may have different views.
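One way to detect alterations between mirror runs is a checksum manifest of the whole mirror that gets diffed on each update. This is only a sketch of the idea, not the Azimuth project's actual tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large archives don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to the mirror root) to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def diff_manifests(old: dict, new: dict) -> dict:
    """Report files added, removed, or changed between two mirror runs."""
    return {
        "added":   sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Changed or removed files that the upstream site's own changelog doesn't explain could then be flagged for manual review before the mirror is overwritten.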
Ok, update: I looked at your server and now I know why you have 65.52 GB. I think your wget --mirror downloaded duplicates. The website organizes the data into .zip and .csv files, and each .zip contains the corresponding .csv file, so I think you have duplicates. In total, it should only be 3.67 GB.
I'm creating a torrent for this now...
That's entirely possible. However, without the participation of the original developers of the dataset we cannot know. We are not likely to get that. Accordingly, I'm willing to risk duplicates.
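Whether the extra volume really is zip/csv duplication could be checked without the original developers' input, by comparing each archive member's hash against the loose file next to it. A sketch, assuming the layout Yuval describes (loose .csv files sitting beside .zip archives of the same data):

```python
import hashlib
import zipfile
from pathlib import Path


def is_redundant_zip(zip_path: Path) -> bool:
    """True if every member of the archive also exists as a loose file
    with identical content in the same directory (pure duplication)."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            loose = zip_path.parent / Path(info.filename).name
            if not loose.is_file():
                return False
            member_hash = hashlib.sha256(zf.read(info)).hexdigest()
            loose_hash = hashlib.sha256(loose.read_bytes()).hexdigest()
            if member_hash != loose_hash:
                return False
    return True
```

Running this over the mirror would tell you how much of the 65.52 GB is safe to drop, though keeping both copies is the conservative choice.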
I cannot access the mirror.
Ok, I made a torrent for this. Please let me know if the magnet link is working.
Description: U.S. Forest Service's FIADB - Forest plot data. Last updated: Sat 2/11/17. Size: 3.4 GB
Magnet: magnet:?xt=urn:btih:e3c0b7ee13e02ddb532ea11cfe87bc51df315159&dn=FIADB%5FCSV.tar.gz&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969
SHA256 Hash: 6696daac2d9bd39f23ee960a961cc6893e4307c4b88cbbf8a4bdb04223ba9e47
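Anyone grabbing the torrent can check the payload against the SHA256 published above. A minimal check (the filename FIADB_CSV.tar.gz is taken from the magnet link's `dn` field):

```python
import hashlib


def verify_sha256(path: str, expected: str) -> bool:
    """Compare a file's streamed SHA-256 digest against a published value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected.lower()


# e.g. verify_sha256("FIADB_CSV.tar.gz",
#                    "6696daac2d9bd39f23ee960a961cc6893e4307c4b88cbbf8a4bdb04223ba9e47")
```
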
Thanks for adding this dataset. I don't know the best approach for backing up http:// sites, but I used the Wayback Machine: web.archive.org/save/<URL>
Hope that is ok.
http://web.archive.org/web/20170225200809/https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html
I had to hand-click every link on the page to actually get it included in the archive. I only focused on the .csv links and the zips containing entire state data.
You can view all the backed-up files here: http://web.archive.org/web/*/https://apps.fs.usda.gov/fia/datamart/CSV/
Here is a copy of the "ENTIRE.zip" (3.41 GB) data that was linked to at the bottom of the page: https://www.dropbox.com/s/4ngobsrroumwqbq/ENTIRE.zip?dl=0
We have it mirrored four times in this ticket. This is also a low-priority item, as forest inventory data is duplicated on land-grant university systems nationwide.
@nickrsan
Thanks for all your hard work, everyone. Thank you for adding this dataset, @acostanza
We shouldn't close this issue. New data is added weekly.
My saved data only has data up to 2/11/17. New data on the data set's website has data up to 2/25/17. Someone should setup an automatic sync with it to keep the mirror up to date.
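An automatic sync would not need to re-download everything each week; it could compare the upstream Last-Modified header against the local copy's mtime and fetch only when something is newer. A hedged sketch (the cron line and script name are illustrative, not a tested setup for this site):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_refresh(remote_last_modified: str, local_mtime: float) -> bool:
    """True if the upstream Last-Modified header (an HTTP date string) is
    newer than the local mirror copy's modification time (Unix timestamp)."""
    remote = parsedate_to_datetime(remote_last_modified)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local


# A weekly cron entry could then run something like (illustrative):
#   0 3 * * 0  /usr/local/bin/sync_fiadb.sh
# where the script issues an HTTP HEAD request for ENTIRE.zip, feeds the
# Last-Modified header and the local file's mtime into needs_refresh(),
# and re-downloads only when it returns True.
```
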
Ok, I'll reopen it. Thanks for pointing out that data is being added weekly.
Dataset is mirrored and online
From readme.rtf