NCEAS / arcticdatautils

Utility functions in R for processing data for the Arctic Data Center
https://nceas.github.io/arcticdatautils/
Apache License 2.0

Download script #145

dmullen17 closed 4 years ago

dmullen17 commented 4 years ago

@jeanetteclark @amoeba what do you think about adding a helper function that does something similar to this? This is just a script I modify from time to time so it's not very clean.

library(arcticdatautils)

# Assumes `adc` is an existing connection to the Arctic Data Center member node.
pkg <- get_package(adc, 'resource_map_doi:10.5065/D61J97TT', file_names = TRUE)

# Build one wget command for a single named PID; the name becomes the output file name.
pid_to_wget <- function(pid) {
  paste0('wget https://arcticdata.io/metacat/d1/mn/v2/object/', pid, ' -O ', names(pid), '\n')
}

pids <- c(pkg$metadata, pkg$data)
text <- vector('character', length(pids))
for (i in seq_along(pids)) {
  text[i] <- pid_to_wget(pids[i])  # single-bracket indexing keeps the name
}
formatted_text <- paste0(text, collapse = '')
write(formatted_text, file = '/home/dmullen/Submissions/Merrelli/download.sh')

amoeba commented 4 years ago

I saw the RT ticket that probably prompted this. I think it's a good script to have somewhere since it comes up every once in a while. It's a little more general than arcticdatautils though, so could datamgmt be a place for it? I'm not overly picky here.

As for what the script does, it looks good.

dmullen17 commented 4 years ago

Yea datamgmt seems like a better place for it.

As an aside, each call to wget throws an authentication error that doesn't end up interfering but is annoying, so I decided to add the --no-check-certificate argument. It has cut the download time by orders of magnitude: with --no-check-certificate it took real 0m29.750s. I'm still waiting on the time to return for the other case, but it's been over 30 minutes.

amoeba commented 4 years ago

Hrm, weird. Let me give it a go here and see. Maybe we have a config issue.

amoeba commented 4 years ago

Which version of wget are you using? I don't get any errors or warnings. I'm on GNU Wget 1.20.3.

dmullen17 commented 4 years ago

Just updated to Wget 1.20.3. Upon further inspection the file sizes are all 0 bytes if you don't include --no-check-certificate 🤦‍♂

amoeba commented 4 years ago

Weird, no issues here. My script has lines like:

wget https://arcticdata.io/metacat/d1/mn/v2/object/doi:10.5065/D61J97TT -O science_metadata.xml
wget https://arcticdata.io/metacat/d1/mn/v2/object/urn:uuid:3ff4952e-4a55-4a5a-83e8-1bb7c29ff686 -O simulation_2012012006_cld1_0005_ch2.nc
wget https://arcticdata.io/metacat/d1/mn/v2/object/urn:uuid:8f6260c8-24fc-4881-b7da-710b52b6d2d2 -O simulation_2012090721_cld2_0006_ch1.nc

and all my files are sized reasonably.
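
Since the failed downloads in this thread only showed up as 0-byte files, a quick size check after running the script can catch them. The sketch below uses base R only; the function name `find_empty_files` and the temporary demo files are made up for illustration and are not part of arcticdatautils or datamgmt.

```r
# Hypothetical check: return the paths of files that are missing or 0 bytes.
find_empty_files <- function(paths) {
  sizes <- file.size(paths)  # NA for files that do not exist
  paths[is.na(sizes) | sizes == 0]
}

# Demo with one empty and one non-empty temporary file
demo_dir <- tempfile("downloads")
dir.create(demo_dir)
empty_file <- file.path(demo_dir, "empty.nc")
full_file <- file.path(demo_dir, "science_metadata.xml")
file.create(empty_file)
writeLines("<eml/>", full_file)

bad <- find_empty_files(c(empty_file, full_file))
```

In practice `paths` would be the output file names from the download script, for example `names(pids)` in the working directory where the script was run.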

dmullen17 commented 4 years ago

Yea the same wget calls worked for the PI that I sent the script to as well. I think we should include --no-check-certificate just to be safe.
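
The helper proposed at the top of the thread could look something like the sketch below. It is base R only and takes a named character vector of PIDs, like the one `get_package(..., file_names = TRUE)` returns in the original script. The function name `pids_to_wget` and its arguments are assumptions for illustration, not an existing arcticdatautils or datamgmt API. The certificate flag defaults to on, following the last comment, and file names are shell-quoted here (unlike the original script) so that names containing spaces still work.

```r
# Hypothetical helper: turn a named character vector of PIDs into wget commands.
# The names are used as output file names, as in the original script.
pids_to_wget <- function(pids,
                         base_url = "https://arcticdata.io/metacat/d1/mn/v2/object/",
                         no_check_certificate = TRUE) {
  stopifnot(is.character(pids), !is.null(names(pids)), all(nzchar(names(pids))))
  flag <- if (no_check_certificate) "--no-check-certificate " else ""
  # Quote file names so names containing spaces survive the shell.
  paste0("wget ", flag, base_url, pids, " -O ", shQuote(names(pids), type = "sh"))
}

# Example with two PIDs and file names taken from the thread
pids <- c(
  "science_metadata.xml" = "doi:10.5065/D61J97TT",
  "simulation_2012012006_cld1_0005_ch2.nc" = "urn:uuid:3ff4952e-4a55-4a5a-83e8-1bb7c29ff686"
)
script_lines <- pids_to_wget(pids)

# Write one command per line to a script file (a temporary path for this demo)
script_path <- file.path(tempdir(), "download.sh")
writeLines(script_lines, script_path)
```

Because `paste0` is vectorized, no loop is needed, and `writeLines` adds the newline after each command.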