baron-de-montblanc opened this issue 1 month ago
Hey Jade,

That must be frustrating. We have a little bit of retry/error-handling logic in giant-squid, but it's clearly not doing its job.
In the meantime, here's how you can use `wget` to handle the download instead.

`giant-squid list --json $query` will give you a bunch of metadata about the jobs matching `$query`, including a download link:
```json
{
  "801409": {
    "obsid": 1413666792,
    "jobId": 801409,
    "jobType": "DownloadVisibilities",
    "jobState": "Ready",
    "files": [
      {
        "jobType": "Acacia",
        "fileUrl": "https://projects.pawsey.org.au/mwa-asvo/1413666792_801409_vis.tar?AWSAccessKeyId=...",
        "filePath": null,
        "fileSize": 152505477120,
        "fileHash": "d6dfb7391a495b0eb07cc885808e9e8058e90ec3"
      }
    ]
  }
}
```
You can chuck `fileUrl` straight into `wget`, which has a lot of options around retrying downloads; I use `--wait=60 --random-wait`.
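For a single job, that looks something like this (the URL is just the `fileUrl` from the JSON above, still truncated; adding `--tries` and `--progress=dot:giga` is only my preference, not required):

```bash
# --tries: how many attempts before wget gives up;
# --wait / --random-wait: pause a randomised ~60 s between requests.
wget --tries=20 --wait=60 --random-wait --progress=dot:giga \
  -O 1413666792_801409_vis.tar \
  "https://projects.pawsey.org.au/mwa-asvo/1413666792_801409_vis.tar?AWSAccessKeyId=..."
```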
If you want to automate this for many jobs, you can use `jq`, e.g.
```bash
giant-squid list -j --states=ready -- $obslist \
  | jq -r '.[] | [.obsid, .jobId, .files[0].fileUrl//"", .files[0].fileSize//"", .files[0].fileHash//""] | @tsv' \
  | while read -r obsid jobid url size hash; do
      # skip observations we've already downloaded
      [ -f "${obsid}.tar" ] && continue
      wget "$url" -O "${obsid}.tar" --progress=dot:giga --wait=60 --random-wait
    done
```
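The `jq` line also pulls out `fileSize` and `fileHash`, which the loop doesn't use. That hash looks like a SHA-1 digest (40 hex characters); assuming it is one, you could verify each tar right after the `wget` line inside the loop, e.g.:

```bash
# Sits inside the while-read loop, after wget: compare the downloaded tar
# against the fileHash reported by giant-squid (assumed here to be SHA-1).
echo "${hash}  ${obsid}.tar" | sha1sum --check --quiet \
  || echo "hash mismatch for ${obsid}.tar (job ${jobid})" >&2
```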
Hi Jade,
As Dev says, we don't currently have a continue-from-where-you-left-off feature as such, but it would be extremely valuable, especially for large downloads, so it will definitely be on our roadmap for a future release.
In the meantime, I think Dev has used the above technique successfully, so please give that a go and let us know how it goes!
Oh, and @baron-de-montblanc @d3v-null, FYI you can also pass `-c` / `--continue` to `wget` to "resume getting a partially-downloaded file". I only just found it, and it does appear to work quite nicely!
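A rough sketch of how that could slot into Dev's loop above (same `url` / `obsid` variables; I haven't checked how `-c` interacts with `-O` across wget versions, so maybe test it on one job first):

```bash
# Swap this in for the wget line in the loop, and drop the
# "[ -f ... ] && continue" check, so a partial ${obsid}.tar left behind by an
# interrupted run gets resumed rather than skipped.
wget -c "$url" -O "${obsid}.tar" --progress=dot:giga --wait=60 --random-wait
```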
Hello, I am trying to download some rather large observations from ASVO to our group's supercomputer through giant-squid. It is very common for the download to fail (see the attached screenshot for an example), probably because the connection gets interrupted.

My question is: is there an option or flag one can use with giant-squid to tell it to resume the download from where it crashed? (Or, alternatively, how could I successfully download these ~50 GB observations without them crashing?)