broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

gdc_mirror reports success even when some downloads fail #53

Open gsaksena opened 6 years ago

gsaksena commented 6 years ago

If the disk fills up partway through a gdc_mirror run, downloads fail, but gdc_mirror keeps moving on to the next file. If you free up disk space or expand the partition while it is still running, the downloads start succeeding again. At the end, gdc_mirror reports success and does not mention the files that failed in the middle. It should at least report a failure.
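A minimal sketch of the reporting behavior requested above, not the actual gdc_mirror code: every name here (`mirror_all`, `exit_status`, and the `download` callable) is hypothetical. The idea is to keep going after a failed download, as gdc_mirror already does, but remember each failure so the run can end with a non-zero status.

```python
import sys
from typing import Callable, Iterable, List, Tuple


def mirror_all(files: Iterable[str],
               download: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Attempt every download; return a (file, error) pair for each failure.

    `download` is a hypothetical stand-in for whatever fetches one file.
    """
    failures = []
    for name in files:
        try:
            download(name)
        except OSError as err:  # covers "No space left on device"
            # Record the failure and keep going, instead of forgetting it.
            failures.append((name, str(err)))
    return failures


def exit_status(failures: List[Tuple[str, str]]) -> int:
    """Log each failed file to stderr; return 0 only if nothing failed."""
    for name, err in failures:
        print(f"FAILED: {name}: {err}", file=sys.stderr)
    return 1 if failures else 0
```

A caller could finish with `sys.exit(exit_status(mirror_all(files, download)))`, so that a run with any failed file is visibly unsuccessful even if later downloads passed.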

As a workaround, if you suspect this issue has occurred, you can rerun the gdc_mirror command, and it will just download the files that were missed earlier.

gsaksena commented 6 years ago

Dicing appears to have a similar issue. Unlike gdc_mirror, however, it appears to be fooled by partial files: it reports success and leaves the truncated file in place, even after redicing.

One way to address this would be to append .partial to the filename while the file is being written, and remove the suffix once dicing is complete. The cache check would then be based only on the final filename.
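The .partial idea could look roughly like the sketch below. This is an illustration under assumptions, not the dicer's real code: `write_via_partial` and the `write_body` callable are hypothetical names. Because only a completed file ever gets the final name, a truncated file left by a full disk cannot be mistaken for a cached result.

```python
import os
from typing import IO, Callable


def write_via_partial(final_path: str,
                      write_body: Callable[[IO[str]], None]) -> bool:
    """Write final_path through a '.partial' file.

    Returns False if final_path already exists (treated as a cache hit),
    True if the file was written and renamed into place.
    """
    if os.path.exists(final_path):
        return False  # only completed files ever carry the final name

    partial_path = final_path + ".partial"
    # Mode "w" truncates any stale .partial left over from a failed run.
    with open(partial_path, "w") as handle:
        write_body(handle)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to disk before renaming

    # Reached only if writing succeeded; the rename is atomic when both
    # paths are on the same filesystem.
    os.replace(partial_path, final_path)
    return True
```

If `write_body` raises partway through (for example when the disk fills), the exception propagates, the .partial file stays behind, and the final filename never appears, so a rerun redoes the work instead of trusting a truncated file.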

noblem commented 6 years ago

Adding dheiman to this, for watching

gsaksena commented 6 years ago

This is being addressed in the gsaksena_mirror_mirror_dice branch, and discussed further in issue #47.