broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

Mirror/Dice version stamp out of sync? #8

Closed noblem closed 7 years ago

noblem commented 7 years ago

Compare the version stamps on the log files generated by the mirror

% ls -l /xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11* | awk '{print $NF}'

...snip...
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_03__01_00_02.log
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_05__01_00_04.log
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_06__01_00_05.log
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_07__01_00_02.log
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_08__01_00_05.log
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_09__01_00_02.log
/xchip/gdac_data/gdc/logs/mirror/gdcMirror.2016_11_10__01_00_07.log

with those generated by the dicer

% ls -l /xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11* | awk '{print $NF}'

/xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11_04__11_35_46.log
/xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11_04__11_37_19.log
/xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11_07__01_10_11.log
/xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11_08__01_09_42.log
/xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11_09__01_08_16.log
/xchip/gdac_data/gdc/logs/dice/gdcDice.2016_11_10__01_07_19.log

Notice that for the 2016_11_03 mirror there is no 2016_11_03 dicing log. This is a logical problem: a dicing version stamp should always correspond to a mirror version stamp (and ditto for eventual load file generation). Otherwise, how do we associate them, e.g. when doing data forensics?
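To make the mismatch easy to spot, here is a minimal sketch of a check that compares the date stamps in the two sets of log names. It is not part of gdctools: the function names are hypothetical, and it assumes log names of the form `gdcMirror.YYYY_MM_DD__HH_MM_SS.log` and `gdcDice.YYYY_MM_DD__HH_MM_SS.log`.

```python
import re
from pathlib import PurePosixPath

# Assumed file name layout: <tool>.<YYYY_MM_DD>__<HH_MM_SS>.log
STAMP_RE = re.compile(r"\.(\d{4}_\d{2}_\d{2})__(\d{2}_\d{2}_\d{2})\.log$")


def date_stamps(paths):
    """Return the set of YYYY_MM_DD date stamps found in log file names."""
    stamps = set()
    for path in paths:
        match = STAMP_RE.search(PurePosixPath(path).name)
        if match:
            stamps.add(match.group(1))
    return stamps


def unmatched(mirror_logs, dice_logs):
    """Return (mirror-only dates, dice-only dates), each sorted."""
    mirror = date_stamps(mirror_logs)
    dice = date_stamps(dice_logs)
    return sorted(mirror - dice), sorted(dice - mirror)


# Log names taken from the listings in this issue (directories omitted).
mirror_logs = [
    "gdcMirror.2016_11_03__01_00_02.log",
    "gdcMirror.2016_11_05__01_00_04.log",
    "gdcMirror.2016_11_06__01_00_05.log",
    "gdcMirror.2016_11_07__01_00_02.log",
    "gdcMirror.2016_11_08__01_00_05.log",
    "gdcMirror.2016_11_09__01_00_02.log",
    "gdcMirror.2016_11_10__01_00_07.log",
]
dice_logs = [
    "gdcDice.2016_11_04__11_35_46.log",
    "gdcDice.2016_11_04__11_37_19.log",
    "gdcDice.2016_11_07__01_10_11.log",
    "gdcDice.2016_11_08__01_09_42.log",
    "gdcDice.2016_11_09__01_08_16.log",
    "gdcDice.2016_11_10__01_07_19.log",
]

mirror_only, dice_only = unmatched(mirror_logs, dice_logs)
print("mirror without dice:", mirror_only)
print("dice without mirror:", dice_only)
```

This compares dates only. Even on days where both logs exist, the full stamps still differ (e.g. `01_00_02` vs `01_10_11` on 2016_11_07), which is the underlying association problem.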

I'm also thinking that perhaps we should drop the timestamp altogether from the generated filenames, and instead just append .1, .2, .3 if we wind up attempting more than one mirror/dice/loadfile, etc. on any given YYYY_MM_DD. But this is not as important as the version stamps matching.
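One possible reading of the numbering idea, as a rough sketch: the first attempt on a date gets the bare `YYYY_MM_DD` stamp, and later attempts on the same date get `.1`, `.2`, and so on. The helper name `next_version_stamp` is hypothetical and this is not how gdctools currently behaves.

```python
import re


def next_version_stamp(date, existing_stamps):
    """Return the next stamp for `date`, given the stamps already in use.

    Assumed scheme: bare YYYY_MM_DD for the first attempt, then
    YYYY_MM_DD.1, YYYY_MM_DD.2, ... for repeat attempts on the same date.
    """
    stamp_re = re.compile(re.escape(date) + r"(?:\.(\d+))?")
    attempts = []
    for stamp in existing_stamps:
        match = stamp_re.fullmatch(stamp)
        if match:
            # A bare date counts as attempt 0.
            attempts.append(int(match.group(1) or 0))
    if not attempts:
        return date
    return f"{date}.{max(attempts) + 1}"


print(next_version_stamp("2016_11_04", []))
print(next_version_stamp("2016_11_04", ["2016_11_04"]))
print(next_version_stamp("2016_11_04", ["2016_11_03", "2016_11_04", "2016_11_04.1"]))
```

Under this scheme only the mirror would generate a new stamp. The dicer and loadfile steps would reuse the stamp of the mirror they operate on, so the stamps match by construction.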