broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

RFE: tools should be more stringent about when nothing needs to be done #63

Open noblem opened 7 years ago

noblem commented 7 years ago

If no new data was mirrored, dicing should detect that automatically and not even bother to traverse the files in the mirror.

Presently, though, the dicer treats each new datestamp as potentially containing new files and therefore initiates the traversal, even though every file it inspects will be skipped. This is not only slower than necessary, it's also quite verbose. So perhaps a middle ground (until the dicer et al. can be made smarter) is to change the default so that only newly diced files are printed unless --verbose is specified.
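To make the quieter default concrete, here is a minimal sketch using only the standard `logging` and `argparse` modules. It is not taken from the gdctools code: the logger name `dicer_sketch` and the helpers `build_logger` and `report` are hypothetical, and the only thing borrowed from the proposal is the `--verbose` flag.

```python
import argparse
import logging


def build_logger(argv=None):
    """Return a logger that hides skip messages unless --verbose is given."""
    parser = argparse.ArgumentParser(description="hypothetical dicer CLI sketch")
    parser.add_argument("--verbose", action="store_true",
                        help="also report files that were skipped")
    args = parser.parse_args(argv)

    logger = logging.getLogger("dicer_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if args.verbose else logging.INFO)
    return logger


def report(logger, path, already_diced):
    """Log skipped files at DEBUG and newly diced files at INFO."""
    if already_diced:
        logger.debug("skipping %s (already diced)", path)
    else:
        logger.info("dicing %s", path)
```

With this split, a run that skips everything prints nothing by default, and `--verbose` restores the per-file skip messages.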

Similarly, attempting to dice multiple times in a single day should yield essentially the same result: after the dicer has successfully processed all new files, issuing another dice attempt should in principle do nothing, because no new data has been downloaded since the last dicing.
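One possible way to get this behavior without traversing the mirror is a small state file shared by the mirror and the dicer. The sketch below is an assumption, not how gdctools works today: the state file, the `mirror_generation` and `diced_generation` keys, and all function names are hypothetical. It uses a counter rather than the datestamp so that two mirror runs on the same day are still told apart.

```python
import json


def read_state(state_path):
    """Load the shared state, treating a missing file as empty state."""
    try:
        with open(state_path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def write_state(state_path, state):
    with open(state_path, "w") as fh:
        json.dump(state, fh)


def mark_new_data(state_path):
    """Called by the mirror step only when it downloaded at least one new file."""
    state = read_state(state_path)
    state["mirror_generation"] = state.get("mirror_generation", 0) + 1
    write_state(state_path, state)


def dicing_needed(state_path):
    """Return (needed, generation) without touching any mirrored file."""
    state = read_state(state_path)
    generation = state.get("mirror_generation", 0)
    return generation > state.get("diced_generation", 0), generation


def mark_diced(state_path, generation):
    """Called by the dicer after it successfully processed `generation`."""
    state = read_state(state_path)
    state["diced_generation"] = generation
    write_state(state_path, state)
```

The dicer would call `dicing_needed` first and exit immediately when it returns `False`, so a second dice on the same day costs one small file read.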

Similar tactics apply to generating new loadfiles from dicing results and new sample reports from loadfiles: skip whenever possible.
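For these downstream steps, a make-style freshness check might be enough: skip a step when every output already exists and none is older than any input. This is only a sketch of that general technique; `is_up_to_date` is a hypothetical helper, and how inputs and outputs map onto actual loadfiles and sample reports is left open.

```python
import os


def is_up_to_date(inputs, outputs):
    """Return True when all outputs exist and are at least as new as every input.

    Raises FileNotFoundError if an input is missing, since that is an error
    rather than a reason to skip.
    """
    if not outputs:
        return False
    try:
        oldest_output = min(os.stat(p).st_mtime_ns for p in outputs)
    except FileNotFoundError:
        return False  # a missing output always forces regeneration
    newest_input = max((os.stat(p).st_mtime_ns for p in inputs), default=0)
    return newest_input <= oldest_output
```

A loadfile or report generator could call this before doing any work and return early when it yields `True`. Modification times are a coarse signal, so a content hash would be a stricter alternative if files are ever rewritten without changing.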