broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

regression tests need to be less dependent upon content of data at GDC #79

Open noblem opened 6 years ago

noblem commented 6 years ago

The data at GDC now change far more often than they did in 2017 and earlier. This makes the current approach of the GDCtools regression test suite something of a burden, because the suite has to be updated after each new GDC data update.

We can keep the spirit of the regression tests but drop the exhaustive MD5/UUID cross-checking of the name and content of each downloaded file against our regression baseline data. Instead, we'll continue to download the data, but only check things like: