broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons
Other
31 stars 4 forks source link

gdc_loadfile v0.3.5 #66

Closed dheiman closed 6 years ago

dheiman commented 6 years ago

Corrected bug in handling FFPE samples when generating loadfiles reported in #65. Tests updated to include FFPE samples.

Style Checklist

Please ensure that your pull request meets the following standards for quality. Code should not be merged into the master branch until all of these criteria have been satisfied.

Comments

Style/Execution

noblem commented 6 years ago

I like that the format object was used for this, and that the regression tests were updated. But think the code where it's used is more complex (and less clear) than it could be. For example, this

[1 if prepended else 0]

might be "cute Python" in its use of ternary operators but that is not as simple or clear as something like

[some_well_named_variable_denoting_the_index]

Moreover, directly above the samp_id.split('-')[1 if prepended else 0].endswith('FFPE') is another line which ALSO does samp_id.split('-').

How about doing the split once?

Even better: why not use if 'FFPE' in samp_id or if 'FFPE-' ... to simplify everything? This would remove the need to guessing the 0/1 index, which would thereby get rid of the prepended parameter to write_sset_and_cases() ... as well as the need for the new field in the format object, too.

Lastly, this is really just a temporary thing. Because at some point we're going to start getting real samples in the GDAN, and most of those samples will be retrospectively collected and FFPE preserved.