gdc_loadfile v0.3.5

broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

Style Checklist

Please ensure that your pull request meets the following standards for quality. Code should not be merged into the master branch until all of these criteria have been satisfied.

Comments

[x] Each source file includes comments at the top describing its purpose

[x] Each function includes a comment/docstring describing inputs and outputs, and any assumptions it makes

[x] Variable and function names have semantic meaning, and are not reused with a different meaning within the same scope

[x] “Magic” numbers, such as the index of a particular column, have a comment describing their value, or are declared as a global constant with a semantic name (e.g. TCGA_ID_COL = 16)

[x] Commented-out code is removed
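As an illustration of the docstring and named-constant items above, here is a minimal sketch. Only the constant TCGA_ID_COL = 16 comes from the checklist; the tab-separated row layout and the function name tcga_id_from_row are hypothetical.

```python
# Zero-based index of the TCGA ID column in a tab-separated row.
# The value 16 is the checklist's example; the row layout is assumed.
TCGA_ID_COL = 16


def tcga_id_from_row(line):
    """Return the TCGA ID field from one tab-separated line.

    Input:  a string with at least TCGA_ID_COL + 1 tab-separated fields.
    Output: the field at index TCGA_ID_COL, stripped of whitespace.
    Assumes the caller has already skipped any header line.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[TCGA_ID_COL].strip()
```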

Style/Execution

[x] Code contains no hard-coded paths

[x] Code contains appropriate logging and/or debugging output

[x] If possible, input data is validated early in the execution. If not, errors are sufficiently detailed to aid debugging.

[x] Code uses a library (e.g. optparse, argparse) for command-line parsing
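The Style/Execution items above might look like this in practice. This is only a sketch: the tool description, the input_dir argument, and the --verbose flag are hypothetical and are not taken from gdctools.

```python
import argparse
import logging
import os


def parse_args(argv=None):
    """Parse command-line arguments and validate them before any work starts.

    Input:  argv, a list of argument strings (defaults to sys.argv[1:]).
    Output: an argparse.Namespace with input_dir and verbose attributes.
    """
    parser = argparse.ArgumentParser(description="Hypothetical loadfile tool")
    # The path is supplied by the caller rather than hard-coded.
    parser.add_argument("input_dir", help="directory to read from")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable debug logging")
    args = parser.parse_args(argv)

    level = logging.DEBUG if args.verbose else logging.INFO
    logging.basicConfig(level=level)

    # Validate early: fail with a detailed message before processing begins.
    if not os.path.isdir(args.input_dir):
        parser.error("input_dir is not an existing directory: %s" % args.input_dir)
    return args
```

Calling parser.error() prints the usage string plus the message and exits with status 2, so bad input is reported before any processing begins.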

I like that the format object was used for this, and that the regression tests were updated. But I think the code where it's used is more complex (and less clear) than it could be. For example, this

[1 if prepended else 0]

might be "cute Python" in its use of ternary operators but that is not as simple or clear as something like

[some_well_named_variable_denoting_the_index]

Moreover, directly above the line containing samp_id.split('-')[1 if prepended else 0].endswith('FFPE') is another line which ALSO calls samp_id.split('-').

How about doing the split once?
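A rough sketch of what splitting once with a named index could look like. The function name is_ffpe and the sample ID layouts in the usage note are made up for illustration; the real ID format and the body of write_sset_and_cases() are not shown in this thread.

```python
def is_ffpe(samp_id, prepended=False):
    """Return True if the cohort token of samp_id ends with 'FFPE'.

    Input:  samp_id, a dash-separated sample ID; prepended, True when an
            extra token precedes the cohort token.
    Output: bool.
    """
    tokens = samp_id.split('-')  # split once, reuse the result everywhere
    # Named index instead of an inline ternary inside the subscript.
    if prepended:
        cohort_index = 1
    else:
        cohort_index = 0
    return tokens[cohort_index].endswith('FFPE')
```

With the hypothetical IDs 'BRCAFFPE-A1-B2-TP' and 'TCGA-BRCAFFPE-A1-B2-TP', the first is checked with prepended=False and the second with prepended=True.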

Even better: why not use if 'FFPE' in samp_id or if 'FFPE-' ... to simplify everything? This would remove the need to guess the 0/1 index, which would in turn get rid of the prepended parameter to write_sset_and_cases() ... as well as the need for the new field in the format object.
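The substring variant could be as small as the sketch below. The function name looks_ffpe and the example IDs are hypothetical, and the approach assumes 'FFPE-' never occurs in a sample ID for any other reason.

```python
def looks_ffpe(samp_id):
    """Return True if samp_id contains the marker 'FFPE-'.

    No index and no prepended flag are needed, because the test does not
    depend on where the cohort token sits in the ID.
    """
    return 'FFPE-' in samp_id
```

The trade-off is that a substring test is looser than checking one specific token, so it is only safe if no other part of the ID can contain the marker.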

Lastly, this is really just a temporary thing, because at some point we're going to start getting real samples in the GDAN, and most of those samples will be retrospectively collected and FFPE preserved.

gdc_loadfile v0.3.5 #66
