broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons
Other
31 stars 4 forks source link

Gsaksena methyl #41

Closed gsaksena closed 7 years ago

gsaksena commented 7 years ago

Pull Request Template

Describe your changes here.

Fix how GDC methylation files are diced.

Style Checklist

Please ensure that your pull request meets the following standards for quality. Code should not be merged into the master branch until all of these criteria have been satisfied.

Comments

Style/Execution

gsaksena commented 7 years ago

Question - is it really cleaner to pull the conditional for column reordering up a level?

dheiman commented 7 years ago

Before:

def rearrange_columns(csvfile, col_scramble=None):
    """
    col_scramble is a list whose index is for the column in the output file and whose values 
    represent column numbers of the input file.   Eg 0,2,3,4,1 will move column 1 of the input file to be after the others.  If col_scramble is None, column order is left unchanged.
    """
    if col_scramble:
        len_expected = len(col_scramble)
    else:
        len_expected = None
    for row in csvfile:
        if col_scramble:
            if len(row) != len_expected:
                raise Exception('Unexpected number of columns, expected %d but found %d'%(len_expected,len(row)))
            new_row=[row[i] for i in col_scramble]
        else:
            new_row=row
        yield new_row

After:

def rearrange_columns(csvfile, col_reorder):
"""
csvfile : iterable of list
col_reorder : list of int
    Values are the final column ordering. E.g.: [0, 2, 3, 1] will cause column 1
    of the input to be column 3 of the output.
"""
    min_expected = max(col_reorder) + 1
    for row in csvfile:
        if len(row) < min_expected:
                raise Exception('Unexpected number of columns, expected at least %d but found %d' %
 (min_expected, len(row)))
        yield [row[i] for i in col_reorder]
gsaksena commented 7 years ago

Yes, the lower level function is longer the way I did it, and the top level function is shorter and more linear.

gsaksena commented 7 years ago

Thanks for your careful reading of the code and well thought out comments... they have improved this code submission. I am not planning to make further changes.