etal / cnvkit-examples

Example datasets for CNVkit (http://github.com/etal/cnvkit)
19 stars 5 forks source link

pair_segments.py: Documentation vs code #2

Closed messersc closed 7 years ago

messersc commented 8 years ago

Hi @etal,

Congratulations on publishing CNVkit, well-earned!

I have a question regarding the little script pair_segments.py.

Because the docstring

def read_paired_genes(cbs1, cbs2, interval):
    """Get the segment CN values for each targeted region.

    Get overlapping regions of two paired segment/gene sets.

    For genes with 2 or more segments, take the longest segment (or [weighted]
    average).

is at odds with the code in the called function, because it actually returns the median?


def segment_cn(segset):
    """Get or estimate the segment's copy number.
    If the segment set only contains one row (i.e. segment), just return that
    value. Otherwise, return the average of the segment log2 values.
    """
    if len(segset) == 0:
        return np.nan
    elif len(segset) == 1:
        return segset.log2[0]
    else:
        # Median
        return segset.log2.median()

I guess this script also generated the data for Figure 6 in the publication, correct? http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004873#pcbi-1004873-g006 Or have there been any modifications? From the text it's not quite clear to me.

I would appreciate if you would clear this up.

etal commented 8 years ago

Yes, trust the code, not the docstring. You can see in the commit logs (e.g. 2db7b595) that this changed, but I didn't update the docstring in the caller. Weighting the segments correctly after slicing turned out to be tricky, so I settled on median to generate the final comparison metrics and generate Figure 6 for the paper.

If you run make in each of the subdirectories, and then in compare/, you should eventually get an equivalent of Figure 6 for your currently installed version of CNVkit.