pair_segments.py: Documentation vs code

Hi @etal,

Congratulations on publishing CNVkit, well-earned!

I have a question regarding the little script pair_segments.py.

Because the docstring

def read_paired_genes(cbs1, cbs2, interval):
    """Get the segment CN values for each targeted region.

    Get overlapping regions of two paired segment/gene sets.

    For genes with 2 or more segments, take the longest segment (or [weighted]
    average).

is at odds with the code in the called function, because it actually returns the median?


def segment_cn(segset):
    """Get or estimate the segment's copy number.
    If the segment set only contains one row (i.e. segment), just return that
    value. Otherwise, return the average of the segment log2 values.
    """
    if len(segset) == 0:
        return np.nan
    elif len(segset) == 1:
        return segset.log2[0]
    else:
        # Median
        return segset.log2.median()

I guess this script also generated the data for Figure 6 in the publication, correct? http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004873#pcbi-1004873-g006 Or have there been any modifications? From the text it's not quite clear to me.

I would appreciate if you would clear this up.

etal / cnvkit-examples

pair_segments.py: Documentation vs code #2