fpbarthel / GLASS

GLASS consortium
MIT License
37 stars 13 forks source link

get difference between two seg files #108

Closed fpbarthel closed 5 years ago

fpbarthel commented 5 years ago

Using postgres

https://rextester.com/GZS12079

CREATE TABLE segs (sample varchar(1), chrom varchar(2), pos int4range, minor_cn smallint, major_cn smallint, cn double precision);
CREATE TABLE pairs (pair varchar(2), sample_a varchar(1), sample_b varchar(1));

INSERT INTO segs (sample, chrom, pos, minor_cn, major_cn, cn) VALUES 
('A', '1', '[1,10]', 1, 1, 1.9),
('A', '1', '[11,20]', 1, 2, 3.2),
('A', '1', '[21,30]', 1, 1, 2),
('B', '1', '[1,15]', 1, 1, 2.1),
('B', '1', '[16,25]', 1, 2, 3.3),
('B', '1', '[26,30]', 1, 1, 2.2),
('C', '1', '[1,2]', 2, 3, 4.9),
('C', '1', '[3,15]', 1, 2, 3.2),
('C', '1', '[16,21]', 1, 0, 1.1),
('C', '1', '[22,26]', 0, 0, 0.1),
('C', '1', '[27,30]', 1, 1, 2),
('D', '1', '[1,15]', 1, 1, 1.8),
('D', '1', '[16,30]', 1, 0, 1.2);

INSERT INTO pairs (pair, sample_a, sample_b) VALUES
('AB','A','B'),
('CD','C','D');

SELECT *, s1.pos * s2.pos AS intersect, s2.minor_cn - s1.minor_cn AS delta_minor, s2.major_cn - s1.major_cn AS delta_major, (s1.cn::decimal + s2.cn)/2 AS avg_cn
FROM pairs pa
LEFT JOIN segs s1 ON s1.sample = pa.sample_a
LEFT JOIN segs s2 ON s2.sample = pa.sample_b AND s1.chrom = s2.chrom AND s1.pos && s2.pos
fpbarthel commented 5 years ago

Applied to our dataset:

/*
For each `tumor_pair_barcode`, consisting of `tumor_barcode_a` (a) and `tumor_barcode_b` (b) in the `tumor_pairs` table:
- Intersect all segments from (a) and (b) --> `pos_intersect`
- Determine change in `minor_cn` between (a) and (b) as (b-a)
- Determine change in `major_cn` between (a) and (b) as (b-a)
    * For the two items above, a value of 0 indicates no change, positive numbers indicate (b > a) and negative numbers (a > b)
- Determine the difference in log2(tumor/normal) copy number between (a) and (b) as (b-a)
*/
SELECT
    pa.tumor_pair_barcode, 
    pa.case_barcode, 
    pa.tumor_barcode_a, 
    pa.tumor_barcode_b, 
    s1.pos * s2.pos AS pos_intersect, 
    s2.minor_cn - s1.minor_cn AS delta_minor, 
    s2.major_cn - s1.major_cn AS delta_major, 
    s2.logr_copy_number - s1.logr_copy_number AS delta_cn
FROM analysis.tumor_pairs pa
LEFT JOIN analysis.pairs p1 ON p1.tumor_barcode = pa.tumor_barcode_a
LEFT JOIN analysis.pairs p2 ON p2.tumor_barcode = pa.tumor_barcode_b
LEFT JOIN analysis.titan_seg s1 ON s1.pair_barcode = p1.pair_barcode
LEFT JOIN analysis.titan_seg s2 ON s2.pair_barcode = p2.pair_barcode AND s1.chrom = s2.chrom AND s1.pos && s2.pos

Found at SELECT * FROM analysis.titan_seg_paired_delta

fpbarthel commented 5 years ago

We can quantify these differences as follows:

/*
For each `tumor_pair_barcode` with paired segmentation data in the `titan_seg_paired_delta` table:
- Quantify the sum of all segment sizes (this should roughly add up to the total genome size)
- Quantify the sum of all segments that did not change in call between (a) and (b) as `delta_eq_size`
- Quantify the proportion of segments that did not change between (a) and (b)
    * a proportion of 1 means that the copy number profile of (a) and (b) are equal
    * a proportion of 0 means that there is no overlap in copy number profile between (a) and (b)

In addition: for each `tumor_pair_barcode` in the `tumor_pairs` table:
- Compare number of segments between (a) and (b)
- Compare proportion of genome that is heterozyous between (a) and (b)
    * a difference of -1 means that (a) is largely heterozygous, whereas (b) is entirely non-heterozygous
    * a difference of 0 indicates (a) and (b) have equal sized heterozygous region
    * a difference of +1 indicates that (b) is heterozygous, whereas (a) is not

Now updated to subset using `selected_tumor_pairs` to filter based on a platinum set including optimal CNV samples only
*/
WITH
selected_tumor_pairs AS
(
    SELECT
        tumor_pair_barcode,
        row_number() OVER (PARTITION BY case_barcode ORDER BY surgical_interval_mo DESC, portion_a ASC, portion_b ASC, substring(tumor_pair_barcode from 27 for 3) ASC) AS priority
    FROM analysis.tumor_pairs ps
    LEFT JOIN analysis.blocklist b1 ON b1.aliquot_barcode = ps.tumor_barcode_a
    LEFT JOIN analysis.blocklist b2 ON b2.aliquot_barcode = ps.tumor_barcode_b
    WHERE
        comparison_type = 'longitudinal' AND
        sample_type_b <> 'M1' AND                                                   -- exclude metastatic samples here because this is outside the scope of our study
        b1.cnv_exclusion = 'allow' AND b2.cnv_exclusion = 'allow' AND
        b1.coverage_exclusion = 'allow' AND b2.coverage_exclusion = 'allow'
),
paired_seg_quant AS
(
    SELECT
        tumor_pair_barcode,
        tumor_barcode_a,
        tumor_barcode_b,
        sum(upper(pos_intersect) - lower(pos_intersect)) AS delta_seg_size,
        sum(CASE WHEN delta_minor = 0 AND delta_major = 0 THEN upper(pos_intersect) - lower(pos_intersect) ELSE 0 END) AS delta_eq_size
    FROM analysis.titan_seg_paired_delta
    GROUP BY tumor_pair_barcode, tumor_barcode_a, tumor_barcode_b
)
SELECT
    pq.tumor_pair_barcode,
    tumor_barcode_a,
    tumor_barcode_b,
    s1.num_seg AS num_seg_a,
    s2.num_seg AS num_seg_b,
    s1.prop_het AS prop_het_a,
    s2.prop_het AS prop_het_b,
    s2.num_seg - s1.num_seg AS delta_num_seg,
    s2.prop_het - s1.prop_het AS delta_prop_het,
    delta_eq_size,
    delta_seg_size,
    round(delta_eq_size::decimal/delta_seg_size,4) AS prop_delta_eq
FROM paired_seg_quant pq
INNER JOIN selected_tumor_pairs stp ON stp.tumor_pair_barcode = pq.tumor_pair_barcode AND stp.priority = 1
LEFT JOIN analysis.pairs p1 ON p1.tumor_barcode = pq.tumor_barcode_a
LEFT JOIN analysis.pairs p2 ON p2.tumor_barcode = pq.tumor_barcode_b
LEFT JOIN analysis.titan_seg_prop_het s1 ON s1.pair_barcode = p1.pair_barcode
LEFT JOIN analysis.titan_seg_prop_het s2 ON s2.pair_barcode = p2.pair_barcode
ORDER BY 9 -- or: 12

Results can be found in SELECT * FROM analysis.titan_seg_paired_comparison

fpbarthel commented 5 years ago

Closing this issue