ImSoErgodic / py-upset

A pure-python implementation of the UpSet suite of visualisation methods by Lex, Gehlenborg et al.

Bug? Numbers don't seem to add up #30

Open PikalaxALT opened 5 years ago

PikalaxALT commented 5 years ago

I have a set of three TSV files which I am reading as pandas.DataFrames. Because the data are being prepared for a manuscript in review, I will not share them here. I hope that my description of these files is sufficient to track down the problem.

Basically, I am trying to make an UpSet plot of genes with significant splicing QTL detections across tissues. A gene can have multiple splicing QTLs associated with it (multiple splicing events, multiple genomic variants), so each gene may appear in several rows. In one such test, I observe total gene counts on the order of 5-6e4 for each of 3 tissues. However, the intersection of all 3 is reported as being on the order of 2e6, which should be impossible: an intersection cannot be larger than the smallest of the sets being intersected. This makes me doubt that the intersections are being computed correctly. My guess is that the intersection does not properly filter for unique intersecting rows.
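To illustrate the guess above, here is a minimal sketch in plain Python. It makes no claim about how py-upset computes intersections internally. It only shows that if rows are joined on a gene key that is not unique (the way an inner join behaves), the row count multiplies per gene and can exceed the size of any input. The toy gene IDs and the helper names `joined_row_count` and `unique_gene_intersection` are made up for this example.

```python
from collections import Counter

# Hypothetical toy data: one entry per (gene, splicing QTL) row in each tissue.
# A gene appears more than once when it has several QTLs.
tissue_a = ["G1", "G1", "G1", "G2", "G3"]
tissue_b = ["G1", "G1", "G2", "G2", "G4"]
tissue_c = ["G1", "G1", "G2", "G5"]


def joined_row_count(*tables):
    """Count the rows an inner join on gene would produce with non-unique keys.

    For each shared gene, the join yields the product of its row counts
    across all tables, so duplicates inflate the total.
    """
    counts = [Counter(table) for table in tables]
    shared = set.intersection(*(set(c) for c in counts))
    total = 0
    for gene in shared:
        rows = 1
        for c in counts:
            rows *= c[gene]
        total += rows
    return total


def unique_gene_intersection(*tables):
    """Intersect the sets of distinct genes, ignoring duplicate rows."""
    return set.intersection(*(set(table) for table in tables))


naive = joined_row_count(tissue_a, tissue_b, tissue_c)
unique = unique_gene_intersection(tissue_a, tissue_b, tissue_c)

print("rows after a join on non-unique genes:", naive)
print("distinct genes shared by all tissues:", len(unique))
```

In this toy case the join-style count is larger than the smallest input (4 rows), while the count of distinct shared genes is not. If this is what is happening, dropping duplicate gene rows from each DataFrame before passing it to the plotting call might serve as a workaround, though I have not confirmed that against the library's code.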