aryarm / varCA

Use an ensemble of variant callers to call variants from ATAC-seq data
MIT License

varscan precision recall curve #3

Closed by aryarm 5 years ago

aryarm commented 5 years ago

About 1.5 weeks ago, I got a varscan plot that looked like this:

[image: gatk-varscan]

I don't remember what state the code was in when I created this plot, nor do I remember the parameters I passed to the plotting code. Then I made changes to the code that produces the plots and got this instead:

[image: agh-varscan8]

We think the first one is correct because the curve passed through the single-point precision-recall calculation at its inflection point (as the gatk and vardict curves currently do). So what went wrong with the varscan plotting?
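For reference, here is a minimal sketch of the check described above: build a precision-recall curve by thresholding a score, compute the single-point precision and recall from the caller's own calls, and test whether that point lies on the curve. It is not the repository's plotting code. The names `pr_curve` and `single_point` and the toy data are made up for illustration, and it assumes a higher score means a more confident call (a PVAL would first need to be negated or transformed, e.g. with -log10).

```python
def pr_curve(scores, truth):
    """Return (threshold, precision, recall) for every distinct score.

    A site counts as called when score >= threshold.
    Assumes truth contains at least one positive.
    """
    total_pos = sum(truth)
    points = []
    for t in sorted(set(scores), reverse=True):
        called = [s >= t for s in scores]
        tp = sum(1 for c, y in zip(called, truth) if c and y)
        fp = sum(1 for c, y in zip(called, truth) if c and not y)
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points


def single_point(calls, truth):
    """Precision and recall of the caller's own yes/no calls."""
    tp = sum(1 for c, y in zip(calls, truth) if c and y)
    fp = sum(1 for c, y in zip(calls, truth) if c and not y)
    return tp / (tp + fp), tp / sum(truth)


# Toy data (hypothetical): scores, truth labels, and calls made at score >= 0.7
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]
calls = [s >= 0.7 for s in scores]

curve = pr_curve(scores, truth)
point = single_point(calls, truth)
on_curve = any((p, r) == point for _, p, r in curve)
print(point, on_curve)
```

In this toy case the calls are exactly a threshold on the score, so the single point is guaranteed to coincide with one of the curve's points.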

Well, one of the changes I made to the plot creation code affected how I was labeling non-variants. I'm not sure what I was doing before, but I don't think non-variants were being given the correct PVAL. I don't think this change is what broke the plot, though: I checked, and mislabeling the non-variants doesn't reproduce the plot I had before.

Another thing I considered is the precision with which the non-variants' PVALs are being read into Python. I'm currently using float64, which should be more than enough precision to store each PVAL, but I might have been using a lower-precision type before. Unfortunately, using less precision didn't return the plots to the way they were either.
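To illustrate why the storage type can matter for a score like a p-value, here is a small sketch using only the standard library. The p-values are made up, and `to_float32` is a hypothetical helper that rounds a Python float (a float64) to the nearest float32. Very small values that are distinct in float64 underflow to 0.0 in float32 and become ties, which changes the set of thresholds available to a precision-recall curve. Whether varscan's PVALs actually fall in this range is not established in this thread.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Made-up p-values: two extremely small ones and one ordinary one
pvals = [1e-50, 1e-60, 0.5]
as32 = [to_float32(p) for p in pvals]

distinct64 = len(set(pvals))  # all three values stay distinct in float64
distinct32 = len(set(as32))   # the two tiny values collapse to 0.0 in float32

print(as32, distinct64, distinct32)

# float64 has a floor as well: values below roughly 5e-324 also become 0.0
print(float("1e-400"))
```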

aryarm commented 5 years ago

I tried using GQ instead of PVAL and got a sad plot where the curve was basically flush with the x and y axes. GQ was a field that worked 1.5 weeks ago.

At this point, I started to suspect that the code had somehow messed up the truth column. But the plot creation code still worked on gatk, so how could it only be messed up for varscan?

Update: further digging shows that, since the last time I ran the varscan caller, the GQ score has been capped at 99 in 5% of the variants. I think GATK SelectVariants is at fault. It's a shame, but I don't think it's important enough to fix. Also, I doubt that this could be responsible for the lack of a plot.
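A quick way to measure this kind of capping is to count how many scores sit exactly at the cap and how many distinct score values remain. The sketch below uses made-up GQ values and a hypothetical helper, `cap_summary`. Sites that share a score cannot be separated by any threshold, so they enter or leave the called set together on a precision-recall curve.

```python
from collections import Counter


def cap_summary(gq_values, cap=99):
    """Return (fraction of values equal to cap, number of distinct values)."""
    counts = Counter(gq_values)
    return counts[cap] / len(gq_values), len(counts)


# Made-up GQ values for illustration only
gq = [99, 99, 45, 30, 99, 12, 99, 60, 7, 99]
fraction_capped, n_distinct = cap_summary(gq)
print(fraction_capped, n_distinct)
```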

aryarm commented 5 years ago

Update (8/18/19): After trying everything I could think of to fix this issue, we've decided that the varscan curve may not have to pass through its single-point calculation. At least, that's the current explanation for what we are seeing. I'll leave this issue open in case we find something else that might explain the problem.
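One condition under which a single point need not lie on the curve is when the caller's yes/no calls are not a pure threshold on the score being plotted, for example if other filters also decide what gets called. The thread does not establish that this is what happens with varscan, so the sketch below is only a diagnostic idea. The helper `is_threshold_consistent` and the data are hypothetical, and a higher score is assumed to mean a more confident call.

```python
def is_threshold_consistent(scores, calls):
    """True if some cutoff t reproduces the calls as score >= t."""
    called = [s for s, c in zip(scores, calls) if c]
    uncalled = [s for s, c in zip(scores, calls) if not c]
    if not called or not uncalled:
        return True  # everything or nothing is called: trivially a threshold
    # Every called site must outscore every uncalled site
    return min(called) > max(uncalled)


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Calls that follow a cutoff on the score
threshold_calls = [True, True, True, False, False, False]
# Calls where a high-scoring site was dropped and a low-scoring one kept
filtered_calls = [True, False, True, True, False, False]

print(is_threshold_consistent(scores, threshold_calls))
print(is_threshold_consistent(scores, filtered_calls))
```

If this check fails on real data, the single point is not guaranteed to lie on a curve built from that score.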

aryarm commented 5 years ago

Closing this for now, since we haven't found any other explanation.