chanzuckerberg / idseq-workflows

Portable WDL workflows for IDseq production pipelines
https://idseq.net/
MIT License

Add utility functions to read truth files, compute AUPR #79

kislyuk closed this 3 years ago

kislyuk commented 3 years ago

We will also need a strategy for combining nr and nt read counts when reporting abundance estimates (so far, this PR uses only nt counts). @katrinakalantar, could you advise on the best way to do this?
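For context, here is a minimal sketch of how AUPR could be computed from a single set of per-taxon counts (for example, nt only) against a truth set. It assumes AUPR is computed as average precision over taxa ranked by descending count. The function name `aupr` and its signature are hypothetical and are not taken from this PR's utility functions.

```python
from typing import Dict, Set


def aupr(counts: Dict[str, float], truth: Set[str]) -> float:
    """Hypothetical helper: area under the precision-recall curve as average precision.

    Taxa are ranked by descending count (ties broken by taxID for determinism).
    Each true positive contributes the precision at its rank; truth taxa that
    never appear with a positive count contribute zero.
    """
    if not truth:
        raise ValueError("truth set must be non-empty")
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    hits = 0
    precision_sum = 0.0
    for rank, (taxid, count) in enumerate(ranked, start=1):
        if count <= 0:
            break  # taxa with no reads are treated as not detected
        if taxid in truth:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(truth)


# Toy example: "a" and "c" are the true taxa, "b" is a false positive.
print(aupr({"a": 10, "b": 5, "c": 3}, {"a", "c"}))
```

Because the score depends only on the ranking, whichever NT/NR combination strategy is chosen changes AUPR only through how it reorders, adds, or drops taxa.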

katrinakalantar commented 3 years ago

Good question. Here are some general thoughts. I can see a few ways of doing this, each with pros and cons:

  1. Compute metrics for NT and NR separately. Algorithmic changes specific to one alignment database will affect only that database's metrics, so there may be value in tracking them separately. Of course, this doesn't capture the value of combining the two.
  2. Apply a minimal threshold filter that requires concordance between NT and NR (specifically, require NT > 0 and NR > 0 for a given taxon), then use the NT counts. Many users take this approach to improve specificity when interpreting the data in IDseq reports. However, it does not allow detection of divergent taxa for which NR counts alone may be of interest, and it does not reflect the "raw results".
  3. Apply the algorithm for combining NT and NR implemented here: https://github.com/chanzuckerberg/idseq-workflows/pull/51. This is the most rigorous method of combining NT and NR because it assigns a single taxID to each read. However, it is still being refined to improve performance on a small number of divergent viral edge cases, and it is not currently used to present production results.
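Option 2 can be sketched as a simple filter over per-taxon count dictionaries. This is a minimal sketch assuming NT and NR counts are available as `{taxid: count}` mappings; the function name `concordant_nt_counts` is hypothetical. Option 3 is not sketched here, since its algorithm lives in the linked PR.

```python
from typing import Dict


def concordant_nt_counts(nt: Dict[str, int], nr: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper for option 2: keep NT counts only for taxa with NT > 0 and NR > 0.

    Taxa seen only in NR (e.g. divergent viruses) are dropped, which is the
    sensitivity cost described above.
    """
    return {
        taxid: count
        for taxid, count in nt.items()
        if count > 0 and nr.get(taxid, 0) > 0
    }


nt = {"1": 10, "2": 4, "3": 0}
nr = {"1": 2, "3": 7, "4": 5}  # taxon "4" has NR hits only
print(concordant_nt_counts(nt, nr))
```

Under option 1, the unfiltered `nt` and `nr` mappings would instead be scored independently, producing one metric per database.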

Another, more naive approach that I'm not certain about would be to sum the NT and NR counts. Unfortunately, this means a single read could be counted twice. It does let you inherit the sensitivity of each database, but perceived specificity will take a hit.
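The double-counting problem with the naive sum is easiest to see at the read level. This sketch assumes per-read assignments are available as `{read_id: taxid}` mappings for each database; both function names are hypothetical. The second function is only a comparison that counts each read at most once per taxon. It is not the one-taxID-per-read algorithm from PR #51, since a read assigned to different taxa by NT and NR still counts toward both.

```python
from collections import Counter
from typing import Dict


def summed_counts(nt_hits: Dict[str, str], nr_hits: Dict[str, str]) -> Dict[str, int]:
    """Naive approach: per-taxon NT count plus NR count.

    A read assigned to the same taxon by both databases is counted twice.
    """
    return dict(Counter(nt_hits.values()) + Counter(nr_hits.values()))


def unique_read_counts(nt_hits: Dict[str, str], nr_hits: Dict[str, str]) -> Dict[str, int]:
    """Comparison only: count each (read, taxon) pair once across both databases."""
    pairs = set(nt_hits.items()) | set(nr_hits.items())
    return dict(Counter(taxid for _, taxid in pairs))


nt_hits = {"r1": "562", "r2": "562", "r3": "1280"}
nr_hits = {"r1": "562", "r4": "1280"}  # r1 hits taxon 562 in both databases
print(summed_counts(nt_hits, nr_hits))       # taxon 562 gets 3, although only 2 reads support it
print(unique_read_counts(nt_hits, nr_hits))
```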

mlin commented 3 years ago

Built on this in #83