csoneson / alevinQC

Create QC and summary reports for Alevin output
https://csoneson.github.io/alevinQC/
Other

nbrMappedUMI for barcode collapsing question #26

Closed: me-orlov closed this issue 1 year ago

me-orlov commented 1 year ago

Hello, I have been experimenting with alevinQC on a test data set, and noticed that when I 1) run alevinFryQCShiny and 2) manually plot the barcode collapsing with plotAlevinBarcodeCollapse, the two resulting graphs are different. In the Shiny application, the graph appears to use nbrMappedUMI (the number of mapped reads from the featureDump.txt file) as the dependent variable for the cell barcode frequency after reassignment. In the manual plot, collapsedFreq is used instead. Is there a particular reason why the number of mapped reads, rather than collapsedFreq, is plotted in the Shiny app? I am very new to RNA-seq and would appreciate any guidance. Thank you for your time!

csoneson commented 1 year ago

Thanks for your question. For alevin-fry, we use nbrMappedUMI because it is more comparable to the reported original frequency (the x-axis in the plot): for alevin-fry, the original frequency only includes reads that map properly. Comparing it to collapsedFreq, which does include unmapped reads, can therefore be misleading. For alevin, both values include unmapped reads.

For this reason, in both the HTML report (alevinFryQCReport) and the Shiny app (alevinFryQCShiny), we set the arguments passed to the plotting functions according to the input type:
https://github.com/csoneson/alevinQC/blob/master/R/alevinQCShiny.R#L90
https://github.com/csoneson/alevinQC/blob/master/inst/extdata/alevin_report_template.Rmd#L188

The basic plotting functions (e.g. plotAlevinBarcodeCollapse) are general utilities that work for any input type. By default, plotAlevinBarcodeCollapse plots collapsedFreq, but the user can specify any other column instead.
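The column choice described above can be illustrated with a minimal Python sketch. This is not alevinQC's R code: the `BarcodeRow` record, its field names, and the `y_column`/`collapse_points` helpers are all hypothetical, and the input-type strings are assumptions. It only mirrors the rule from this thread: pair the original frequency with a y-axis value that was counted the same way.

```python
from dataclasses import dataclass


# Hypothetical per-barcode record; fields mirror the columns discussed in this thread.
@dataclass
class BarcodeRow:
    barcode: str
    original_freq: int   # x-axis: barcode frequency before collapsing
    collapsed_freq: int  # frequency after collapsing (includes unmapped reads)
    nbr_mapped_umi: int  # mapped-read count (from featureDump.txt)


def y_column(input_type: str) -> str:
    """Pick the y-axis field so it is counted the same way as the x-axis."""
    if input_type == "alevin-fry":
        # The original frequency only counts properly mapped reads,
        # so compare it against a mapped-read count.
        return "nbr_mapped_umi"
    if input_type == "alevin":
        # Original and collapsed frequencies both include unmapped reads.
        return "collapsed_freq"
    raise ValueError(f"unknown input type: {input_type!r}")


def collapse_points(rows, input_type):
    """Return (x, y) pairs for a barcode-collapsing scatter plot."""
    col = y_column(input_type)
    return [(r.original_freq, getattr(r, col)) for r in rows]


if __name__ == "__main__":
    rows = [BarcodeRow("AAAC", 100, 120, 95)]
    print(collapse_points(rows, "alevin-fry"))
    print(collapse_points(rows, "alevin"))
```

Keeping the choice in one small function reflects the point made above: the plotting utility itself stays generic, and only the caller (here, the report or app) decides which column matches the input type.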

me-orlov commented 1 year ago

Hello, and thank you for the detailed response! I hadn't realized that the original frequency for alevin-fry (unlike alevin) includes only mapped reads. Everything makes more sense now. Thank you for taking the time to explain it.