csoneson / alevinQC

Create QC and summary reports for Alevin output
https://csoneson.github.io/alevinQC/

readAlevinFryQC() metadata/explanation #30

Closed: BiotechPedro closed this issue 1 year ago

BiotechPedro commented 1 year ago

Hi,

First of all, thank you for this useful package. I have a little request, though.

It would be great if you could add another element to the list returned by readAlevinFryQC(). It could be named 'elementMetadata', for example, and it would explain the meaning of each column of the 'cbTable' data frame.

Specifically, this occurred to me because I don't understand the difference between 'originalFreq' and 'collapsedFreq' (columns of the 'cbTable' data frame). The report generated by alevinFryQCReport() describes "the cell barcode frequency (the number of observed reads corresponding to a cell barcode)", which corresponds to the 'collapsedFreq' column. I assume that 'originalFreq' holds the read counts of the uncorrected cell barcodes. However, as I understand it, sum(cbTable$originalFreq) should equal sum(cbTable$collapsedFreq), but the total number of reads turns out to be lower for the collapsed barcodes. Could you explain this?

Also, regarding summaryTables(): I assume that "Number of mapped reads" refers to reads that map to the reference index with salmon alevin and also have an observed cell barcode, right?

Many thanks!

Pedro

csoneson commented 1 year ago

Thanks for the suggestion. I think this is a good idea; I'll look into adding this information and get back to you shortly.

BiotechPedro commented 1 year ago

Thanks Charlotte! :D

Right now I am most curious about where the difference between sum(cbTable$originalFreq) and sum(cbTable$collapsedFreq) comes from. It would help me understand some odd sequencing libraries I am dealing with 😅

csoneson commented 1 year ago

Tagging @DongzeHE and @rob-p in case they would like to add something. Note, however, that collapsedFreq is only provided (i.e., not NA) for cell barcodes in the permit list, while originalFreq is present for all barcodes (if you have an all_freq.bin file). Reads from barcodes that are not collapsed into a barcode from the permit list are therefore not counted in sum(cbTable$collapsedFreq).
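To make the discrepancy concrete, here is a simplified, language-agnostic sketch in Python. It is not alevin-fry's actual barcode-correction algorithm: it assumes a hypothetical rule where a barcode outside the permit list is collapsed into a permit-list barcode only if that barcode is the unique one within one mismatch, and the barcodes and counts are made-up toy data. Rows for barcodes outside the permit list get a missing collapsedFreq (mirroring NA in cbTable), so their reads never reach the collapsed total. In R, sum() over a column containing NA returns NA unless na.rm = TRUE is set; the sketch skips missing values the same way.

```python
from collections import Counter

# Toy raw (uncorrected) barcode read counts, playing the role of originalFreq.
original_freq = {
    "AAAA": 50,  # in the permit list
    "AAAT": 5,   # one mismatch from AAAA, so it is collapsed into AAAA
    "CCCC": 30,  # in the permit list
    "GGGG": 8,   # no permit-list barcode within one mismatch, so it is dropped
}
permit_list = {"AAAA", "CCCC"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def collapse(freqs, permit, max_dist=1):
    """Hypothetical correction rule: exact match, or a unique permit-list
    barcode within max_dist mismatches. Unmatched reads are discarded."""
    collapsed = Counter()
    for bc, n in freqs.items():
        if bc in permit:
            collapsed[bc] += n
            continue
        hits = [p for p in permit
                if len(p) == len(bc) and hamming(p, bc) <= max_dist]
        if len(hits) == 1:
            collapsed[hits[0]] += n
    return dict(collapsed)


collapsed_freq = collapse(original_freq, permit_list)

# One row per observed barcode; collapsedFreq is None (NA) outside the permit list.
cb_table = [
    {"barcode": bc, "originalFreq": n, "collapsedFreq": collapsed_freq.get(bc)}
    for bc, n in original_freq.items()
]

total_original = sum(row["originalFreq"] for row in cb_table)
total_collapsed = sum(row["collapsedFreq"] for row in cb_table
                      if row["collapsedFreq"] is not None)  # like na.rm = TRUE

print(total_original)   # 93
print(total_collapsed)  # 85: the 8 reads of GGGG were never collapsed
```

The 5 reads of AAAT are not lost; they are counted under AAAA (55). Only reads that cannot be assigned to any permit-list barcode make the collapsed total smaller.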

BiotechPedro commented 1 year ago

Many thanks for your help, Charlotte!

I'll close the issue, but it would be great if you could include the descriptions mentioned above :D

Best,

Pedro

csoneson commented 1 year ago

Yep, it's added to the TODO list. Thanks!