alexdobin / STAR

RNA-seq aligner
MIT License

Visualizing QC stats from STARSolo #1158

Closed bapoorva closed 3 years ago

bapoorva commented 3 years ago

Hi,

The new version of STARsolo perfectly mimics the Cell Ranger output, but I was wondering if there is a way to get QC stats like the ones Cell Ranger produces.

This is sort of addressed in #660. I have the summary.csv and UMIperCellSorted.txt files, but I'm interested in the barcode vs. UMI plot with different colors for cells and background, as in the web_summary.html file from Cell Ranger. UMIperCellSorted.txt only has the UMI counts, so there is no way to tell whether a barcode is a cell or background. Is there any way to work around this?

Thanks Apoorva

alexdobin commented 3 years ago

Hi Apoorva,

You can calculate the number of UMIs per cell for the filtered data by summing the 3rd column of the filtered/matrix.mtx file for each cell barcode index in column 2 (skipping the 3 header lines). You can then plot these values together with the ones from UMIperCellSorted.txt in different colors; this should reproduce the cell/background plot.
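A minimal Python sketch of the summation described above, for readers who prefer not to use awk. It assumes the usual Matrix Market coordinate layout (comment lines starting with `%`, one dimension line, then `feature_index barcode_index count` triples with integer counts). The function name `sum_umis_per_cell` is hypothetical and is not part of STAR.

```python
from collections import defaultdict


def sum_umis_per_cell(mtx_lines):
    """Sum column 3 (counts) for each barcode index found in column 2."""
    totals = defaultdict(int)
    dims_seen = False
    for line in mtx_lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and Matrix Market comment lines
        if not dims_seen:
            dims_seen = True  # dimension line: n_features n_barcodes n_entries
            continue
        fields = line.split()
        barcode_index = int(fields[1])        # column 2: 1-based barcode index
        totals[barcode_index] += int(fields[2])  # column 3: UMI count (assumed integer)
    return dict(totals)


# Tiny made-up matrix, for illustration only.
example = """%%MatrixMarket matrix coordinate integer general
%
5 3 4
1 1 2
3 1 1
2 2 5
4 3 1
""".splitlines()

print(sum_umis_per_cell(example))  # {1: 3, 2: 5, 3: 1}
```

For a real run, the lines could come from `open("filtered/matrix.mtx")` instead of the inline example.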

I also added an awk script: https://github.com/alexdobin/STAR/blob/master/extras/scripts/calcUMIperCell.awk

Usage:

awk -f calcUMIperCell.awk raw/matrix.mtx raw/barcodes.tsv filtered/barcodes.tsv | sort -k1,1rn > UMIperCell.txt

It outputs two columns:

column1 = total UMIs per cell 
column2 = 1 for cell that passed filtering, 0 otherwise
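A two-column file like this is enough to draw the cell/background plot. The sketch below is one possible way to do it in Python, assuming the file is whitespace-separated with integer values as described above and that matplotlib is installed. The function names and the output file name `knee_plot.png` are hypothetical.

```python
def read_umi_per_cell(path):
    """Read (umi_count, is_cell) pairs from a two-column text file."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                rows.append((int(fields[0]), int(fields[1])))
    return rows


def knee_points(rows):
    """Rank barcodes by descending UMI count; returns (rank, umi, is_cell) tuples."""
    ordered = sorted(rows, key=lambda row: row[0], reverse=True)
    return [(rank, umi, flag) for rank, (umi, flag) in enumerate(ordered, start=1)]


def plot_knee(points, out_png="knee_plot.png"):
    """Log-log barcode rank plot with cells and background in different colors."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for flag, color, label in ((1, "tab:blue", "cells"), (0, "lightgray", "background")):
        # Zero counts cannot be drawn on a log axis, so they are dropped.
        xs = [rank for rank, umi, f in points if f == flag and umi > 0]
        ys = [umi for rank, umi, f in points if f == flag and umi > 0]
        ax.scatter(xs, ys, s=4, color=color, label=label)
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.set_xlabel("Barcode rank")
    ax.set_ylabel("UMI count")
    ax.legend()
    fig.savefig(out_png, dpi=150)


# Made-up rows, for illustration only.
points = knee_points([(5, 0), (100, 1), (50, 1)])
print(points)  # [(1, 100, 1), (2, 50, 1), (3, 5, 0)]
```

With real data this would be `plot_knee(knee_points(read_umi_per_cell("UMIperCell.txt")))`.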

You can also run it with just the filtered matrix. Usage:

awk -f calcUMIperCell.awk filtered/matrix.mtx | sort -k1,1rn > UMIperCell.txt

and it will output the counts just for the filtered cells (as described in the beginning of the post).

Cheers Alex

Lil-Psilocybe commented 8 months ago

Hi Alex,

Apologies for reviving an old thread, but I'm having trouble getting this script to return anything! I see the script is now named soloUMIperCell.awk in https://github.com/alexdobin/STAR/blob/master/extras/scripts/ and I have used it on both my raw and filtered snRNAseq data, only to get empty files.

Is the script still compatible with the newest version of STARsolo?

For reference, my matrices look like so:

johnbriseno@Johns-MacBook-Pro filtered % head matrix.mtx
%%MatrixMarket matrix coordinate integer general
%
39008 7369 4809708
36 1 1
38 1 1
41 1 1
75 1 2
94 1 1
132 1 1
148 1 1

And my barcodes look like this:

johnbriseno@Johns-MacBook-Pro filtered % head barcodes.tsv
AAACCCAAGGTTAAAC
AAACCCAAGTAGTGCG
AAACCCAAGTATGAGT
AAACCCACAAGGGCAT
AAACCCACAGTCCGTG
AAACCCAGTCATCCCT
AAACCCAGTTGCGGCT
AAACCCATCTGTCTCG
AAACGAAAGCAGGCTA
AAACGAAAGCGGTAGT

From an scRNAseq workshop I took, I was able to generate a UMI vs. cell plot with cell density on the y axis (log scale) and UMIs on the x axis, but I'm specifically looking to generate a knee plot, which we didn't cover in the tutorial. Any help would be much appreciated, and thanks once again for the fantastic tools!

-JB

alexdobin commented 7 months ago

Hi @Lil-Psilocybe

The script should work. You can also write your own: the script simply sums the counts in each cell from the matrix.mtx, and then adds the barcode sequences from barcodes.tsv.
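As a rough illustration of "write your own", here is a minimal Python sketch that sums the counts per barcode and attaches the barcode sequences. It is not a port of the awk script. It assumes integer counts, 1-based barcode indexes in column 2 of matrix.mtx, and one barcode per line in barcodes.tsv. The function name and the optional `cell_barcodes` argument (barcodes that passed filtering) are hypothetical.

```python
def umis_with_barcodes(mtx_lines, barcodes, cell_barcodes=None):
    """Return (total_umis, is_cell, barcode) rows sorted by descending total."""
    totals = [0] * len(barcodes)
    dims_seen = False
    for line in mtx_lines:
        if not line.strip() or line.startswith("%"):
            continue  # blank or comment line
        if not dims_seen:
            dims_seen = True  # dimension line, carries no counts
            continue
        _, col, count = line.split()[:3]
        totals[int(col) - 1] += int(count)  # column 2 is a 1-based barcode index
    # Without a filtered list, every barcode is flagged as a cell.
    cells = set(barcodes) if cell_barcodes is None else set(cell_barcodes)
    rows = [(total, int(bc in cells), bc) for total, bc in zip(totals, barcodes)]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Made-up inputs, for illustration only.
mtx = """%%MatrixMarket matrix coordinate integer general
%
5 3 4
1 1 2
3 1 1
2 2 5
4 3 1
""".splitlines()
raw_barcodes = ["AAAC", "CCCG", "GGGT"]
filtered_barcodes = ["CCCG"]

for row in umis_with_barcodes(mtx, raw_barcodes, filtered_barcodes):
    print(*row, sep="\t")
```

With real files, `mtx_lines` could be `open("raw/matrix.mtx")` and the barcode lists could be read with `open(path).read().split()`.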

Lil-Psilocybe commented 7 months ago

Hello!

Thanks for your reply, I'll give it a shot!

-JB