rcosentino opened this issue 3 years ago
Hi Raúl
I just pushed the script soloCountMatrixFromBAM.awk to the GitHub master branch.
samtools view Aligned.sortedByCoord.out.bam | awk -v fileWL=Solo.out/Gene/raw/barcodes.tsv -v fileGenes=Solo.out/Gene/raw/features.tsv -f /path/to/extras/scripts/soloCountMatrixFromBAM.awk | sort -k2,2n -k1,1n > mat.mtx
You need to add the GX, CB and UB tags to --outSAMattributes.
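A minimal sketch of the STAR options involved, assuming an otherwise complete STARsolo command (genome, read files and the --solo* settings are not shown). The array name STAR_TAG_OPTS is hypothetical. As far as I know, STAR writes CB and UB only when the BAM is coordinate-sorted, so the sorted output type is included; please check this against the STAR manual for your version.

```shell
# Options to merge into an existing STARsolo command line.
# Genome, read files and --solo* settings must come from your usual run.
# CB/UB are written for coordinate-sorted BAM output; GX, CB and UB are
# the tags soloCountMatrixFromBAM.awk needs.
STAR_TAG_OPTS=(
    --outSAMtype BAM SortedByCoordinate
    --outSAMattributes NH HI nM AS CR UR CB UB GX GN
)
# Usage: STAR <your usual options> "${STAR_TAG_OPTS[@]}"
```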
It is pretty slow and can use a lot of memory, so I recommend trying it on a small run first (100k-1M reads).
Cheers Alex
Hi Alex, I have been reading all of your comments about these sorts of issues. I think I have a general idea, but I am still confused. In my case, I cannot use the standard filtered count matrix from STARsolo because I want to pre-filter the BAM file to keep only reads that either lack the WASP tag vW or pass it (vW==1), and to keep only autosomes. I located the script soloCountMatrixFromBAM.awk, and I am adding the GX, CB and UB tags to --outSAMattributes in the alignment.
That said, my coding abilities are limited. How can I get the barcodes.tsv and features.tsv from the filtered BAM? I assume that after I create the raw barcodes.tsv, features.tsv and matrix.txt, I should pass them through soloBasicCellFilter.awk to get the filtered barcodes, like the original STARsolo output?
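The pre-filtering described above could be done with an awk filter placed between samtools view and soloCountMatrixFromBAM.awk. This is a sketch under stated assumptions: SAM records arrive on stdin without a header (as `samtools view` prints them without `-h`), chromosomes are named 1-22 or chr1-chr22, and `filter_wasp_autosomes` is a hypothetical helper name.

```shell
# Hypothetical helper: keep SAM records on autosomes that either carry no
# WASP vW tag or carry vW:i:1. Assumes chromosome names 1..22 or chr1..chr22.
filter_wasp_autosomes() {
    awk -F'\t' '
    {
        chrom = $3
        sub(/^chr/, "", chrom)                          # accept "chr1" and "1"
        if (chrom !~ /^([1-9]|1[0-9]|2[0-2])$/) next    # drop X, Y, MT, contigs
        vw = ""
        for (i = 12; i <= NF; i++)                      # optional fields start at 12
            if (substr($i, 1, 5) == "vW:i:") { vw = substr($i, 6); break }
        if (vw == "" || vw == "1") print                # no vW tag, or passed WASP
    }'
}

# Usage, with the paths from the command earlier in the thread:
# samtools view Aligned.sortedByCoord.out.bam | filter_wasp_autosomes \
#   | awk -v fileWL=Solo.out/Gene/raw/barcodes.tsv -v fileGenes=Solo.out/Gene/raw/features.tsv \
#         -f /path/to/extras/scripts/soloCountMatrixFromBAM.awk \
#   | sort -k2,2n -k1,1n > mat.mtx
```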
Hi @matosmr
The barcodes.tsv is the full list of barcodes, and features.tsv is the full list of genes, so you can copy them from the STARsolo run; there is no need to get them from the BAM.
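Following this answer, a small sketch that copies the two lists into the directory holding a matrix rebuilt from the filtered BAM. The helper name `copy_solo_lists` and the destination directory are hypothetical; the default source is the raw Gene output path used in the command earlier in the thread.

```shell
# Hypothetical helper: reuse the full barcode and gene lists from an existing
# STARsolo run alongside a matrix rebuilt from a filtered BAM.
# $1: STARsolo raw output dir (default Solo.out/Gene/raw), $2: destination dir.
copy_solo_lists() {
    local solo_raw=${1:-Solo.out/Gene/raw}
    local dest=$2
    mkdir -p "$dest" || return 1
    cp "$solo_raw/barcodes.tsv" "$solo_raw/features.tsv" "$dest/"
}

# Usage: copy_solo_lists Solo.out/Gene/raw filtered_raw
```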
Hi Alex, thanks for the clarification. I am running into problems with the script soloCountMatrixFromBAM.awk.
I am getting the following errors:
/gs/gsfs0/home/marlrodrig/aging_project/scRNAseq/scripts/AS_scrnaseq_preprocessing_v2/soloCountMatrixFromBAM.awk: line 6: syntax error near unexpected token `tag'
/gs/gsfs0/home/marlrodrig/aging_project/scRNAseq/scripts/AS_scrnaseq_preprocessing_v2/soloCountMatrixFromBAM.awk: line 6: `function getTag(tag)'
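These two lines follow the format bash uses for its own syntax errors (file, line number, unexpected token, then the offending line), which suggests the .awk file was run by the shell (for example `bash soloCountMatrixFromBAM.awk`) instead of being passed to awk with `-f` as in the samtools | awk command earlier in the thread. The demo below uses a made-up awk file, not the real script; `make_demo_awk` and its `getTag` are hypothetical.

```shell
# Write a small, hypothetical awk program to a temp file and print its path.
make_demo_awk() {
    local f
    f=$(mktemp) || return 1
    cat > "$f" <<'EOF'
# getTag(tag): value of the SAM optional field TAG:TYPE:VALUE, or "-" if absent
function getTag(tag,    i) {
    for (i = 12; i <= NF; i++)
        if (substr($i, 1, 3) == (tag ":"))
            return substr($i, 6)
    return "-"
}
{ print getTag("CB") }
EOF
    echo "$f"
}

demo=$(make_demo_awk)
# Wrong: bash parses "function getTag(tag, ..." as shell code and stops with a syntax error.
bash "$demo" 2>&1 | head -n 1
# Right: hand the file to awk with -f; this prints AAACCTGA.
printf 'r1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAACCTGA\n' | awk -F'\t' -f "$demo"
```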
Hi Alex,
We would like to filter UMIs based on the number of reads "supporting" them. Is there an option to do this within STARsolo? I could not find one. We were hoping to do it from the BAM file, but so far we have not been able to re-create the raw matrix from the BAM file. I read a previous question along the same lines, where you offered to share your script for re-creating the matrix from the BAM file; could you share it with us?
Thanks,
Raúl
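A minimal sketch of the read-support filter asked about here, done outside STARsolo and assuming a BAM carrying the CB, UB and GX tags. It counts primary-alignment reads per exact cell barcode + UMI + gene combination, drops UMIs with fewer than a chosen number of reads, and prints `geneIndex cellIndex count` lines indexed by barcodes.tsv and features.tsv (assuming column 1 of features.tsv holds the same gene ID as GX). `count_umis_min_reads` is hypothetical, not the script from this thread; it does not reproduce STARsolo's own UMI collapsing or correction, so counts can differ from Solo.out, and it writes no Matrix Market header. It holds every UMI in memory, so try it on a small subset first.

```shell
# Hypothetical sketch: gene x cell UMI counts, keeping only UMIs supported by
# at least MIN_READS reads. UMIs are matched exactly on CB + UB + GX.
# $1: barcodes.tsv, $2: features.tsv, $3: MIN_READS (default 2). SAM on stdin.
count_umis_min_reads() {
    local barcodes=$1 features=$2 min_reads=${3:-2}
    awk -F'\t' -v min="$min_reads" -v fileWL="$barcodes" -v fileGenes="$features" '
    BEGIN {
        # 1-based indices; gene ID assumed to be column 1 of features.tsv
        n = 0; while ((getline line < fileWL) > 0) cellIdx[line] = ++n
        n = 0; while ((getline line < fileGenes) > 0) { split(line, f, "\t"); geneIdx[f[1]] = ++n }
    }
    {
        if (int($2 / 256) % 2 == 1) next           # skip secondary alignments (flag 0x100)
        cb = ub = gx = "-"
        for (i = 12; i <= NF; i++) {
            t = substr($i, 1, 5)
            if (t == "CB:Z:") cb = substr($i, 6)
            else if (t == "UB:Z:") ub = substr($i, 6)
            else if (t == "GX:Z:") gx = substr($i, 6)
        }
        if (cb == "-" || ub == "-" || gx == "-") next
        if (!(cb in cellIdx) || !(gx in geneIdx)) next
        reads[gx SUBSEP cb SUBSEP ub]++            # reads supporting this UMI
    }
    END {
        for (k in reads) {
            if (reads[k] < min) continue           # too few supporting reads
            split(k, p, SUBSEP)
            umis[geneIdx[p[1]] " " cellIdx[p[2]]]++
        }
        for (k in umis) print k, umis[k]           # geneIndex cellIndex count
    }' | sort -k2,2n -k1,1n
}

# Usage: samtools view Aligned.sortedByCoord.out.bam \
#   | count_umis_min_reads Solo.out/Gene/raw/barcodes.tsv Solo.out/Gene/raw/features.tsv 2
```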