Closed by Vishvak2000 1 year ago
Hi,
What does TFseqs.fasta look like? It should simply be a FASTA file of the last two exons of every transcript.
Also, have you tried running alevin using a "standard" index, i.e. not one produced by LABRAT? I'm skeptical of this line in the log:
Found 32225250 reads with CB+UMI length smaller than expected.
This makes me suspicious that this is an alevin issue, not a LABRAT issue.
Here's the output of head TFseqs.fasta
>ENST00000641515
GTTTGTCAAAATGTGACTTGAATTAATAGATAAGGAGAGTCAGATGATAAGAGGGTCAAAATTATGTTTATCTTAGGAAAAGTAGAATAGAAAATTTATAAGCAGATTAAAAACACATAATAAAAGTAGTAAATAATAATGACAGTATCTCAAATCAGTGCAGGGGGGAAAGGCCTACTAATGTGATGGTGGGATAATTGGATAGCAATATGGGAAAAGATATATTTAATTTATTTGCTACACCAAATGCCAGGACAATCTCTAAGTGAATTCAAGACATAACTCTTTTTTCAAAAAAAC
>ENST00000341065
AAATGTGACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGACGGAGGGCAGAGGTGGTGGAAGGGGCCAGGGGCCTGCAGGCCTCCCCCTGGAACTGGGACTGGTCTCGGTCTGCTGACGTCAGGGTCAGCTCCCCCGCGGAGCTGACTTCAGCAGCCCACAGCTGTGGGGCTTCAGCAGCCACACCAGCCCAGCCCAGCCCAGCTCTCGATACGTTTGGTCTTTCATGCTGAAAAATAAATAATAAAGCCTGT
>ENST00000342066
AAATGTGACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGACGGAGGGCAGAGGTGGTGGAAGGGGCCAGGGGCCTGCAGGCCTCCCCCTGGAACTGGGACTGGTCTCGGTCTGCTGACGTCAGGGTCAGCTCCCCCGCGGAGCTGACTTCAGCAGCCCACAGCTGTGGGGCTTCAGCAGCCACACCAGCCCAGCCCAGCCCAGCTCTCGATACGTTTGGTCTTTCATGCTGAAAAATAAATAATAAAGCCTGT
>ENST00000455979
TGCCACTGCAGCCACCAACCCTGCGGGCCCCGGAGCGAGAACTCGGCACAGGAGAGCAGCCCTTGTCCCCCACGACGGCCACGTCCCCCTATGGAGGGGGCCACGCCCTTGCCGGTCAAACTTCACCCAAGCAGGAGAATGGGACCTTGGCTCTACTTCCAGGGGCCCCCGACCCTTCCCAGCCTCTGTGTTGAGGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAAC
>ENST00000616016
GACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGACGGAGGGCAGAGGTGGTGGAAGGGGCCAGGGGCCTGCAGGCCTCCCCCTGGAACTGGGACTGGTCTCGGTCTGCTGACGTCAGGGTCAGCTCCCCCGCGGAGCTGACTTCAGCAGCCCACAGCTGTGGGGCTTCAGCAGCCACACCAGCCCAGCCCAGCCCAGCTCTCGATACGTTTGGTCTTTCATGCTGAAAAATAAATAATAAAGCCTGTCCCGTG
Yeah, that seems fine. I would try aligning with a "normal" index, i.e. whatever you would normally use. That line in the log makes me think there is something up with the fastq files.
I took a deeper look at my fastqs. It seems they were actually in v2 chemistry format (despite being listed as v3 chemistry on GEO), with R1 containing 26 nucleotides (v2) rather than 28 (v3). Running salmon alevin with --chromium instead of --chromiumV3 made it map correctly. Thanks for your help!
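For anyone hitting the same "CB+UMI length smaller than expected" message, here is a rough sketch of the check described above: tally the R1 read lengths and see whether they look like v2 (16 nt barcode + 10 nt UMI = 26) or v3 (16 nt barcode + 12 nt UMI = 28). The function names and the 100,000-read sample size are my own choices, not part of salmon or LABRAT, and this is only a heuristic: R1 can be sequenced longer than the barcode + UMI, in which case the length alone won't tell you the chemistry.

```python
import gzip
from collections import Counter
from itertools import islice

# Expected R1 lengths: 16 nt cell barcode + UMI (10 nt for v2, 12 nt for v3).
# The mapping to alevin flags is a heuristic, not something alevin reports itself.
CHEMISTRY_BY_R1_LENGTH = {26: "--chromium", 28: "--chromiumV3"}


def r1_length_counts(fastq_gz_path, max_reads=100_000):
    """Tally sequence lengths for the first max_reads records of a gzipped FASTQ."""
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line FASTQ record.
        for line in islice(handle, 1, max_reads * 4, 4):
            counts[len(line.strip())] += 1
    return counts


def suggest_alevin_flag(counts):
    """Return the chemistry flag matching the most common R1 length, or None."""
    if not counts:
        return None
    most_common_length, _ = counts.most_common(1)[0]
    return CHEMISTRY_BY_R1_LENGTH.get(most_common_length)
```

Calling r1_length_counts("P04_S1_L000_R1_001.fastq.gz") and passing the result to suggest_alevin_flag would, for files like mine, be expected to point at --chromium. A return value of None means the dominant length is neither 26 nor 28 and the chemistry needs to be confirmed some other way.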
Hello,
I've been trying to run scLabrat on a single-cell dataset as described in the README. First, I downloaded the provided references from the Box folder and ran:
LABRAT.py --mode makeTFfasta --gff gencodecomprehensive.v28.gff3 --genomefasta GRCh38.primary_assembly.genome.fa --lasttwoexons --librarytype 3pseq
I then indexed the TFseqs.fasta output using salmon index:
salmon index -t ../APA_ref/TFseqs.fasta -i ../APA_ref/txfasta.idx --type quasi -k 31 --keepDuplicates
I then tried to run salmon alevin on my 10X Chromium v3 single-cell dataset:
salmon alevin -l ISR -1 P04_S1_L000_R1_001.fastq.gz -2 P04_S1_L000_R2_001.fastq.gz --chromiumV3 -i ../APA_ref/txfasta.idx/ -p 12 -o P04_output --tgMap ../APA_ref/txMAP.txt --fldMean 250 --fldSD 20 --validateMappings
However, I get 0% mapped reads (alevin log attached: salmon_quant.log).
I have tried this with other single-cell datasets as well (using the same reference) and I get the same result.
Am I not creating the APA reference correctly? Any help would be appreciated.
Thanks!