Closed robertzeibich closed 2 weeks ago
Can you please say how you are running simpleaf (the invocation) and also what versions of simpleaf, piscem and alevin-fry you are using? R1 is usually the technical read and only needs to be long enough to hold the barcode and UMI. Most commonly, it’s not even attempted to align it. Regardless, in the worst case reads shorter than the kmer length should be discarded ( not cause termination ). You shouldn’t have to change the index.
—Rob
I highly appreciate your quick response. I am following the simpleaf online documentation.
Created a conda enviroment.
alevin-fry 0.10.0 h919a2d8_0 bioconda piscem 0.10.3 h7c313c4_0 bioconda salmon 1.10.3 h6dccd9a_2 bioconda simpleaf 0.17.2 h919a2d8_0 bioconda
rlen=91
IDX_DIR="${REF_DIR}/human-2020-A_splici/${rlen}"
simpleaf index \
--output $IDX_DIR \
--fasta $REF_DIR/fasta/genome.fa \
--gtf $REF_DIR/genes/genes.gtf \
--rlen ${rlen} \
--threads $(nproc) \
--use-piscem
simpleaf quant \
--reads1 ${workdir}/${sra_id}_1.fastq.gz \
--reads2 ${workdir}/${sra_id}_2.fastq.gz \
--threads $(nproc) \
--index $IDX_DIR/index \
--chemistry 10xv3 --resolution cr-like \
--expected-ori fw --unfiltered-pl \
--t2g-map $IDX_DIR/index/t2g_3col.tsv \
--output ${workdir}/alevin/${sra_id}
Hrmm, certainly it should not be the case that read 1 is being mapped in 10xv3 chemistry. You more that this happens when read 1 is < 31 nucleotides. Do you have data with longer R1 where it runs to completion?
If it is possible to share your index somehow, and a small sample of reads that reproduce the problem (a few hundred or few thousand should suffice), we’d be happy to take a look.
Sure. I am happy to. Where should I send the files to?
Hi @robertzeibich,
The read pairs (when gzipped) should be small enough to upload directly here. If you send me an e-mail (rob@cs.umd.edu), I can share a box
folder with you where you can upload the index.
--Rob
I selected the first 5,000 rows:
SRR12905322_1.fastq_sub.gz SRR12905322_2.fastq_sub.gz
I hope that this is enough. I will upload the index in a second.
Based on your question: Do you have data with longer R1 where it runs to completion? Yes, I have longer R1 where it runs to completion.
Because of this post, I thought that the problem might be k-mer related.
However, some other runs ran successfully with a low R1 read length.
This issue can be closed due to @rob-p support.
If you should run into the same error (see first message), make sure that the FASTQ files were downloaded correctly.
Re different read lenght:
"R1 is usually the technical read and only needs to be long enough to hold the barcode and UMI" R2 contains the sequence. From a given sequence the length can be extracted.
As I stated in my first message, I got read lengths of
R1 27; R2 125 R1 29; R2 99 R1 29; R2 92
At the end, I used the read lengths 98 and 91 to build the indices. However, I also tried 99 and 92 and the number of cells and genes were exactly the same.
R1 27; R2 125 R1 29; R2 99 R1 29; R2 92
Alignment crashes when the read length is less than 31.
Should I create an index with
simpleaf index -k 26
?