Closed kh49 closed 6 years ago
Hi @pophipi ,
Thanks for reporting the issue.
It looks like a bug crept in while merging the drop-seq pipeline into alevin here.
If it's an urgent requirement, you can replace the above two lines with:
umi = read.substr(pt.barcodeLength, pt.umiLength);
return true;
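For context, here is a minimal, self-contained sketch of what the corrected lines do, assuming the CB+UMI read stores the cell barcode first and the UMI immediately after it. `ProtocolLengths` and `extractUmi` are hypothetical stand-ins, not the actual code in AlevinUtils.cpp; only `barcodeLength`, `umiLength`, and the `substr` call come from the snippet above, and the length check is an added assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the protocol object (`pt`) used in the fix.
struct ProtocolLengths {
  std::size_t barcodeLength;  // bases occupied by the cell barcode (CB)
  std::size_t umiLength;      // bases occupied by the UMI
};

// Copies the UMI, assumed to start right after the cell barcode.
// Returns false when the read is too short to hold both fields; this
// guard is an illustrative addition, not part of the two-line fix.
bool extractUmi(const std::string& read, const ProtocolLengths& pt,
                std::string& umi) {
  if (read.size() < pt.barcodeLength + pt.umiLength) {
    return false;
  }
  umi = read.substr(pt.barcodeLength, pt.umiLength);
  return true;
}
```

Called with a barcode length of 12 and a UMI length of 8, this would copy bases 13 through 20 of the read into `umi`.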
If it's too much trouble to compile from source, we will release a version with the hot-fix today or tomorrow and will update you soon. Thanks again!
P.S.: I am curious how the chromium pipeline ran through, since the length requirement of the 10x-based pipeline is longer and it should break much earlier. The experiment you forwarded above seems to have 25 bases for the CB+UMI sequences. Have any of the Drop-seq guidelines changed? I was under the impression it was a 12-base CB and an 8-base UMI. If that is no longer the case, the --dropseq flag would not be the ideal thing to use, since it will use only 20 of the 25 bases present in the fastq files.
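The arithmetic behind this concern can be sketched as follows. `basesUsed` and `basesIgnored` are hypothetical helper names introduced only for illustration; the 12-base CB, 8-base UMI, and 25-base read are the figures quoted in this comment.

```cpp
#include <cstddef>

// Number of bases of the CB+UMI read that a fixed CB/UMI geometry consumes.
// If the read is shorter than the geometry, only the available bases count.
inline std::size_t basesUsed(std::size_t readLen, std::size_t cbLen,
                             std::size_t umiLen) {
  const std::size_t needed = cbLen + umiLen;
  return readLen < needed ? readLen : needed;
}

// Trailing bases of the read that the geometry never looks at.
inline std::size_t basesIgnored(std::size_t readLen, std::size_t cbLen,
                                std::size_t umiLen) {
  return readLen - basesUsed(readLen, cbLen, umiLen);
}
```

With a 25-base read and a 12 + 8 geometry, `basesUsed` gives 20 and `basesIgnored` gives 5, which is the mismatch described above.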
hi @k3yavi,
Thanks for your help! I'm glad it's a quick fix. As for the dataset, I am not sure why the read length is 25bp. The paper I pulled it from stated that they used the standard DropSeq protocol and did not seem to mention any changes in CB and UMI length. If they did change those lengths, what options can I use to set up the pipeline?
We might have to go through the paper and the dropseq guidelines to check what really changed. You might wanna check https://github.com/COMBINE-lab/salmon/issues/247: we actually have a hidden option for customized UMI/CB lengths. However, this goes into somewhat unexplored territory and requires a bit more testing. We'd appreciate your feedback if you happen to run this mode.
hey @pophipi, we have released v0.11.1 with the fix.
Thanks for reporting this issue.
Also, I tried running the dropseq mode for Alevin with the data you forwarded, but it looks like the mapping rate is too low. I am not sure how to interpret the data, but as a sanity check, the Alevin mapping rate is ~70% on the data from the original Macosko et al. paper. We will keep looking for updates to the DropSeq pipeline. Feel free to reopen this issue or create a new one regarding the low mapping rate if you find out the right location of the UMI and CB in the dataset, or if you have trouble using #247.
@k3yavi To follow up on this dataset: The reads were generated using a modified protocol with a 9bp barcode followed by an 8bp UMI. I used the custom length mode to align this data and alignment rate went up to about 45%.
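As an illustration of this layout, the sketch below splits a CB+UMI read into a 9-base barcode followed by an 8-base UMI. `BarcodeUmi` and `splitRead` are hypothetical names, and this is not how alevin's custom-length mode is implemented; it only shows which bases each field would cover under the stated lengths.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: `valid` is false when the read is too short.
struct BarcodeUmi {
  bool valid;
  std::string barcode;
  std::string umi;
};

// Splits a read into a cell barcode of cbLen bases followed directly by a
// UMI of umiLen bases. Any bases after the UMI are ignored.
BarcodeUmi splitRead(const std::string& read, std::size_t cbLen,
                     std::size_t umiLen) {
  if (read.size() < cbLen + umiLen) {
    return BarcodeUmi{false, "", ""};
  }
  return BarcodeUmi{true, read.substr(0, cbLen), read.substr(cbLen, umiLen)};
}
```

For a 25-base read, `splitRead(read, 9, 8)` would use the first 17 bases and leave the last 8 untouched.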
I tried the alignment using DropSeq Tools and STAR and got similar alignment rates, so I think the custom length alignment is working properly. I may try using some other reference databases instead of GRCh38.p12 to see if alignment improves. Otherwise it may just be an issue regarding the dataset itself.
Glad to hear that. Let us know if you need any other help or have suggestions or feedback to improve Alevin.
Hi,
I'm having issues getting alevin to work on dropseq data after following the tutorial for setting it up.
I am using the following command to run it:
salmon alevin -l ISR -1 SRR6054189.sra_1.fastq -2 SRR6054189.sra_2.fastq --dropseq -i ~/Documents/CordBlood/data/index_15 -p 10 -o ~/Documents/CordBlood/data/alevin_out --tgMap ~/Documents/CordBlood/data/txp2gene.tsv --dumpCsvCounts
and eventually get "Incorrect call for umi extract"
Here's the full output:
I traced it back to AlevinUtils.cpp in the source files but could not make sense of it from there.
The program runs to completion on the same data and library if I change --dropseq to --chromium, eventually outputting the following after processing the reads:
and then this after processing the cells:
Other info:
Salmon v0.11.0 - downloaded binary from Github
I used Gencode 28 for the transcriptome
read files: https://www.ncbi.nlm.nih.gov/sra/SRX2676721[accn]
OS: CentOS
version: 2.6.32-696.23.1.el6.centos.plus.x86_64
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.9 (Final)
Release: 6.9
Codename: Final