Open Rohit-Satyam opened 3 years ago
Also, in lib_format_counts.json, the reads don't seem to belong to any library type:
{
"read_files": "[ /nfs_master/anshul/KAUST_Plasmo_scRNAseq/M-19-3774_9-NT_SI-GA-F3_S1_L003_R1_001.fastq.gz, /nfs_master/anshul/KAUST_Plasmo_scRNAseq/M-19-3774_9-NT_SI-GA-F3_S1_L003_R2_001.fastq.gz]",
"expected_format": "ISR",
"compatible_fragment_ratio": 1.0,
"num_compatible_fragments": 18981014,
"num_assigned_fragments": 18981014,
"num_frags_with_concordant_consistent_mappings": 0,
"num_frags_with_inconsistent_or_orphan_mappings": 29136874,
"strand_mapping_bias": 0.0,
"MSF": 0,
"OSF": 0,
"ISF": 0,
"MSR": 0,
"OSR": 0,
"ISR": 0,
"SF": 0,
"SR": 0,
"MU": 0,
"OU": 0,
"IU": 0,
"U": 0
}
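As a quick way to check a report like the one above, here is a minimal Python sketch that adds up the per-library-type counters and lists any that are nonzero. The key names are copied from the JSON shown in this thread; the helper names `summarize_lib_counts` and `load_report` are made up for illustration and are not part of salmon or alevin. It only summarizes the file and makes no claim about how salmon fills in these fields.

```python
import json

# Library-type counters as they appear in the lib_format_counts.json above.
LIB_TYPE_KEYS = [
    "MSF", "OSF", "ISF", "MSR", "OSR", "ISR",
    "SF", "SR", "MU", "OU", "IU", "U",
]


def load_report(path):
    """Read a lib_format_counts.json file into a dict (hypothetical helper)."""
    with open(path) as fh:
        return json.load(fh)


def summarize_lib_counts(report):
    """Sum the library-type counters and collect the nonzero ones."""
    counts = {key: int(report.get(key, 0)) for key in LIB_TYPE_KEYS}
    return {
        "expected_format": report.get("expected_format"),
        "num_assigned_fragments": report.get("num_assigned_fragments"),
        "total_typed_fragments": sum(counts.values()),
        "nonzero_types": {k: v for k, v in counts.items() if v > 0},
    }


# Values taken from the report posted above (read_files omitted for brevity).
example = {
    "expected_format": "ISR",
    "num_assigned_fragments": 18981014,
    "MSF": 0, "OSF": 0, "ISF": 0, "MSR": 0, "OSR": 0, "ISR": 0,
    "SF": 0, "SR": 0, "MU": 0, "OU": 0, "IU": 0, "U": 0,
}

summary = summarize_lib_counts(example)
print(summary)
```

For a real run, `summarize_lib_counts(load_report("<outdir>/lib_format_counts.json"))` would do the same on the file itself, where `<outdir>` is the output directory passed to salmon.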
Dear Rohit, I am facing the same issue with a non-model organism. Did you manage to improve the mapping rate with different arguments? If so, could you please share the solution?
Rahul
Hi @rahulnutron,
Currently, the recommendation is to move to alevin-fry. If you don't have a good annotation (i.e. transcripts as well as introns), you can index just the transcriptome as normal. The big difference is that, rather than letting alevin do the quantification, you ask salmon alevin to do just the mapping phase and produce a RAD file. This can be done with (assuming Chromium v3 chemistry):
salmon alevin -i <index> --chromiumV3 -l A -1 <reads1> -2 <reads2> -p 16 -o <outdir> --sketch
Note that with the alevin-fry pipeline, there is no need to provide the t2g map at this phase (it's used later). Further, the --sketch flag tells alevin just to map the reads and to prepare the RAD file for subsequent quantification with alevin-fry.
Hi developers!!
Background
I am using Alevin to align a P. falciparum single-cell dataset. For this sample, 27% of reads aligned with CellRanger, and I am expecting this number to increase a bit with Alevin.
Sequencing details:
Sequencing platform used: HiSeq4000
Sequencing chemistry: V3 chemistry (Chromium Single Cell 3' v3, 10x Genomics)
raw-data
S1_L003_I1_001.fastq.gz S1_L003_R1_001.fastq.gz S1_L003_R2_001.fastq.gz
Preprocessing
I followed the steps in this tutorial, using AGAT because the code shared for GRCh38 didn't work well for my organism.
Then I ran Alevin three times (each time adding flags as discussed below) to see whether the mapping percentage increases:
Then I tried setting --keepCBFraction 1. This does decrease the total number of reads being thrown away. However, the mapping percentage is still low compared to what I was getting from CellRanger (27%). I thought that, since Alevin takes multi-mapping reads into consideration, the mapping percentage would likely increase, because when we ran STAR on this data we found a lot of multi-mapping reads. I also tried setting --expectCells 2000 to see whether this increases the mapping percentage, but it doesn't increase it significantly.
Follow-up questions
Q1. I observe that "seq_bias_correct": false and "gc_bias_correct": false are set. How do I turn them on? Does Alevin accept the --seqBias and --gcBias flags? They weren't mentioned in the comments.
Q2. How do I make use of the I1.fastq.gz file? I believe I have to combine it somehow with R1.fastq.gz, but please help me understand how to do that, if that's the case.