Closed kh49 closed 6 years ago
Hi @pophipi ,
Sorry for the late response.
We have updated some of the steps for the v1 pipeline, and the relevant how-to document has been updated too. Please refer to the tutorial and let us know if you still face the problem.
Specifically, in your case you might have to do the following things:
Compile the wrapper.cpp file in the salmon/scripts/v1_10x folder with the command:
g++ -std=c++11 -O3 -I <PATH TO SALMON INCLUDE DIRECTORY> -o wrapper wrapper.cpp -lz
Update the run.sh file in the same folder with the path to the wrapper binary created in the above step.
Run Alevin through the script:
./run.sh salmon alevin -l ISR -b ./reads/ --gemcode -i ./index_15_pc -p 10 -o ./alevin_15_pc --tgMap ./txp2gene.tsv --dumpCsvCounts --dumpFeatures --end 5 --umiLength 5 --barcodeLength 14
@k3yavi That seems to have fixed it, but I did get a warning at the end:
[2018-09-18 15:18:46.675] [alevinLog] [info] Finished optimizer
[2018-09-18 15:18:46.697] [jointLog] [warning] NOTE: Read Lib [./read-I1_si-AGGGACTG_lane-001-chunk-001.fastq.gz, ./read-I1_si-AGGGACTG_lane-002-chunk-000.fastq.gz, ./read-I1_si-AGGGACTG_lane-003-chunk-003.fastq.gz, ./read-I1_si-AGGGACTG_lane-004-chunk-002.fastq.gz, ./read-I1_si-CCTCTAAC_lane-001-chunk-001.fastq.gz, ./read-I1_si-CCTCTAAC_lane-002-chunk-000.fastq.gz, ./read-I1_si-CCTCTAAC_lane-003-chunk-003.fastq.gz, ./read-I1_si-CCTCTAAC_lane-004-chunk-002.fastq.gz, ./read-I1_si-GACAGGCT_lane-001-chunk-001.fastq.gz, ./read-I1_si-GACAGGCT_lane-002-chunk-000.fastq.gz, ./read-I1_si-GACAGGCT_lane-003-chunk-003.fastq.gz, ./read-I1_si-GACAGGCT_lane-004-chunk-002.fastq.gz, ./read-I1_si-TTATCTGA_lane-001-chunk-001.fastq.gz, ./read-I1_si-TTATCTGA_lane-002-chunk-000.fastq.gz, ./read-I1_si-TTATCTGA_lane-003-chunk-003.fastq.gz, ./read-I1_si-TTATCTGA_lane-004-chunk-002.fastq.gz] :
Greater than 5% of the fragments disagreed with the provided library type; check the file: ../../alevin_15_pc/lib_format_counts.json for details
Is this ok to ignore?
Can you share the contents of the file ../../alevin_15_pc/lib_format_counts.json?
Basically, what it's saying is that the assumption made by explicitly defining the library type with the command-line flag -l ISR (which means that the library is stranded and the reads come from the reverse strand) is being violated. In 10x protocols we generally expect the reads to follow the ISR standard, but it looks like more than 5% of the reads do not follow this property, and that's what Alevin is complaining about. Since this is v1, it is possible that some non-trivial fraction of the reads is non-stranded, but it's too hard to say.
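For readers unfamiliar with the library type codes, here is a minimal Python sketch that spells out what a code such as ISR or SF stands for. The helper name `describe_libtype` is hypothetical and is not part of salmon or Alevin; it only encodes the letter convention (orientation, strandedness, strand of read 1) and does not handle automatic detection.

```python
# Hypothetical helper (not part of salmon): decode a library type string.
ORIENTATION = {"I": "inward", "O": "outward", "M": "matching"}
STRANDEDNESS = {"S": "stranded", "U": "unstranded"}
STRAND = {
    "F": "read 1 (or the single read) comes from the forward strand",
    "R": "read 1 (or the single read) comes from the reverse strand",
}


def describe_libtype(code):
    """Return a list of plain-language parts for a code such as 'ISR'."""
    rest = code.upper()
    parts = []
    # Optional first letter: relative orientation of paired-end reads.
    if rest and rest[0] in ORIENTATION:
        parts.append(ORIENTATION[rest[0]])
        rest = rest[1:]
    # Mandatory letter: stranded or unstranded protocol.
    if not rest or rest[0] not in STRANDEDNESS:
        raise ValueError(f"unrecognised library type: {code!r}")
    strandedness = rest[0]
    parts.append(STRANDEDNESS[strandedness])
    rest = rest[1:]
    # Stranded libraries need exactly one strand letter; unstranded need none.
    if strandedness == "S":
        if rest not in STRAND:
            raise ValueError(f"unrecognised library type: {code!r}")
        parts.append(STRAND[rest])
    elif rest:
        raise ValueError(f"unrecognised library type: {code!r}")
    return parts


print(describe_libtype("ISR"))
print(describe_libtype("SF"))
```

Comparing the two printed descriptions shows the difference between the paired-end code passed on the command line and a single-end code such as SF.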
It looks like the combined read files all got classified under "SF". I inspected the fastqs and they seem to be formatted as expected: read-RA files contain the 98bp transcript and 5bp umi reads, read-I1 contains the 14bp I7 barcode.
{
"read_files": "./read-I1_si-ACTTCACT_lane-001-chunk-001.fastq.gz, ./read-I1_si-ACTTCACT_lane-002-chunk-000.fastq.gz, ./read-I1_si-ACTTCACT_lane-003-chunk-003.fastq.gz, ./read-I1_si-ACTTCACT_lane-004-chunk-002.fastq.gz, ./read-I1_si-CGAAGTTG_lane-001-chunk-001.fastq.gz, ./read-I1_si-CGAAGTTG_lane-002-chunk-000.fastq.gz, ./read-I1_si-CGAAGTTG_lane-003-chunk-003.fastq.gz, ./read-I1_si-CGAAGTTG_lane-004-chunk-002.fastq.gz, ./read-I1_si-GAGCACGC_lane-001-chunk-001.fastq.gz, ./read-I1_si-GAGCACGC_lane-002-chunk-000.fastq.gz, ./read-I1_si-GAGCACGC_lane-003-chunk-003.fastq.gz, ./read-I1_si-GAGCACGC_lane-004-chunk-002.fastq.gz, ./read-I1_si-TTCGTGAA_lane-001-chunk-001.fastq.gz, ./read-I1_si-TTCGTGAA_lane-002-chunk-000.fastq.gz, ./read-I1_si-TTCGTGAA_lane-003-chunk-003.fastq.gz, ./read-I1_si-TTCGTGAA_lane-004-chunk-002.fastq.gz",
"expected_format": "U",
"compatible_fragment_ratio": 0.0,
"num_compatible_fragments": 0,
"num_assigned_fragments": 162343601,
"num_frags_with_consistent_mappings": 0,
"num_frags_with_inconsistent_or_orphan_mappings": 0,
"strand_mapping_bias": NaN,
"MSF": 0,
"OSF": 0,
"ISF": 0,
"MSR": 0,
"OSR": 0,
"ISR": 0,
"SF": 0,
"SR": 0,
"MU": 0,
"OU": 0,
"IU": 0,
"U": 0,
"read_files": "( /tmp/tmp.EWJ7aRZf0W/p1.fa, /tmp/tmp.EWJ7aRZf0W/p2.fa )",
"expected_format": "ISR",
"compatible_fragment_ratio": 1.0,
"num_compatible_fragments": 162343601,
"num_assigned_fragments": 162343601,
"num_frags_with_consistent_mappings": 0,
"num_frags_with_inconsistent_or_orphan_mappings": 592460922,
"MSF": 0,
"OSF": 0,
"ISF": 0,
"MSR": 0,
"OSR": 0,
"ISR": 0,
"SF": 592460922,
"SR": 0,
"MU": 0,
"OU": 0,
"IU": 0,
"U": 0
}
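A quick way to see which library types were actually observed is to list the non-zero counters. The sketch below is only an illustration: the function name `observed_libtypes` is hypothetical, and the example dictionary is a trimmed copy of the second set of values above. Because the pasted file repeats its keys, the sketch works on a single well-formed dictionary rather than on the file as shown.

```python
import json

# Counter names as they appear in the report above.
LIBTYPE_KEYS = ("MSF", "OSF", "ISF", "MSR", "OSR", "ISR",
                "SF", "SR", "MU", "OU", "IU", "U")


def observed_libtypes(report):
    """Return only the library type counters that are greater than zero."""
    return {key: report[key] for key in LIBTYPE_KEYS if report.get(key, 0) > 0}


# Trimmed example using values from the report above.
example = json.loads(
    '{"expected_format": "ISR", "ISR": 0, "SF": 592460922, "SR": 0, "U": 0}'
)
observed = observed_libtypes(example)
print("expected:", example["expected_format"])
print("observed:", observed)
print("expected format seen:", example["expected_format"] in observed)
```

With these values the only non-zero counter is SF, which is the mismatch between the expected ISR format and the reported counts.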
read-RA sample:
@NB500915:115:HVHT5BGXX:1:11101:22700:1088 1:N:0:0
CTTCATGCCCTGGGTTCTGCCCGCACGGACCCCCATCTCTGTGACTTCCTGGAGACTCACTTCCTAGATGAGGAAGTGAAGCTTATCAAGAAGATGGG
+
A/AAAEEEEEE/EEE/EEEEE/EEAAE/EEE<EAAEE/EAEAEEEAE6E//EE///<6E//<EAEA/EEEEEEEAEA</EAA6EEEEEEE<EE/<//6
@NB500915:115:HVHT5BGXX:1:11101:22700:1088 4:N:0:0
CTGCG
+
AA/A6
@NB500915:115:HVHT5BGXX:1:11101:24667:1088 1:N:0:0
ATTGTCCGCCTGGATTCCCAGAAGCACATCGACTTCTCTCTGCGCTCTCCCTACGGGGGTGGCCGCCCGGGCCGCGTGAAGAGGAAGAATGCCAAGAA
read-I1 sample:
@NB500915:115:HVHT5BGXX:1:11101:22700:1088 2:N:0:0
TTATTCCTTGCCTC
+
A/AAAAEAAEEAEE
@NB500915:115:HVHT5BGXX:1:11101:24667:1088 2:N:0:0
GCTCACTGTAGTCG
+
AAAAAEEEEEEEEE
@NB500915:115:HVHT5BGXX:1:11101:10530:1088 2:N:0:0
TTGTCATGGGTGAG
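The length check described above (98 bp transcript reads, 5 bp UMIs, 14 bp barcodes) can be automated. This is a minimal sketch, assuming plain four-line FASTQ records that have already been decompressed into a string; the function name `sequence_lengths` is hypothetical, and the sample text reuses two read-I1 records shown above.

```python
from collections import Counter


def sequence_lengths(fastq_text):
    """Count how many reads of each sequence length a FASTQ string contains."""
    lines = [ln.strip() for ln in fastq_text.splitlines() if ln.strip()]
    lengths = Counter()
    # Records are four lines each: header, sequence, '+', quality.
    # A trailing incomplete record is ignored.
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near line {i + 1}")
        lengths[len(seq)] += 1
    return lengths


sample = """@NB500915:115:HVHT5BGXX:1:11101:22700:1088 2:N:0:0
TTATTCCTTGCCTC
+
A/AAAAEAAEEAEE
@NB500915:115:HVHT5BGXX:1:11101:24667:1088 2:N:0:0
GCTCACTGTAGTCG
+
AAAAAEEEEEEEEE
"""
print(sequence_lengths(sample))
```

Running the same function on the read-RA records would be expected to report two lengths, one for the transcript reads and one for the UMI reads.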
Hi @pophipi, the numbers look good. It was my mistake: the libtypes in Alevin are tangled because of single-end reads and a naming clash with salmon. We will work on this sometime soon, but as far as this issue (running Alevin with v1) is concerned, it looks good. I am closing this issue for now; feel free to reopen it if you have any other questions.
Alevin in Salmon v0.11.2
Describe the bug
When attempting to run Alevin on https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/cd14_monocytes with the 10x v1 wrapper, the initialization seems to go smoothly, then Alevin produces the following:
To Reproduce
A downloaded Salmon v0.11.2 binary was executed using the v1 wrapper script, compiled locally in a Salmon-specific Conda environment. The GRCh38.p12 reference was used. The dataset is linked above in the bug description.
Full command used:
~/bin/salmon/scripts/v1_10x/run.sh salmon alevin -l ISR -1 read-I1_*.fastq.gz -2 read-RA_*.fastq.gz -r read-I1_*.fastq.gz --gemcode -i ../../../index_15_pc -p 10 -o ../../alevin_15_pc --tgMap ../../../txp2gene.tsv --dumpCsvCounts --dumpFeatures --end 5 --umiLength 5 --barcodeLength 14
Full terminal output: