Open GiwaAO opened 2 years ago
@GiwaAO
Did you concatenate the transcriptome file and the genome file (in that order) to create the gentrome file before running salmon index?
Along with the list of decoys salmon also needs the concatenated transcriptome and genome reference file for index.
NOTE: the genome targets (decoys) should come after the transcriptome targets in the reference
cat gencode.vM23.transcripts.fa.gz GRCm38.primary_assembly.genome.fa.gz > gentrome.fa.gz
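The two steps described above (a decoy list taken from the genome headers, then transcriptome-first concatenation) can be sketched as a pair of small shell functions. This is only a minimal sketch: the function names `make_decoys` and `make_gentrome` are made up for illustration, and it assumes gzipped FASTA inputs whose sequence name is the first space-delimited token of each header.

```shell
# Hypothetical helpers, not part of salmon.

# make_decoys GENOME_FA_GZ DECOYS_TXT
# Writes one decoy name per line: the first space-delimited token of each
# genome header, with the leading ">" stripped.
make_decoys() {
  gunzip -c "$1" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' > "$2"
}

# make_gentrome TRANSCRIPTS_FA_GZ GENOME_FA_GZ GENTROME_FA_GZ
# Concatenates the transcriptome first and the genome (decoys) second.
# Concatenated gzip files decompress as one continuous stream.
make_gentrome() {
  cat "$1" "$2" > "$3"
}
```

With the mouse files from the command above, that would be `make_decoys GRCm38.primary_assembly.genome.fa.gz decoys.txt` followed by `make_gentrome gencode.vM23.transcripts.fa.gz GRCm38.primary_assembly.genome.fa.gz gentrome.fa.gz`, and then `salmon index -t gentrome.fa.gz -d decoys.txt -i salmon_index` (the output names here are just examples).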
I'm having a similar issue and have concatenated the transcriptome file and genome file. I also tried following the tutorial here (https://combine-lab.github.io/alevin-tutorial/2020/alevin-velocity/), but I get the same issue as GiwaAO.
Hi @jwg054000,
For single-cell processing, you should ideally move to alevin-fry. You can find a velocity tutorial for alevin-fry here.
Best, Rob
I'm getting the same error as reported above. Copying the code I ran below:
# download reference genome
curl -JLO https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz
# extract chromosome names
grep "^>" <(gunzip -c GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz) | cut -d " " -f 1 > GCF_009914755.1_T2T-CHM13v2.0_genomic.txt
# download transcriptome
curl -JLO https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_rna.fna.gz
# combine transcriptome and genome, in that order
cat GCF_000001405.40_GRCh38.p14_rna.fna.gz GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz > human_seq.fa.gz
# give to salmon to index
salmon index -t human_seq.fa.gz -i salmon_index -d GCF_009914755.1_T2T-CHM13v2.0_genomic.txt
There must be a newline ("\n") at the end of the first file; otherwise, concatenating the two files produces a malformed FASTA record at the seam, where the last sequence line of the first file runs into the first header of the second.
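A quick way to check for this, as a rough sketch: look at the last byte of the decompressed first file, and if it is not a newline, put one between the two files when concatenating. The function names `ends_with_newline` and `safe_gentrome` are hypothetical, and the sketch assumes gzipped inputs.

```shell
# Hypothetical helpers, not part of salmon.

# ends_with_newline FILE_GZ
# Succeeds if the decompressed stream ends with "\n". Command substitution
# strips trailing newlines, so a final newline leaves an empty string.
# (An empty file also passes this check.)
ends_with_newline() {
  [ -z "$(gunzip -c "$1" | tail -c 1)" ]
}

# safe_gentrome TRANSCRIPTS_FA_GZ GENOME_FA_GZ OUT_FA_GZ
# Concatenates transcriptome then genome, adding a newline at the seam
# only when the transcriptome does not already end with one.
safe_gentrome() {
  if ends_with_newline "$1"; then
    cat "$1" "$2" > "$3"
  else
    # The extra newline is written as its own small gzip member.
    { cat "$1"; printf '\n' | gzip -c; cat "$2"; } > "$3"
  fi
}
```

A seam problem would show up as a header line that does not start at the beginning of a line, so comparing `grep -c '^>'` on the combined file against the sum of the two inputs is another simple sanity check.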
Hello,
I tried creating a salmon index for Bos taurus but was not successful. I created the decoy file using:
grep "^>" <(gunzip -c Bos_taurus.ARS-UCD1.2.dna.toplevel.fa.gz) | cut -d " " -f 1 > decoys.txt
sed -i.bak -e 's/>//g' decoys.txt
When I try to index using either
salmon index -t bos_taurus_gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index --gencode
or
salmon index -t Bos_taurus.ARS-UCD1.2.cdna.all.fa.gz -i bos_taurus_107_index --decoys decoys.txt -k 31
I get an error. The last two lines of the log file are:
[puff::index::jointLog] [critical] The decoy file contained the names of 2211 decoy sequences, but 0 were matched by sequences in the reference file provided. To prevent unintentional errors downstream, please ensure that the decoy file exactly matches with the fasta file that is being indexed.
[puff::index::jointLog] [error] The fixFasta phase failed with exit code 1
What is happening and how can I solve this issue?
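Since the log says that 0 decoy names were matched, one thing worth checking before re-running the index is whether the names in decoys.txt actually appear as header names in the FASTA file passed to -t. The sketch below counts them; `count_matched_decoys` is a hypothetical helper, and it assumes the sequence name is the first space-delimited token of each header.

```shell
# Hypothetical helper, not part of salmon.

# count_matched_decoys FASTA_GZ DECOYS_TXT
# Prints how many header names in the gzipped FASTA appear, as exact
# whole-line matches, in the decoy list.
count_matched_decoys() {
  gunzip -c "$1" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' \
    | grep -Fxc -f "$2" || true   # grep exits 1 on zero matches; keep the printed count
}
```

For example, `count_matched_decoys bos_taurus_gentrome.fa.gz decoys.txt` would be expected to print a number close to the 2211 decoys in the list. A result of 0 would be consistent with the error above and could mean, for instance, that the names in decoys.txt still start with ">", or that the indexed FASTA contains no genome sequences at all (the second command indexes the cDNA file alone rather than a gentrome).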