FemeniasM / ExplorATE_shell_script


error: The fixFasta phase failed with exit code 1 #5

Open martaluc opened 7 months ago

martaluc commented 7 months ago

Hi, I am using the bash version of your tool with the model organism procedure. The model organism is Mus musculus, and I downloaded the data from https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/ (mm39.fa.out and mm39.fa) and https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/genes/ (refGene.gtf). I ran the following command:

bash ~/ExplorATE_shell_script/ExplorATE mo -p20 -f mm39.fa -g refGene.gtf -r mm39.fa.out -e se -l ~/data/gene-expression/merged_fastq/ -o output/

and I got the following error:

[2024-02-07 14:03:25.775] [puff::index::jointLog] [critical] The decoy file contained the names of 2162318 decoy sequences, but 2162315 were matched by sequences in the reference file provided. To prevent unintentional errors downstream, please ensure that the decoy file exactly matches with the fasta file that is being indexed.

[2024-02-07 14:03:27.960] [puff::index::jointLog] [error] The fixFasta phase failed with exit code 1

Do you have any idea why this occurs? Is the problem related to my genome FASTA file or to my FASTQ files? Thank you, Marta

NOTE: I used salmon 1.10.2 and bedtools v2.31.1

yhao123abc commented 7 months ago

Dear @FemeniasM,

Yes, I got exactly the same error message as above using the same mm39 files as Marta.

My code (for paired-end reads): bash ~/ExplorATE_shell_script/ExplorATE mo -p12 -f mm39.fa -g refGene.gtf -r mm39.fa.out -e pe -l reads -o out_mm39/

I also tried mm10 using the files: https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/ (mm10.fa.out and mm10.fa), https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/ (mm10.refGene.gtf).

bash ~/ExplorATE_shell_script/ExplorATE mo -p12 -f mm10.fa -g mm10.refGene.gtf -r mm10.fa.out -e pe -l reads -o out_mm10/

I got a similar error:

[2024-02-018 14:10:40.097] [puff::index::jointLog] [critical] The decoy file contained the names of 2156825 decoy sequences, but 2156821 were matched by sequences in the reference file provided. To prevent unintentional errors downstream, please ensure that the decoy file exactly matches with the fasta file that is being indexed.

[2024-02-018 14:10:40.097] [puff::index::jointLog] [error] The fixFasta phase failed with exit code 1

Please help me solve this problem. Thanks a lot!

Best, Yi

yhao123abc commented 6 months ago

Hi,

Could anyone please suggest a solution?

Hi @martaluc, have you figured it out?

The run stopped during the salmon indexing step.


It stopped at line 332 of the script /FemeniasM/ExplorATE_shell_script/bin/ExplorATE_mo.sh:

$salmon_path index -p $threads -t trmeSalmon.fa -k $kmer -i SALMON_INDEX --decoys decoys.txt

So the pipeline hasn't used our FASTQ sequencing data yet.
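Since the failure happens while indexing, one way to narrow it down might be to list the decoy names that have no matching header in the FASTA passed to salmon. The following is only a diagnostic sketch, not a fix. The function name missing_decoys is hypothetical, and it assumes that decoys.txt holds one name per line and that a sequence name is the header text after ">" up to the first whitespace.

```shell
# Print every decoy name that has no matching sequence header in the FASTA.
# Usage: missing_decoys <decoy_list> <fasta>
# Assumption: one decoy name per line; sequence name = header text up to
# the first whitespace.
missing_decoys() {
    awk '
        # First file (FASTA): remember the name found on each header line.
        NR == FNR { if (substr($0, 1, 1) == ">") seen[substr($1, 2)] = 1; next }
        # Second file (decoy list): print non-empty names never seen above.
        NF && !($1 in seen) { print $1 }
    ' "$2" "$1"
}
```

For example, `missing_decoys decoys.txt trmeSalmon.fa`, run in the directory that contains those two files, would print the unmatched names if the mismatch comes from naming. If it prints nothing, the names agree and the cause presumably lies elsewhere.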

But how can this problem be solved? What is going wrong?

Thanks, Yi