GiantSpaceRobot / FindFungi

A pipeline for the identification of fungi in public metagenomics datasets

FASTA fails to be processed #8

Closed wolfgangrumpf closed 4 years ago

wolfgangrumpf commented 5 years ago

I found a FASTA that doesn't seem to play well with FindFungi. The directory structure is created, but Kraken seems to fail for some reason with this error:

Loading database... complete.
classify: malformed fasta file - expected header char > not found
0 sequences (0.00 Mbp) processed in 0.008s (0.0 Kseq/m, 0.00 Mbp/m).
0 sequences classified (-nan%)
0 sequences unclassified (-nan%)

The input file does have the > character - here are some sample sequences:

> NODE_24005_length_1000_cov_2.279365
TCGTCGGCTGACACAAGGCGGGCGCTGCCAACTTTCTCGGTTCTCAGAATGCCGTCGCGA
ATCATGGCGTGTAGGCGCGAGGTGCTGACATCGAGGATATCCGCCGCTGCTTGGACGGTC
ATGGTGGTGAGGGAAGGCGTGTCGGCATCGCAGTCCACGGCAACGGCGATGGCCTTGCCG
TTTGCGGGGGCGGGATGGTCCATCCGCGGCTCCGGCAAGGGCCGTCCTTCGCGAAGCGCT
TGGGAGATCCAGAGGGTCAGCAGATCTTGCGCCATGAAAGCCGCATCGAACAGAGTGTCC
CCCTGGGTGCAGATACCCAGGTCGGGAAACTCAGCTTCCCACCCGCCGCTCCACGGGGTG
AGGATGGCTTCGTAGAGAAACTTCATAGGTCGCTTCCTCTCTATTGGGTGAAGTATTAGA
GAAGCCCTAGCTTTTCTTTGATGGCTCGCTCGACCCCTGGCGAAAGATCTCCGGGATGCC
GGGGGATAGGGAACTCTTCGCCAGCCGCGTTCGCGAAAATGTCGTGTCTCGCTCCGTGCC
TGACGAAGCGTCCGCCCATTTTCTTGGTAAGTCGTATCGCTTCTCTAGCTGTCATGGGCA
TGTCTCTCGACCTCCTGCATGGCACAATTTATTGTAACAATTAGTAACAATAAATGCAAG
CACTGCCGTCTTCTCAAAGCGAATTATAGCAAACAAATGTTCTTAGAACAATAGTTCATT
TCAGGATAATCGAGGGGGTCTCGGCAGAAGGCTCTCGGATTTTCCGATGGCGTTTGCTGG
GGACTTCCTATAACGGGGGTGCTCCGGGTGCTTCTATAATAGGCGAGCGAAAAAATGCGA
GACCAAAAGGAGCCCTGAATGAGCGAGCGGATTGGAACGACCTGCGTGCAAGGCGGGTGG
CGGCCCGGCGACGGCGAGCCGCGCCAAGTGCCCATCTACCAGAACACCACCTGGAAGTAC
GACACGAGCGAGCATATGGGGCGCCTGTTCGATCTGGAGG

> NODE_24006_length_1000_cov_2.211640
GTGCGAGTGTATTTTTACATCAGCTTACCGGTGAAGAGAAATATGTAAAAAATGCGGTTC
TGGCAGCGGATTATACAATGGAATACCTGTATAAAAACGGCATCATGAACAACGAAGCGG
ATGGAGACGACATGCCAGGATTTAAGGGGATTCTGGCAAGATGGCTCAGCAAGCTCGTTT
ATGAAGAGAACCAGACCAAATATTTTGCATGGATGGAGAAAAATGCGGACAGCGCATGGC
TGCACCGTAACACGCAGAACCTGATGTGGACGGCATGGGAGTTCCCGACCAATGAGTTCC
CGCGCTGCGCATGGGGCTGCAGCGCGGCGGTAGCACAGCAGTTTGCGTGTCTGCCGTACA
AAAAATAAACAATAGAATGCGCGTTCTGCGAACATTTTGTTGTCACGAACAAAAAGGGGA
AATCATATGAAACGGGTACACTTAATTTGCAACGCACACCTCGACCCTGTATGGCTCTGG
CGCTGGCAGGAAGGCTGCACGGAAGCGCTTTCCACATTCCGCACGGCAGAAGCCTTTACG
GATGAATTTCCGGGCTTTGTGTTCAACCACAACGAAGCGATCCTCTATGAATGGGTCAAG
GAAAATGAGCCGGAGCTGTTTGCTCGCATTCAGCAGAAGGTCAAAGAGGGCAAATGGCAT
ATCATGGGCGGCTGGTATTTGCAGCCCGACTGCAACATGCCAAACGGCGAATCCATTATC
CGAAACATTTCGGAAGGACACCGGTTCTTCGAAGAAGAATTCGGCGTGCGCCCGACAACG
GCGATTAACTTTGACTCCTTTGGTCATTCTGTAGGTCTGGTTCAGATCTTAAATCAGGCT
GGCTATGATACTTATGTGGTATGCCGTCCGGCAAAGGCGCAGTTCCCGTTTGAGGAACAG
GATTATCTGTGGAAAGGTCTTGCCGGTTCGGAGGTTCTGGTGCATCGTTCCGATGAAAAC
TATAACTCCGTTTACGGGCATGTCGGAAAAGAACTGGAAC

Any ideas why this would fail?

GiantSpaceRobot commented 5 years ago

Hi there,

Can you try removing the space between the '>' char and the 'NODE' names and let me know if that works?
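
A minimal sketch of that header cleanup in Python, in case editing the file by hand is impractical. The function names `fix_header` and `fix_fasta_text` are made up for illustration and are not part of FindFungi or Kraken; the sketch only removes spaces or tabs that directly follow the '>' character and leaves sequence lines untouched.

```python
def fix_header(line):
    """Remove spaces/tabs directly after '>' on a FASTA header line.

    Non-header lines (sequence data) are returned unchanged.
    """
    if line.startswith(">"):
        return ">" + line[1:].lstrip(" \t")
    return line


def fix_fasta_text(text):
    """Apply fix_header to every line of a FASTA document held in a string."""
    return "\n".join(fix_header(line) for line in text.splitlines()) + "\n"


if __name__ == "__main__":
    sample = "> NODE_24005_length_1000_cov_2.279365\nTCGTCGGCTGACAC\n"
    print(fix_fasta_text(sample), end="")
```

To process a real file, read it into a string, pass it through `fix_fasta_text`, and write the result to a new file so the original stays untouched.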

Thanks, Paul

tsoratto commented 4 years ago

Hi, I'm getting the same error and my fasta file doesn't have a space in the name. [image] Any ideas why this would fail?

GiantSpaceRobot commented 4 years ago

Hi, can you send me an example FASTA file that causes the issue please? Thanks

tsoratto commented 4 years ago

I'm trying with this file:

output_contigs.txt

and getting this error:

classify: malformed fasta file - expected header char > not found
0 sequences (0.00 Mbp) processed in 0.006s (0.0 Kseq/m, 0.00 Mbp/m).
0 sequences classified (-nan%)
0 sequences unclassified (-nan%)

thanks

GiantSpaceRobot commented 4 years ago

Hi there,

I can't see anything wrong with that FASTA file. I'm afraid I do not understand why you are experiencing errors. What is the full command you are using?

tsoratto commented 4 years ago

I'm using this command: ./FindFungi-0.23.3.sh output_contigs.txt P752101

GiantSpaceRobot commented 4 years ago

Ah, I see. The pipeline is designed for FASTQ files. You can convert your FASTA to FASTQ with tools such as this one: https://code.google.com/archive/p/fasta-to-fastq/. (NOTE: the extra information added to pad the FASTA out to FASTQ, i.e. the quality scores, is a placeholder and will not mean anything.)
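
For illustration, here is a rough Python sketch of what such a conversion does. It is not the linked tool and not part of FindFungi; `parse_fasta`, `fasta_to_fastq`, and the `qual_char` parameter are hypothetical names. It assumes Phred+33 encoding, where the character `I` corresponds to a quality of 40, and assigns that same dummy value to every base.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA text.

    Blank lines are skipped and multi-line sequences are joined.
    """
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Drop the '>' and any whitespace that follows it.
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(text, qual_char="I"):
    """Convert FASTA text to FASTQ text with a constant placeholder quality.

    The quality string is made up: one qual_char per base, carrying no
    real information about the sequence.
    """
    records = []
    for name, seq in parse_fasta(text):
        records.append(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    print(fasta_to_fastq(">contig_1\nACGT\nTTGA\n"), end="")
```

The resulting FASTQ text could be written to a file and passed to the pipeline in place of the original FASTA. Because every quality value is invented, any step that filters or trims on quality will treat all bases as equally good.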