Open skose82 opened 1 week ago
That's because it's a FASTQ file. I should add a message explicitly stating that those are not supported.
Can you convert it to FASTA with seqkit fq2fa? Also, are those short or long reads? geNomad is not designed to work with short sequences such as 150bp reads.
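For anyone who does not have seqkit at hand, here is a rough Python sketch of the same FASTQ-to-FASTA conversion. It is not part of geNomad or seqkit; the function names (open_text, fastq_to_fasta) are made up for illustration, and it assumes strict four-line FASTQ records (no wrapped sequences). Converting the format only gets past the input check; it does not make 150bp reads a suitable input for geNomad.

```python
import gzip
from pathlib import Path


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def fastq_to_fasta(fastq_path, fasta_path):
    """Convert four-line FASTQ records to FASTA and return the record count.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    count = 0
    with open_text(fastq_path) as src, open_text(fasta_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header.strip():
                break  # end of file (or a trailing blank line)
            seq = src.readline().rstrip("\r\n")
            src.readline()  # '+' separator line, ignored
            src.readline()  # quality line, ignored
            if not header.startswith("@"):
                raise ValueError(f"Unexpected FASTQ header: {header!r}")
            # Replace the leading '@' with '>' and drop the quality information.
            dst.write(">" + header[1:].rstrip("\r\n") + "\n" + seq + "\n")
            count += 1
    return count
```

Usage would look like fastq_to_fasta("sample1_R1.fastq.gz", "sample1_R1.fasta"), with the output file name chosen freely.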
Thank you for the quick response. They are short reads and worked with the nf-core/mag workflow (https://nf-co.re/mag/3.0.0) for metagenomes, so I was trying to use geNomad as a standalone package. I believe it worked there mostly because the genomes were assembled. So paired-end short reads are a no-go with geNomad? Do you know of any packages that can do something similar with paired-end data?
Thanks
If you ran this pipeline, you should have an assembly (or multiple) for your data. The workflow assembles metagenomes prior to binning.
Regarding doing the analysis directly on reads, it depends on what you want. If you want to evaluate the presence of known viruses, you could run something like PHANTA or KMCP. For discovery of new viruses or description of virus genomes, this won't do it; you will need assemblies first.
Hi, I am getting the following error:
sample1.fastq.gz is either empty or contains multiple entries with the same identifier. Please check your input FASTA file and execute genomad annotate again.
Not sure what's happening as the file isn't empty and works fine with other scripts/programs.
Is there a way to run geNomad with paired-end FASTQ files, or do they need to be interleaved first?
Command is: genomad end-to-end --cleanup --splits 20 sample1_R1.fastq.gz genomad_output genomad_db
Thanks
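The error above names two possible causes (an empty file or repeated identifiers), and in this thread the real cause turned out to be the file format. A small Python sketch like the following can help tell these cases apart before running geNomad. It is not geNomad's own validation logic, and sniff_format and duplicate_ids are hypothetical names chosen for illustration; the identifier is assumed to be the first whitespace-delimited token of each FASTA header.

```python
import gzip
from collections import Counter


def _open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def sniff_format(path):
    """Guess the format from the first line: 'fasta', 'fastq', 'empty' or 'unknown'."""
    with _open_text(path) as handle:
        first = handle.readline()
    if not first:
        return "empty"
    if first.startswith(">"):
        return "fasta"
    if first.startswith("@"):
        return "fastq"
    return "unknown"


def duplicate_ids(fasta_path):
    """Return a sorted list of FASTA identifiers that occur more than once."""
    counts = Counter()
    with _open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                # Assumption: the ID is the first token after '>'.
                counts[parts[0] if parts else ""] += 1
    return sorted(ident for ident, n in counts.items() if n > 1)
```

If sniff_format reports "fastq", the file needs to be converted (or, better, replaced by an assembly) before it is passed to geNomad; if it reports "fasta", duplicate_ids shows whether any identifiers are repeated.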