Closed Wednesdaysama closed 7 months ago
Hi!
Thank you for providing very detailed steps of what you tried!
Could you please include a few examples of the filenames (i.e. CDS files ending in .fna) in the folders sodalinema and geitlerinemaceae_rest? Could you also share at least one FASTA header from any of these files, as it was before running puppy-align?
Thanks!
Hi Hans,
Thanks for your consideration.
They are named like GCA_020386575.1.fna, GCA_004299065.1.fna, etc., and were downloaded from NCBI (genomic coding sequences). Every input .fna file contains multiple headers.
I've attached some of the files for you to look over.
Lianchun
Hi Lianchun!
Thank you again for your response and sorry for the confusion about the instructions.
It seems the issue did indeed come from the filenames, which should contain the string "cds". For example, your file should be named "GCA_020386575_cds.fna". You can find more details on naming requirements in the input section of the GitHub documentation. Also, from personal experience, I like changing the GCA... names to something like "Genus_species_cds.fna" so that the downstream outputs are even easier to interpret, but this is totally up to you ;)
Please let me know if this fixes the issue or if we need to do more troubleshooting :)
Hans
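For folders with many genomes, the renaming described above can be scripted. The following is only a minimal sketch, not part of the tool: the helper names `cds_name` and `rename_cds_files` are hypothetical, and it assumes NCBI-style filenames such as GCA_020386575.1.fna, where dropping the trailing version number and appending "_cds" gives the form shown in the example (GCA_020386575_cds.fna). It also assumes that a filename already containing "_cds" needs no change.

```python
import re
from pathlib import Path


def cds_name(path: Path) -> Path:
    """Return the path renamed to '<accession>_cds.fna'.

    Assumption: a trailing numeric version such as '.1' is dropped,
    matching the example GCA_020386575.1.fna -> GCA_020386575_cds.fna.
    """
    stem = path.name[: -len(".fna")]         # e.g. 'GCA_020386575.1'
    accession = re.sub(r"\.\d+$", "", stem)  # e.g. 'GCA_020386575'
    return path.with_name(f"{accession}_cds.fna")


def rename_cds_files(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and optionally apply) renames for .fna files lacking '_cds'.

    Returns a list of (old_name, new_name) pairs. Nothing is renamed
    unless dry_run is False.
    """
    planned = []
    for path in sorted(folder.glob("*.fna")):
        if "_cds" in path.name:
            continue  # assumed to satisfy the naming requirement already
        target = cds_name(path)
        if not dry_run:
            if target.exists():
                # Avoid silently overwriting, e.g. two versions of one accession
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
        planned.append((path.name, target.name))
    return planned
```

Calling `rename_cds_files(Path("sodalinema"))` only lists the planned renames; passing `dry_run=False` applies them. Working on a copy of the input folder is the safer choice, since the original NCBI names are not kept.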
Hi Hans, it worked! The filenames were the problem. Thank you so much! 💯
Hi Tropinis,
I tested the data you provided with the command:
It worked well, creating the necessary files and directories (ResultDB.tsv, align_logfile.txt, mmseqs_tmp, and tmp) in the output directory.
However, when I ran the command with different data:
I encountered an error saying that the "_cds" substring could not be found:
This happened even though the input files do contain the "_cds" substring in their headers. For example, one of the input FASTA files looks like this:
If you have any suggestions or comments on how to resolve this issue, I would greatly appreciate it ;)
Thank you, Lianchun