Closed cpplyqz closed 11 months ago
Hello and thank you for your interest in using PhamClust!
I see two issues with your input files - the first is that your FASTA files appear to contain nucleotide sequences, whereas PhamClust requires that FASTA inputs be protein sequences (i.e., the genes encoded by your genomes). The second issue is that your FASTA headers are not structured in a way that PhamClust will be able to utilize, as the gene orthology information is not present. See the README.md for this repository for how the FASTA headers should be structured.
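If it helps to confirm the first issue before re-running anything, a rough check along these lines can flag FASTA files that contain nucleotide rather than protein sequences. This is only a heuristic sketch and not part of PhamClust: the function names, the 0.9 threshold, and the set of characters treated as nucleotides are all assumptions made for illustration.

```python
# Characters treated as nucleotide symbols (assumption: unambiguous bases, N, and gap).
NUCLEOTIDE_CHARS = set("ACGTUN-")


def read_fasta_sequences(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic: True if at least `threshold` of the characters are nucleotide symbols."""
    seq = sequence.upper()
    if not seq:
        return False
    hits = sum(1 for ch in seq if ch in NUCLEOTIDE_CHARS)
    return hits / len(seq) >= threshold
```

Looping over `read_fasta_sequences("P1.fasta")` and calling `looks_like_nucleotide` on each sequence should show quickly whether a file holds genome sequence instead of translated genes. Short or unusual protein sequences could still be misclassified, so treat the result as a hint rather than a verdict.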
I'm assuming that RASTtk provides either a FASTA amino acid or GenBank flat file output for the annotations. If this is the case, I'd recommend using PhaMMseqs to define gene phamilies. You should invoke PhaMMseqs like this:
phammseqs /path/to/genome/annotation/fasta/or/gbk/files -o /path/to/outdir -p
Including the -p is important because it will create a file called strain_genes.tsv, which PhamClust uses as its preferred input format.
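For anyone scripting this step, the invocation above could be wrapped roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and because the exact location of strain_genes.tsv inside the output directory is not stated here, the code searches for it recursively instead of assuming a path.

```python
import subprocess
from pathlib import Path


def build_phammseqs_command(input_path, outdir):
    """Assemble the command shown above: input path, -o outdir, and -p."""
    return ["phammseqs", str(input_path), "-o", str(outdir), "-p"]


def find_strain_genes(outdir):
    """Search outdir recursively for strain_genes.tsv; return the first match or None."""
    matches = sorted(Path(outdir).rglob("strain_genes.tsv"))
    return matches[0] if matches else None


def run_phammseqs(input_path, outdir):
    """Run PhaMMseqs, then return the path to strain_genes.tsv (raise if it is missing)."""
    subprocess.run(build_phammseqs_command(input_path, outdir), check=True)
    tsv = find_strain_genes(outdir)
    if tsv is None:
        raise FileNotFoundError(f"strain_genes.tsv not found under {outdir}")
    return tsv
```

Calling `run_phammseqs("/path/to/annotations", "/path/to/outdir")` requires `phammseqs` to be installed and on the PATH. The check at the end simply fails early if the -p output was not produced.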
You might also consider using a different tool than RASTtk for annotating phages - it is intended for bacterial gene annotation and does an OK but not exceptional job of auto-annotating phages.
I'm happy to provide further assistance as needed; please let me know how this goes!
It worked well, Mr. Gauthier! I followed your steps to generate the strain_genes.tsv and ran through the process very smoothly. Thank you again for your prompt response, and good luck with your research! Here is part of my result:
$ ls phamcluster/phamclust_22_Nov_2023/
40c31f606878e4f11f2ff0e12dfeaba5.tmp cluster_1 cluster_2 peq_heatmap.html phamclust.log singletons
I'm delighted to hear that you were able to get it working successfully, and I hope you find the output from PhamClust useful to your research!
I have some sequences that were annotated by RASTtk, and I want to use this software to cluster my phages, but I don't know how to prepare the input file. My directory looks like this:
$ ll ./testinputdit
P1.fasta P2.fasta
$ less P1.fasta