Open neptuneyt opened 5 months ago
Dear @neptuneyt,
The prokaryotic constraint of the program applies only to the `extract` module, which uses `prodigal`.
Any other functionalities, including AAI calculations, should work with eukaryotic proteomes.
Just run the `convert` module on your FASTA file of eukaryotic proteins to produce a database compatible with all subsequent steps.
Hope this helps!
Thank you for such a prompt reply, I'll try it.
Following your instructions, I got the results. However, I ran into another problem: how can I get non-redundant AAI values between N proteomes? For example, with three proteomes A.faa, B.faa, and C.faa, only the pairs AB, AC, and BC should be computed. With the following commands I ended up with 3*3=9 pairs, 6 of which are redundant (AB=BA):
```bash
$ ls protein_db
A.faa.db B.faa.db C.faa.db
$ EzAAI calculate -i protein_db -j protein_db -t 10 -o ezaai_q3_r3.tsv
```
If N were small this wouldn't take much time, but I have thousands of proteomes, so the comparison will be quite time-consuming. I look forward to your reply if there is a good solution!
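To put numbers on the concern above, here is a minimal Python sketch (not part of EzAAI) that lists each unordered pair once and compares the all-vs-all count N*N with the non-redundant count N*(N-1)/2. The function names `unique_pairs` and `pair_counts` are hypothetical helpers introduced only for illustration, and the file names are the ones from the example.

```python
from itertools import combinations


def unique_pairs(names):
    """Return every unordered pair exactly once (AB, but not BA or AA)."""
    # Sorting first makes the output order deterministic.
    return list(combinations(sorted(names), 2))


def pair_counts(n):
    """Return (all-vs-all count, non-redundant pair count) for n proteomes."""
    return n * n, n * (n - 1) // 2


if __name__ == "__main__":
    # Database names taken from the example; any list of names works.
    names = ["A.faa.db", "B.faa.db", "C.faa.db"]
    for a, b in unique_pairs(names):
        print(a, b)
    all_vs_all, non_redundant = pair_counts(len(names))
    print(f"all-vs-all: {all_vs_all}, non-redundant: {non_redundant}")
```

For three proteomes this gives 3 pairs instead of 9; for 1000 proteomes the non-redundant count is 499500 instead of 1000000, so skipping mirrored and self comparisons would roughly halve the work.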
Thank you for pointing this out.
This is actually a result of my lazy implementation. The current code only implements comparison between two distinct sets of proteomes, so it cannot detect redundancy even when two identical sets are given as input.
I think I can provide something like a `-self` flag that explicitly indicates that the set is being compared against itself.
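Until such an option exists, the redundant rows can at least be removed from an existing all-vs-all result. The sketch below is a post-processing stopgap, not EzAAI functionality: it keeps the first row seen for each unordered pair and drops self-comparisons. It assumes the output is tab-separated, has one header line, and identifies the two proteomes in the first two columns. Those column positions are an assumption, so check them against your own file; `drop_redundant_rows` is a hypothetical helper name.

```python
import csv
import io


def drop_redundant_rows(tsv_text, col_a=0, col_b=1):
    """Keep the header and one row per unordered pair; drop self-comparisons.

    col_a and col_b are the (assumed) positions of the two proteome
    identifiers in each row.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    seen = set()
    for i, row in enumerate(reader):
        if i == 0:
            writer.writerow(row)  # assumed header line, copied unchanged
            continue
        if len(row) <= max(col_a, col_b):
            continue  # skip blank or truncated lines
        a, b = row[col_a], row[col_b]
        if a == b:
            continue  # self-comparison (AA)
        key = frozenset((a, b))  # AB and BA map to the same key
        if key in seen:
            continue  # mirrored duplicate
        seen.add(key)
        writer.writerow(row)
    return out.getvalue()
```

For example, `drop_redundant_rows(open("ezaai_q3_r3.tsv").read())` returns the filtered table as a string. This only tidies the output; the redundant comparisons have still been computed, so it does not reduce run time.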
Dear developer,
I'm curious whether EzAAI is also suitable for AAI calculations between eukaryotic proteins. Please note that the input here is proteins, not genomes.
Looking forward to your reply. Thanks a lot.