GDKO / AvP

Automatic evaluation of HGTs
GNU General Public License v3.0

HGT candidates in M.incognita #6

Status: Closed (closed by csxie-666 1 year ago)

csxie-666 commented 1 year ago

Hi, thanks for developing this tool. I predicted HGTs in about 40,000 proteins of M. incognita using AvP (1.0.0), and I got about 4,000 proteins (1/10) with an UNKNOWN tag. All of the candidates are UNKNOWN. The .ai file was generated using DIAMOND and the UniRef90 database. Is this result correct or credible?
Also, how many high-confidence HGTs are there in Mi?

GDKO commented 1 year ago

Hi, there must be some error if all candidates are tagged as UNKNOWN. Double-check that you have formatted the database correctly (see https://github.com/GDKO/AvP/wiki/Setting-up#uniref90). Also, can you share your config file and a few lines of the .ai file?

There are multiple publications describing HGTs in Mi (e.g. https://doi.org/10.1073/pnas.1008486107 and https://doi.org/10.1038/nbt.1482).

csxie-666 commented 1 year ago

Thanks for your response. Yes, the path to the database was configured incorrectly in my first run. I have now modified the config file like this:

blast_db_path: /beegfs/home/xcs/HGT
fasta_path: /beegfs/home/xcs/HGT/uniref90.fasta.fixed.gz
mode: ur90
data_type: AA

So, if I set mode to ur90, does blast_db_path not need to be set?

I also noticed that the FASTA headers in uniref90.fasta.fixed.gz look like this: “>Uniref90|UniRef90_UPI0003F0CD41 titin-like n=1 Tax=Elephantulus edwardii TaxID=28737 RepID=UPI0003F0CD41”

Should I change this to a simpler format (e.g. >Uniref90|Q6GZX3 TaxID=654924)?

GDKO commented 1 year ago

> So, if I set mode to ur90, does blast_db_path not need to be set?

Correct, but setting it wouldn't cause a problem anyway.
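
Since the original failure came from a wrongly configured database path, a quick pre-flight check of the config can catch this before a long run. The following is only a minimal sketch: `parse_simple_config` and `missing_paths` are hypothetical helper names, not part of AvP, and the parser handles only flat `key: value` lines like the ones shown in this thread rather than full YAML.

```python
from pathlib import Path


def parse_simple_config(text):
    """Parse flat 'key: value' lines into a dict (not a full YAML parser)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key: value.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def missing_paths(config, keys=("fasta_path",)):
    """Return the keys that are unset or point to a path that does not exist."""
    return [k for k in keys if not config.get(k) or not Path(config[k]).exists()]


# Config values copied from this thread; the paths exist only on the poster's cluster.
example = """\
blast_db_path: /beegfs/home/xcs/HGT
fasta_path: /beegfs/home/xcs/HGT/uniref90.fasta.fixed.gz
mode: ur90
data_type: AA
"""

config = parse_simple_config(example)
print("mode:", config["mode"])
print("missing or unset paths:", missing_paths(config))
```

Only `fasta_path` is checked by default here, on the assumption (from the answer above) that `blast_db_path` is not needed in ur90 mode; pass a different `keys` tuple to check other entries.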

> Should I change this to a simpler format?

No need to simplify
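
To sanity-check the headers without rewriting them, one option is to scan the fixed FASTA and confirm that each header carries an accession and a `TaxID=` field. This is a rough sketch under the assumption that headers follow the two shapes quoted in this thread; `parse_header` and `count_headers_missing_taxid` are hypothetical names, and the thread does not confirm which header fields AvP itself reads.

```python
import gzip
import re

# Matches the numeric value of a 'TaxID=' field, as in the headers quoted above.
TAXID_RE = re.compile(r"\bTaxID=(\d+)")


def parse_header(header):
    """Return (accession, taxid) from a '>Uniref90|ACC ... TaxID=N' header.

    taxid is None when the header has no TaxID field; both values are None
    for an empty header.
    """
    parts = header.lstrip(">").split(None, 1)
    if not parts:
        return None, None
    accession = parts[0].split("|", 1)[-1]
    match = TAXID_RE.search(header)
    return accession, (match.group(1) if match else None)


def count_headers_missing_taxid(path, limit=10000):
    """Scan up to `limit` headers of a gzipped FASTA; return (seen, missing)."""
    seen = missing = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            seen += 1
            if parse_header(line)[1] is None:
                missing += 1
            if seen >= limit:
                break
    return seen, missing


full = (">Uniref90|UniRef90_UPI0003F0CD41 titin-like n=1 "
        "Tax=Elephantulus edwardii TaxID=28737 RepID=UPI0003F0CD41")
print(parse_header(full))
print(parse_header(">Uniref90|Q6GZX3 TaxID=654924"))
```

Both the long and the simplified header yield an accession and a taxon ID with this parser, which is consistent with the answer that simplifying is unnecessary. A nonzero `missing` count from `count_headers_missing_taxid` would suggest the database was not formatted as the wiki describes.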

csxie-666 commented 1 year ago

Thanks!