chrisquince / STRONG

Strain Resolution ON Graphs
MIT License

Incompatibility in COG databases #46

Closed chrisquince closed 5 years ago

chrisquince commented 5 years ago

The current version seems to be using incompatible COG databases. For example, the entry 'gnl|CDD|319244' can be returned as an rpsblast hit, but it is not present in cdd_to_cog.tsv:

grep "319244" /mnt/gpfs/Hackathon/FMTMeren/STRONG/COG_pipe/scg_data/cdd_to_cog.tsv

This causes the pipeline to crash. As a temporary fix I will add some error checking to ./Filter_Cogs.py, but we need to work out why the results are incompatible.
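A minimal sketch of the kind of error checking meant here, not the actual code in Filter_Cogs.py. It assumes cdd_to_cog.tsv has the CDD id in the first column and the COG id in the second; the function names `load_cdd_to_cog` and `map_hits` and the example id pair are hypothetical. Hits with no entry in the mapping are collected and reported instead of raising a KeyError:

```python
import sys


def load_cdd_to_cog(lines):
    """Parse mapping lines of the assumed form '<cdd_id>\t<cog_id>'."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mapping[fields[0]] = fields[1]
    return mapping


def map_hits(hit_ids, mapping):
    """Split rpsblast subject ids into mapped (hit, cog) pairs and unmapped hits."""
    mapped, missing = [], []
    for hit in hit_ids:
        cdd_id = hit.split("|")[-1]  # 'gnl|CDD|319244' -> '319244'
        cog = mapping.get(cdd_id)
        if cog is None:
            missing.append(hit)
        else:
            mapped.append((hit, cog))
    if missing:
        # Warn rather than crash; a mismatch usually means the rpsblast
        # database and the mapping file come from different CDD releases.
        print(
            f"warning: {len(missing)} hit(s) absent from the mapping, "
            f"e.g. {missing[0]}",
            file=sys.stderr,
        )
    return mapped, missing


if __name__ == "__main__":
    # Made-up mapping entry, for illustration only.
    mapping = load_cdd_to_cog(["100001\tCOG0001\n"])
    print(map_hits(["gnl|CDD|100001", "gnl|CDD|319244"], mapping))
```

Returning the unmapped hits alongside the mapped ones lets the caller decide whether to continue with a warning or stop with a clear message about the mismatch.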

Sebastien-Raguideau commented 5 years ago

Looking at the dependencies, I realized a new COG database was available and updated it, but forgot to update the cdd_to_cog file. I will regenerate that file at some point this week. It's easy, since the info is in cddid.tbl.gz (ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz). In the meantime, it is possible to run with the previous COG database: I did not delete it, only moved it, so you can just change the config file to use: /mnt/gpfs/seb/Database/rpsblast_cog_db/old_cogs
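For reference, regenerating the mapping could look roughly like the sketch below. It assumes cddid.tbl is tab-delimited with the PSSM id in the first column and the CD accession (e.g. a COG accession) in the second, and that cdd_to_cog.tsv only needs those two columns for accessions starting with "COG". The output layout, the function names and the example rows are assumptions, not the script actually used:

```python
import csv
import gzip


def cog_rows(handle):
    """Yield (pssm_id, accession) for COG entries in cddid.tbl-style lines."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pssm_id, accession = fields[0], fields[1]
        if accession.startswith("COG"):
            yield pssm_id, accession


def write_mapping(tbl_gz_path, out_path):
    """Write a two-column TSV (assumed layout) from a downloaded cddid.tbl.gz."""
    with gzip.open(tbl_gz_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(cog_rows(src))


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    example = [
        "100001\tCOG0001\tname\tdescription\t100\n",
        "100002\tpfam00001\tname\tdescription\t50\n",
    ]
    print(list(cog_rows(example)))
```

Building the mapping from the same CDD release as the rpsblast database keeps the two in sync.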

chrisquince commented 5 years ago

OK Seb, we can close this when the new cdd file is uploaded. More care needs to be taken to avoid these issues, though, as this has put the analysis back by a week and wasted yesterday afternoon for me. The test example probably would not have helped here, but error checking in the scripts would have (I have added a small bit).

Sebastien-Raguideau commented 5 years ago

I'm sorry I made you lose a lot of time. I tend to do multiple things at the same time, which makes me error-prone. This is a typical example where I changed something without testing it or looking for other linked things that should also be changed. I will improve on that. I have just updated the cdd_to_cog file, so it should work now.