BigDataBiology / SantosJunior_Torres_2024_AMPSphere_v1

Figures and files used in the AMPSphere manuscript
MIT License

About Cdhit Not Found #5

Closed: hanran11 closed this issue 3 months ago

hanran11 commented 3 months ago

Hello! When I run the script /general_scripts/01 resource_generation/main.py, it prints the following:

From prediction to families
Set up environment
Eliminating sequences containing non-standard residues
It was found a total of 9090033 AMPs
-- Filtering singletons:
Singletons: 3573672
Non-Singletons: 861565
Recovering AMPs matching to DRAMP but still singletons
There was a change in the versions of MMSeqs2.
                 This impacted the results in this step. Do you want to 
                 proceed the alignment? Alternatively, there is a pre-computed
                 set of results which can be used.
                 : 
Assuming the precomputed results
We could save 1933
Clustering AMPs
Hierarchical clustering started
        1st stage - 100% of identity
[ERROR] -- Cdhit Not Found --

I would like to know how to solve this problem. Are there files that I have not downloaded? Also, which mirror source did you use? I cannot use the file enviroment.yml with my current mirror source.

I am not a native English speaker, so my wording may come across as a little rude. Please pardon me! (>﹏<)

I am looking forward to your reply!

celiosantosjr commented 3 months ago

Hi,

It means CD-HIT is missing from your PATH. If you cannot use the yml files, you can install CD-HIT yourself, add it to your PATH, and run the pipeline as it is. To install CD-HIT, follow this link.
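Before rerunning the pipeline, it may help to confirm that the CD-HIT executable is actually visible on PATH. The following is a minimal sketch, not part of the AMPSphere scripts. The helper `find_executable` is hypothetical, and the binary names `cd-hit` and `cdhit` are assumptions, since the name the pipeline looks for is not stated in this thread.

```python
import shutil


def find_executable(candidates):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)  # None when the name is not on PATH
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    # Assumed binary names; adjust to whatever your CD-HIT install provides.
    found = find_executable(["cd-hit", "cdhit"])
    if found is None:
        print("CD-HIT was not found on PATH; install it or add its directory to PATH.")
    else:
        name, path = found
        print(f"Found {name} at {path}")
```

If the check reports that nothing was found even though CD-HIT is installed, the directory that contains the binary probably still needs to be added to PATH in the shell that launches main.py.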

It is strongly recommended to use the conda environment and install all the required software in the way the pipeline describes. If you only want the final clustered genes, you can always download them from the AMPSphere resource, which already contains the clusters and the individual sequences along with the most relevant associated metadata.

HTH, Celio