HWaymentSteele / AF_Cluster

Predict multiple protein conformations using sequence clustering and AlphaFold2.
MIT License

Issues regarding the DBSCAN hyperparameters finding. #1

Open miss77jun opened 1 year ago

miss77jun commented 1 year ago

Hi, I am using AF_Cluster to predict alternative conformations for some proteins. However, the epsilon scan fails in cases where it is stopped early by the condition eps>10 and n_clust==1.

May I ask why this condition is mandatory during the DBSCAN scan? What is the best way to work around the situation where every eps <= 10 yields only one cluster?

Referring to file AF_Cluster/scripts/ClusterMSA.py

Lines 115-116

            if eps>10 and n_clust==1:
                break

Thank you very much! Tengyu

HWaymentSteele commented 1 year ago

Hi Tengyu, thanks for your interest in the code! I coded it this way because, in my screening, eps was typically smaller than 10. If only one cluster had been detected by the time eps reached 10, continuing to scan larger values was a waste of time.

Can I ask roughly what your MSA sizes and protein lengths are, given that you are encountering this? I am happy to make it an optional flag, or if you'd like to submit a PR, I'm happy to go that route too.

Best,

Hannah
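For anyone following along, here is a minimal sketch of what moving the early stop behind an optional flag could look like. It is not the code in ClusterMSA.py: `--early_stop_eps` and `--no_early_stop` are hypothetical flag names, `build_parser` and `should_stop` are made-up helpers, and the default values are placeholders. Only `--max_eps` and the `eps>10 and n_clust==1` condition come from this thread.

```python
import argparse


def build_parser():
    """Build a small CLI sketch for the epsilon scan (illustrative only)."""
    p = argparse.ArgumentParser(
        description="Sketch: epsilon scan with an optional early stop."
    )
    # --max_eps is mentioned in this thread; the default here is a placeholder.
    p.add_argument("--max_eps", type=float, default=20.0,
                   help="Largest eps value to scan.")
    # Hypothetical flags, not part of the real script.
    p.add_argument("--early_stop_eps", type=float, default=10.0,
                   help="Stop scanning once eps exceeds this and n_clust == 1.")
    p.add_argument("--no_early_stop", action="store_true",
                   help="Disable the early stop and always scan up to --max_eps.")
    return p


def should_stop(eps, n_clust, args):
    """Return True if the scan should break at this eps value."""
    if args.no_early_stop:
        return False
    # Same condition as in the thread, with 10 replaced by a configurable value.
    return eps > args.early_stop_eps and n_clust == 1
```

With the defaults this keeps the current behaviour, while something like `--max_eps=30 --no_early_stop` would let the scan run all the way to the maximum.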

miss77jun commented 1 year ago

Thanks for your reply.

I see what you mean, but the condition is too strict for some MSAs. Note that I generated the MSAs following the AF2 protocol. Moving the condition to an optional flag would be a good way to make the program more flexible.

One case is 3QF4_A, whose MSA size is 8359 and sequence length is 572.

Regards, Tengyu

chenyuwai commented 8 months ago

Hello, I support adding code to make the tool more generally applicable. In my case, epsilon is about 22. I found this after setting "--max_eps=30" to raise the maximum epsilon and commenting out the two break lines mentioned by the original poster.
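To illustrate why the early stop can hide a larger epsilon, here is a small self-contained sketch of the scan with a synthetic cluster-count function. Everything in it is an assumption made for illustration: `scan_eps` and `toy_count` are made-up names, the toy curve is not real data (its peak at 22 only mirrors the value reported above), and choosing the eps with the most clusters is an assumed selection rule. In practice the count would come from running DBSCAN on the encoded MSA at each eps.

```python
def scan_eps(count_clusters, min_eps, max_eps, eps_step, early_stop_eps=10.0):
    """Scan eps from min_eps to max_eps and return (best_eps, results).

    count_clusters: callable mapping eps -> number of clusters found.
    early_stop_eps: break once eps exceeds this value and only one cluster
                    is found; pass None to disable the early stop.
    """
    results = []
    n_steps = int(round((max_eps - min_eps) / eps_step)) + 1
    for i in range(n_steps):
        eps = min_eps + i * eps_step
        n_clust = count_clusters(eps)
        results.append((eps, n_clust))
        if early_stop_eps is not None and eps > early_stop_eps and n_clust == 1:
            break
    # Assumed selection rule: the eps that gave the most clusters
    # (the first such eps wins on ties).
    best_eps = max(results, key=lambda r: r[1])[0]
    return best_eps, results


def toy_count(eps):
    """Synthetic curve: one cluster everywhere except a peak around eps = 22."""
    return max(1, 5 - int(2 * abs(eps - 22.0)))


# With the early stop, the scan ends just past eps = 10 and never sees the peak.
stopped_best, stopped = scan_eps(toy_count, 3.0, 30.0, 0.5)

# Without it, the scan covers the whole range and finds the peak.
full_best, full = scan_eps(toy_count, 3.0, 30.0, 0.5, early_stop_eps=None)
```

In this toy case the stopped scan only ever sees one cluster, so its "best" eps is meaningless, while the full scan picks the peak. That is the same pattern as the workaround above, at the cost of scanning more eps values.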