Pymol-Scripts / Pymol-script-repo

Collected scripts for Pymol
http://www.pymolwiki.org/index.php/Git_intro

findseq has wrong selection behavior for multiple chain objects #148

Closed jrom99 closed 1 month ago

jrom99 commented 1 month ago

I have a file that contains a protein, /obj/A/A, and an ssRNA, /obj//A. When I use findseq to search for a peptide without restricting the search to the protein, it finds the sequence but then selects something like /obj/*/A/x-y. For example, it finds residues 10-20 on the protein but also selects nucleotides 10-20 in the RNA.

I'm using PyMOL version 3.1.0a0, and the plugin was installed from the repo. As far as I can tell, I've seen this strange behavior with objects that have multiple chains, but I wasn't sure what was causing it until now.

pslacerda commented 1 month ago

Is this a bug or an observation? It shouldn't also select /obj//A/10-20, because that isn't a matching sequence, right?

pslacerda commented 1 month ago

It seems that the segment identifier (segi) is empty in findseq selections. Can you share your file?

jrom99 commented 1 month ago

It seems to be a bug, since the ssRNA is not a matching sequence. I have observed it affecting other PDB files with multiple peptide chains in the same object, so it doesn't seem to be specific to ssRNA objects.

I can't share the file since it's work related (it was generated using ZDOCK), but I'll try to find a similar one in the PDB and test with it. Here is a screenshot of the output:

image

For comparison, when I explicitly select objects it works as expected:

image

Or when there is one object:

image

jrom99 commented 1 month ago

For protein-only files, I had two PDB files where their sequences were in the form:

>file1 chain A
xxxxxxGHIxx
>file2 chain A
xGHIxxxxxxx

And running findseq GHI, * would select /file1//A/2-4, /file1//A/7-9, /file2//A/2-4 and /file2//A/7-9
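
The selection reported above is what one would get if the matched residue ranges were pooled across all objects before the selection is built. The findseq source is not quoted in this thread, so the following is only a plain-Python sketch of that hypothesis, not the actual implementation. The names `find_ranges`, `pooled_selection`, and `per_object_selection` are hypothetical, and the sequences are the toy ones from this comment.

```python
import re

# Toy sequences from the example above; residues are numbered from 1.
sequences = {
    ("file1", "A"): "xxxxxxGHIxx",
    ("file2", "A"): "xGHIxxxxxxx",
}


def find_ranges(needle, seq):
    """Return 1-based, inclusive (start, end) residue ranges matching needle."""
    return [(m.start() + 1, m.end()) for m in re.finditer(needle, seq)]


def pooled_selection(needle, seqs):
    """Hypothesized buggy behavior: every range is applied to every object."""
    ranges = sorted({r for s in seqs.values() for r in find_ranges(needle, s)})
    return [f"/{obj}//{chain}/{a}-{b}"
            for (obj, chain) in seqs
            for (a, b) in ranges]


def per_object_selection(needle, seqs):
    """Expected behavior: each range stays tied to the object it came from."""
    return [f"/{obj}//{chain}/{a}-{b}"
            for (obj, chain), s in seqs.items()
            for (a, b) in find_ranges(needle, s)]


print(pooled_selection("GHI", sequences))
print(per_object_selection("GHI", sequences))
```

Under this assumption, the pooled version reproduces the four selections listed above, while the per-object version yields only /file1//A/7-9 and /file2//A/2-4.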

pslacerda commented 1 month ago

I guess this findseq command was written to take only a single protein as the haystack argument. Do you have a reason to want multiple proteins supported in a single call, or is calling it multiple times enough?

jrom99 commented 1 month ago

Yes, I have multiple proteins per PyMOL session with similar-ish sequences, and I use findseq to find a specific peptide and give it a different color. Currently, I have to do something like:

color tv_red, *
python

for obj in cmd.get_object_list("(*)"):
    cmd.findseq("myregex", obj, "seq")
    cmd.color("tv_green", "seq")  # cmd.color takes the color first, then the selection

python end

And then manually delete the seq selection.

As one can see, it's not that hard, but it would be a lot more convenient if findseq could work on multiple objects.
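
The per-object workaround above can be wrapped in a small helper that also removes the temporary selection. This is a minimal sketch, not part of the repository: `color_matches` and the temporary selection name `findseq_tmp` are hypothetical, and it assumes findseq is reachable as `cmd.findseq`, as in the snippet above. The `cmd` module is passed in as an argument so the function can be exercised outside PyMOL.

```python
def color_matches(cmd, needle, color="tv_green", tmp_sele="findseq_tmp"):
    """Run findseq once per object, color the hits, and clean up.

    cmd      -- the pymol.cmd module (or any object with the same methods)
    needle   -- sequence or regular expression passed to findseq
    color    -- PyMOL color name applied to every match
    tmp_sele -- name of the temporary selection (hypothetical default)
    """
    colored = []
    for obj in cmd.get_object_list("(*)"):
        # Assumption: findseq is exposed as cmd.findseq, as in the workaround.
        cmd.findseq(needle, obj, tmp_sele)
        # Guard in case no selection was created for this object.
        if tmp_sele in cmd.get_names("selections"):
            cmd.color(color, tmp_sele)
            cmd.delete(tmp_sele)
            colored.append(obj)
    return colored
```

Inside PyMOL this would be called as `color_matches(cmd, "myregex")` after `from pymol import cmd`. Searching one object at a time keeps each match tied to its own object, which sidesteps the cross-object selection described in this issue.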

pslacerda commented 1 month ago

In my tests findseq is working correctly and selecting only the haystack object:

fetch 7C2Q 8UH8
findseq FRK, 7C2Q, seq

jrom99 commented 1 month ago

Assuming your test is findseq FRK, *, seq, it shows that /7c2q/A/A/3-6, /8uh8/A/A/3-6 and /7c2q/B/B/3-6 were selected, which is the expected selection with and without this bug.

The extra selection that is expected in this bug would be either /8uh8/B/B/3-6 or /8uh8/B/A/3-6, but the object /8uh8/B/B doesn't exist, and I'm not sure if the object /8uh8/B/A would be affected or not, since it starts numbering at 401 and has a different /obj/this part/chain-id name (I'm not sure what is the name of this region).

Do you know other files that might be able to better replicate this bug?

pslacerda commented 1 month ago

I was reluctant to support findseq FRK, *, seq if it would break some previous behavior, when calling findseq multiple times as in your Python example already works.

I will try to add support for multiple objects now. The middle field in /object-name/segi-identifier/chain-identifier is the segment identifier, and it also seems buggy.
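
To make the fields of such a selection macro explicit, the sketch below expands a macro of the form /object/segi/chain/resi/name into the equivalent selection-algebra string. The helper name `macro_to_selection` is hypothetical. Empty fields are skipped because an empty field in a macro acts as a wildcard, which is why /obj//A matches chain A in any segment.

```python
# Order of the fields in a slash-prefixed PyMOL selection macro.
FIELDS = ("model", "segi", "chain", "resi", "name")


def macro_to_selection(macro):
    """Expand '/object/segi/chain/resi/name' into 'model ... and segi ...'."""
    if not macro.startswith("/"):
        raise ValueError("only macros with a leading slash are handled here")
    parts = macro[1:].split("/")
    if len(parts) > len(FIELDS):
        raise ValueError("too many fields in macro: " + macro)
    # Empty or '*' fields are wildcards, so they add no constraint.
    terms = [f"{field} {value}"
             for field, value in zip(FIELDS, parts)
             if value and value != "*"]
    return " and ".join(terms) if terms else "all"


print(macro_to_selection("/8uh8/B/A/3-6"))
print(macro_to_selection("/file1//A/7-9"))
```

The first call prints `model 8uh8 and segi B and chain A and resi 3-6`; the second omits the segi term because that field is empty.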

pslacerda commented 1 month ago

Just finished this feature in master, can you check it? https://github.com/Pymol-Scripts/Pymol-script-repo/commit/7b56fe28662269470daabf39d678c56dbfbccd53

You can now omit the haystack argument like findseq FRK.

jrom99 commented 1 month ago

It didn't work, but I tried hacking together something that may work.