Pymol-Scripts / Pymol-script-repo

Collected scripts for Pymol
http://www.pymolwiki.org/index.php/Git_intro

Fix support for multiple objects selection #150

Closed jrom99 closed 1 month ago

jrom99 commented 1 month ago

In the proposed implementation, running needle.finditer(AAs) will just redo the (constant) search for each object, without actually checking if the single-letter sequence refers to that object.
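To illustrate the problem described above, here is a minimal pure-Python sketch, not the actual findseq code. The object names and sequences are made up, and it assumes the searched string stays the same across the per-object loop, so every object reports the same hits.

```python
import re

# Hypothetical per-object one-letter sequences (illustration only).
sequences = {"objA": "GSMAG", "objB": "GGGGG"}
needle = re.compile("MA")

# Problem: the same constant string is searched once per object,
# so each object gets identical hits whether or not it contains the motif.
constant_seq = "".join(sequences.values())
buggy = {obj: [m.start() for m in needle.finditer(constant_seq)]
         for obj in sequences}

# Fix: search each object's own sequence.
fixed = {obj: [m.start() for m in needle.finditer(seq)]
         for obj, seq in sequences.items()}

print(buggy)  # objB wrongly reports a hit
print(fixed)  # only objA reports a hit
```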

I changed the iteration method to use cmd.get_model, but I don't know how to make the returned atoms expose their object names, so I used cmd.get_object_list and ran cmd.get_model for each object.

I find cmd.get_model easier to use than cmd.iterate, but if the latter exposes the atom model then it may be more useful.

May fix bug https://github.com/Pymol-Scripts/Pymol-script-repo/issues/148

pslacerda commented 1 month ago

Hi! Very nice code.

In my limited tests, it works like a charm.

In fact, cmd.iterate can retrieve the object name through its model variable. Because I didn't know that either and also prefer cmd.get_model, I submitted https://github.com/schrodinger/pymol-open-source/pull/380.

pslacerda commented 1 month ago

We could also consider supporting 3-letter, hyphen-separated residue codes in the regex to better support unusual amino acids.
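One possible shape for this, as a rough sketch only: the function name `find_three_letter` and the query syntax are hypothetical, not part of findseq. It assumes the residue names of a chain are already available as a list, and that each hyphen-separated token is a regular expression matched against one whole residue name.

```python
import re

def find_three_letter(residues, query):
    """Return (start, end) index ranges where `query` matches.

    residues: three-letter residue names in chain order.
    query: hyphen-separated tokens such as "MSE-ALA"; each token is a
           regex that must match one full residue name.
    """
    patterns = [re.compile(token) for token in query.split("-")]
    n = len(patterns)
    hits = []
    # Slide a window of n residues along the chain.
    for start in range(len(residues) - n + 1):
        window = residues[start:start + n]
        if all(p.fullmatch(r) for p, r in zip(patterns, window)):
            hits.append((start, start + n))
    return hits

# MSE (selenomethionine) and SEP (phosphoserine) have no standard
# one-letter code, which is the case three-letter queries would cover.
residues = ["GLY", "MSE", "ALA", "SEP", "MSE", "ALA"]
print(find_three_letter(residues, "MSE-ALA"))
print(find_three_letter(residues, "GLY|SEP-MSE"))
```

Splitting on the hyphen means a token cannot itself contain one (for example inside a character class); a real implementation would need to decide how to handle that.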

Also, the checkParams handling of firstOnly has been wrong since before this change.

pslacerda commented 1 month ago

@jrom99, if you still want to use cmd.iterate's model variable instead of pm.get_model, please commit on this branch.

jrom99 commented 3 weeks ago

> We could also consider supporting 3-letter, hyphen-separated residue codes in the regex to better support unusual amino acids.
>
> Also, the checkParams handling of firstOnly has been wrong since before this change.

Hello, I accidentally broke firstOnly: it now returns the first match for every chain in each object instead of the global first match.

We can now have both behaviors (global first versus chain first), but I'm not sure the latter would be useful.

For now I tried to restore the expected behavior in my fork branch.

pslacerda commented 3 weeks ago

Hi @jrom99, I didn't understand what you said...

jrom99 commented 3 weeks ago

There are three possible behaviors when findseq is used on a multi-chain protein with multiple matches:

>chain_A
xxxMxxxxMx
>chain_B
xxxxMxxM
  1. selection obj/A/4+9 or obj/B/5+8 (all matches)
  2. selection obj/A/4 (global first match)
  3. selection obj/A/4 or obj/B/5 (per chain first match)

firstOnly=0 should use the first behavior, while firstOnly=1 should use the second. Due to a bug, firstOnly=1 results in the third behavior.
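The three behaviors can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `find_matches` and its `mode` argument are hypothetical names, chains are given as a dict of one-letter sequences, residue numbering starts at 1, and only the start position of each match is recorded.

```python
import re

def find_matches(chains, needle, mode="all"):
    """Return [(chain_id, 1-based position), ...] for matches of `needle`.

    mode: "all", "global_first", or "chain_first".
    """
    pattern = re.compile(needle)
    hits = []
    for chain_id, seq in chains.items():
        for m in pattern.finditer(seq):
            hits.append((chain_id, m.start() + 1))
            if mode == "global_first":
                return hits  # stop at the very first match overall
            if mode == "chain_first":
                break        # keep one match, move on to the next chain
    return hits

chains = {"A": "xxxMxxxxMx", "B": "xxxxMxxM"}
print(find_matches(chains, "M", "all"))           # behavior 1
print(find_matches(chains, "M", "global_first"))  # behavior 2
print(find_matches(chains, "M", "chain_first"))   # behavior 3
```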

pslacerda commented 3 weeks ago

I agree with you. Multiple objects weren't considered initially, so firstOnly=1 should return the global first match (per object).