Closed jrom99 closed 1 month ago
Is this a bug or an observation? It shouldn't also select /obj//A/10-20, because that isn't a matching sequence, right?
It seems that the segi identifier is empty in findseq selections. Can you share your file?
It seems to be a bug, since the ssRNA is not a matching sequence. I have observed it affecting other PDB files with multiple peptide chains in the same object, so it doesn't seem to be specific to ssRNA objects.
I can't share the file because it's work-related (it was generated using ZDOCK), but I'll try to find a similar one in the PDB and test with it. Here is a screenshot of the output:
For comparison, when I explicitly select objects it works as expected:
Or when there is one object:
For protein-only files, I had two PDB files where their sequences were in the form:
>file1 chain A
xxxxxxGHIxx
>file2 chain A
xGHIxxxxxxx
And running findseq GHI, * would select /file1//A/2-4, /file1//A/7-9, /file2//A/2-4 and /file2//A/7-9.
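For the two sequences above, only /file1//A/7-9 and /file2//A/2-4 actually match GHI. The following is a minimal pure-Python sketch of that expected result, not the plugin's code: it searches each object's sequence separately and builds object-qualified selection macros. The helper name `qualified_matches` and the `sequences` dictionary are hypothetical, and it assumes residues are numbered contiguously from 1.

```python
import re

def qualified_matches(pattern, sequences):
    """Return one selection macro per regex match, scoped to its own object.

    `sequences` maps (object_name, chain_id) to a one-letter sequence.
    Assumption: residue numbering starts at 1 with no gaps.
    """
    regex = re.compile(pattern)
    macros = []
    for (obj, chain), seq in sequences.items():
        for m in regex.finditer(seq):
            first = m.start() + 1  # 0-based index -> 1-based residue number
            last = m.end()         # end() is exclusive, so it equals the last residue
            macros.append(f"/{obj}//{chain}/{first}-{last}")
    return macros

sequences = {
    ("file1", "A"): "xxxxxxGHIxx",
    ("file2", "A"): "xGHIxxxxxxx",
}
print(qualified_matches("GHI", sequences))
# prints ['/file1//A/7-9', '/file2//A/2-4']
```

Because each macro carries its object name, a match found in one object cannot leak into another object that happens to share the same chain ID and residue numbers.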
I guess the findseq command was written to take only a single protein as the haystack argument. Do you have a reason to want multiple proteins supported in a single call, or is calling it multiple times enough?
Yes, I have multiple proteins per PyMOL session with similar-ish sequences, and I use findseq to highlight a specific peptide in a different color. Currently, I have to do something like:
color tv_red, *
python
for obj in cmd.get_object_list("(*)"):
    cmd.findseq("myregex", obj, "seq")
    # cmd.color takes the color first, then the selection
    cmd.color("tv_green", "seq")
python end
And then manually delete the seq selection.
As one can see, it's not that hard, but it would be a lot more convenient if findseq could work on multiple objects.
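As a stopgap, the per-object loop can be wrapped in a small helper that also removes the temporary selection. This is only a sketch under assumptions: the name `color_matches` and the temporary selection name `findseq_tmp` are made up for illustration, `cmd` is PyMOL's command module passed in explicitly, and `cmd.findseq` is assumed to be callable because the findseq plugin is loaded.

```python
def color_matches(cmd, pattern, color="tv_green", tmp="findseq_tmp"):
    """Run findseq on each object separately and color the matches.

    `cmd` is PyMOL's command module (for example from `from pymol import cmd`).
    `cmd.findseq` is assumed to exist because the findseq plugin is loaded.
    """
    for obj in cmd.get_object_list("(all)") or []:
        cmd.findseq(pattern, obj, tmp)
        # Only color and clean up if findseq actually created the selection.
        if tmp in cmd.get_names("selections"):
            cmd.color(color, tmp)
            cmd.delete(tmp)  # remove the temporary selection after use
```

Inside a PyMOL session this would be called as `color_matches(cmd, "myregex")` after coloring everything with `color tv_red, *`.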
In my tests, findseq is working correctly and selecting only the haystack object:
fetch 7C2Q 8UH8
findseq FRK, 7C2Q, seq
Assuming your test is findseq FRK, *, seq, it shows that /7c2q/A/A/3-6, /8uh8/A/A/3-6 and /7c2q/B/B/3-6 were selected, which is the expected selection with and without this bug.
The extra selection this bug would produce is either /8uh8/B/B/3-6 or /8uh8/B/A/3-6, but /8uh8/B/B doesn't exist, and I'm not sure whether /8uh8/B/A would be affected, since its numbering starts at 401 and it has a different /obj/this part/chain-id name (I'm not sure what this field is called).
Do you know of other files that might replicate this bug better?
I was reluctant to support findseq FRK, *, seq if it would break existing behavior, rather than having users call findseq multiple times as in your Python example.
I will try to add support for multiple objects now. The middle field in /object-name/segi-identifier/chain-identifier is the segment identifier, and its handling also seems buggy.
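To make the field order concrete, here is a small sketch that splits a slash-led selection macro of the form /object/segi/chain/resi/name into named fields. The helper `parse_macro` is hypothetical and deliberately simplified: it handles only the leading-slash form and ignores extras such as alternate-location suffixes.

```python
def parse_macro(macro):
    """Split a slash-led PyMOL selection macro into named fields.

    Handles only the form /object/segi/chain/resi/name; missing or
    empty fields are returned as empty strings.
    """
    if not macro.startswith("/"):
        raise ValueError("expected a macro starting with '/'")
    fields = ("object", "segi", "chain", "resi", "name")
    parts = macro[1:].split("/")
    if len(parts) > len(fields):
        raise ValueError("too many fields in macro")
    parts += [""] * (len(fields) - len(parts))  # pad omitted trailing fields
    return dict(zip(fields, parts))

print(parse_macro("/8uh8/B/A/3-6"))
# prints {'object': '8uh8', 'segi': 'B', 'chain': 'A', 'resi': '3-6', 'name': ''}
```

Read this way, /8uh8/B/A/3-6 means segment B, chain A, residues 3-6 of object 8uh8, while an empty segi field, as in /obj//A/10-20, does not restrict the segment at all.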
Just finished this feature in master, can you check it? https://github.com/Pymol-Scripts/Pymol-script-repo/commit/7b56fe28662269470daabf39d678c56dbfbccd53
You can now omit the haystack argument, as in findseq FRK.
It didn't work, but I tried hacking together something that may work.
I have a file that contains a protein /obj/A/A and an ssRNA /obj//A. When I use findseq to search for a peptide without restricting the search to the protein, it finds the sequence but then selects something like /obj/*/A/x-y (for example, it finds residues 10-20 on the protein, but also selects nucleotides 10-20 in the RNA).
I'm using PyMOL version 3.1.0a0, and the plugin was installed from the repo. As far as I know, I've seen this strange behavior for objects with multiple chains, but I was not sure what was causing it until now.