3dmol / 3Dmol.js

WebGL accelerated JavaScript molecular graphics library
https://3dmol.org/

Can't select properly, nor generate cartoon & multiple structures #772

Closed JavierSanchez-Utges closed 6 months ago

JavierSanchez-Utges commented 7 months ago

Hi, I am having issues showing the cartoon for the protein atoms in one of my files. I attach it here (I had to change the extension to .txt to attach it, but it is a .cif file). What I want to do is show ligands as sticks and protein as cartoon, something I have done on other occasions. This is the code I was using:

viewer.addStyle({hetflag: true},{stick:{colorscheme:"greenCarbon"}});
viewer.addStyle({hetflag: true, invert: true}, {cartoon: {style: 'oval', color: 'white', arrows: true,}});

The first line works and displays all HETATMs as sticks with the greenCarbon colour scheme, but the second does not. The selection itself works and selects all protein atoms, since the stick style works with it, but cartoon does not.

  1. Any idea why this might be?

  2. Is this the best way to select HETATM and ATOM, using the hetflag?

  3. I am working with multiple structures, or models, opened at the same time. How can I select by model? I saw there is the model argument. Is this it? Are models numbered in the order they are loaded, so that if I loaded these 3 files, model 2 is the second one I loaded?

For example, I want to select residue number 807, of chain A of model 3 (6ej1_A_trans.cif). Would it be something like: {resi: 807, chain: 'A', model: 3}?

I actually can't make this command work here: viewer.addStyle({resn: 'EDO'},{stick:{color: "blue"}});.

Thanks so much!

5fpu_A_trans.txt 5fz1_A_trans.txt 6ej1_A_trans.txt

JavierSanchez-Utges commented 7 months ago

I have now realised that there must be some parsing issue. Although I have processed these files, so they are not the same as those downloaded from the PDB, they work fine with Chimera, ChimeraX and PyMol. I assume they are missing some field that 3Dmol.js uses. When I replaced the processed files with the raw ones, the commands worked fine.

Complex selections also work fine, like so: {hetflag: true, model: 10, resn: 'EDO', chain: 'A', resi:1771,}, selecting the ResNum 1771 from Chain A of model 10, which is an HETATM with resName EDO.

I have also noticed that the models are identified with the method $model.getID(), and that the order is determined by the order they are loaded into the viewer, which might be different to the order they are passed to it.
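A minimal sketch of keeping track of which model ID belongs to which file, assuming `viewer` is a 3Dmol.js viewer whose `addModel(data, format)` returns a model exposing `getID()`, as described above. The helper names `addNamedModel` and `residueSelection` and the `idsByName` map are hypothetical, introduced only for illustration. Recording the ID at the moment the model is added avoids relying on the order in which asynchronous downloads happen to complete.

```javascript
// Hypothetical helper: add one structure and remember its model ID by file name.
// Assumes `viewer.addModel(data, format)` returns an object with `getID()`.
function addNamedModel(viewer, idsByName, name, cifText) {
  const model = viewer.addModel(cifText, "cif");
  idsByName.set(name, model.getID());
  return model;
}

// Hypothetical helper: build a selection for one residue of one chain
// in the model that was loaded under `name`.
function residueSelection(idsByName, name, chain, resi) {
  if (!idsByName.has(name)) {
    throw new Error("No model loaded under the name " + name);
  }
  return { model: idsByName.get(name), chain: chain, resi: resi };
}
```

With a map filled in this way, a call such as `viewer.addStyle(residueSelection(ids, "6ej1_A_trans.cif", "A", 807), {stick: {color: "blue"}})` would target the right model regardless of load order.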

It would be great to find out what the parsing issue is and whether it can be fixed, as I have thousands of these processed files that I want to visualise using 3Dmol.js!

Thanks!

JavierSanchez-Utges commented 6 months ago

I have realised my files are missing two fields from the _atom_site category: these are auth_comp_id and auth_atom_id, which seem to be used in the parsing of the CIF file: https://3dmol.csb.pitt.edu/doc/parsers_CIF.ts.html.

Perhaps what happens is that these are left undefined and therefore selections don't work. However, other visualisation software works well with these files. Maybe when they find that auth_comp_id and auth_atom_id are missing, they fall back to label_comp_id and label_atom_id.
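A quick way to confirm which columns a processed file lacks is to list the `_atom_site` item names it declares. The sketch below is a simplified check, not part of 3Dmol.js: the function names `atomSiteFields` and `missingAtomSiteFields` are hypothetical, and the code assumes each column is declared on its own line starting with `_atom_site.`, as in typical mmCIF files.

```javascript
// Collect the _atom_site column names declared in an mmCIF text.
// Simplified: looks only at lines that start with "_atom_site." and does not
// validate the surrounding loop_ structure.
function atomSiteFields(cifText) {
  const prefix = "_atom_site.";
  const fields = [];
  for (const rawLine of cifText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith(prefix)) {
      // Keep the item name only: drop the category prefix and any inline value.
      fields.push(line.slice(prefix.length).split(/\s+/)[0]);
    }
  }
  return fields;
}

// Report which of the wanted columns are absent from the file.
function missingAtomSiteFields(cifText, wanted) {
  const present = new Set(atomSiteFields(cifText));
  return wanted.filter((name) => !present.has(name));
}
```

Calling `missingAtomSiteFields(text, ["auth_comp_id", "auth_atom_id"])` on one of the processed files would be expected to return both names if the columns were dropped as described.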

I could try to edit the parser in .ts to accommodate for this and see what happens...

JavierSanchez-Utges commented 6 months ago

I tracked this missing-columns issue to my processing function, which uses the Bio.PDB.MMCIFIO().save() method. It seems to drop precisely the two columns mentioned above, auth_comp_id and auth_atom_id: https://github.com/biopython/biopython/issues/3439 .

Perhaps editing the code here will work: https://github.com/biopython/biopython/blob/f626707607da88157c8c536a7386e5816804508f/Bio/PDB/mmcifio.py#L365 .

JavierSanchez-Utges commented 6 months ago

Do you think editing lines 245-250 of the CIF parser (https://3dmol.csb.pitt.edu/doc/parsers_CIF.ts.html) like so would work?

atom.resn = mmCIF._atom_site_auth_comp_id
    ? mmCIF._atom_site_auth_comp_id[i].trim()
    : (mmCIF._atom_site_label_comp_id
        ? mmCIF._atom_site_label_comp_id[i].trim()
        : undefined);
atom.atom = mmCIF._atom_site_auth_atom_id
    ? mmCIF._atom_site_auth_atom_id[i].replace(/"/gm, "")
    : (mmCIF._atom_site_label_atom_id
        ? mmCIF._atom_site_label_atom_id[i].replace(/"/gm, "")
        : undefined); // "primed" names are in quotes

With this code, I intend that if _atom_site.auth_atom_id is missing, label_atom_id is used instead, and likewise label_comp_id is used if auth_comp_id is missing.

I am currently using the minified files of the latest version. I guess that if I want to try editing the source code, I have to git clone the repository and install the library with npm?

Thanks!

dkoes commented 6 months ago

I've pushed this fix (although the first two files you provided didn't have any non-HETATMs in them so I wasn't sure what was the problem with them). https://3dmol.org/tests/auto/generate_test.cgi?test=ciffields

JavierSanchez-Utges commented 6 months ago

OK. I can see that it works, and it is already available in the .min.js. The problem with the three files is that they were not getting parsed correctly because of the missing fields, so selecting by resn or resi was not working well. Colouring by hetflag was working well though, but I could not apply cartoon style to the {hetflag: true}, for example. That is fixed now!

However, I have noticed that some of my files now raise a new error. I have narrowed it down to these two files, and identified the particular rows that are making this happen:

For 5A1F: HETATM 3612 O 'O2'' . OGA C ? . ? 83.897 66.219 11.957 1.0 35.73 1756 A 1 . 1756

For 5FV3: HETATM 3717 O 'O2'' . OGA H ? . ? 83.951 66.089 11.876 1.0 39.51 1770 A 1 . 1770

In the original files, as downloaded from PDBe, these atom IDs appear as "O2'", but it seems Bio.PDB.MMCIFIO() is transforming the double quotes to single quotes, which might clash with your string replacement in the parser? What I know is that if the double quotes are restored, the structure loads fine. Dropping the quotes completely, so that the atom ID is written as O2', also breaks the parsing.
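To illustrate how a row such as the ones above can be split so that both "O2'" and 'O2'' yield the same atom ID, here is a minimal tokenizer sketch. It is not the 3Dmol.js parser, which is not shown in this thread; the function name `tokenizeCifRow` is hypothetical. It assumes the rule, as I understand the CIF syntax, that a quote character closes a quoted value only when it is followed by whitespace or the end of the line.

```javascript
// Split one mmCIF data row into values.
// Simplified sketch: handles whitespace-separated and quoted values on a
// single line; it ignores comments and multi-line (semicolon) text fields.
function tokenizeCifRow(line) {
  const tokens = [];
  const isSpace = (c) => /\s/.test(c);
  let i = 0;
  while (i < line.length) {
    const ch = line[i];
    if (isSpace(ch)) {
      i++;
      continue;
    }
    let j;
    if (ch === "'" || ch === '"') {
      // Quoted value: the closing quote is the next matching quote that is
      // followed by whitespace or by the end of the line.
      j = i + 1;
      while (
        j < line.length &&
        !(line[j] === ch && (j + 1 === line.length || isSpace(line[j + 1])))
      ) {
        j++;
      }
      tokens.push(line.slice(i + 1, j));
      i = j + 1; // skip the closing quote
    } else {
      // Unquoted value: runs up to the next whitespace.
      j = i;
      while (j < line.length && !isSpace(line[j])) j++;
      tokens.push(line.slice(i, j));
      i = j;
    }
  }
  return tokens;
}
```

Under that assumption, the fourth value of the 5A1F row is read as O2' whether the file delimits it with single or double quotes.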

This is the traceback of the error:

3Dmol-min.js:2 Uncaught TypeError: Cannot read properties of undefined (reading 'trim')
    at p (3Dmol-min.js:2:332397)
    at GLModel.parseMolData (3Dmol-min.js:2:52291)
    at GLModel.addMolData (3Dmol-min.js:2:39872)
    at GLViewer.addModel (3Dmol-min.js:2:109784)
    at Object.success (experiment_V2.html:56:28)
    at c (jquery-3.6.1.min.js:2:28327)
    at Object.fireWith [as resolveWith] (jquery-3.6.1.min.js:2:29072)
    at l (jquery-3.6.1.min.js:2:80045)
    at XMLHttpRequest.<anonymous> (jquery-3.6.1.min.js:2:82499)

These are the files:

5fv3_A_trans.txt 5a1f_A_trans.txt

Many thanks!

dkoes commented 6 months ago

I really think replacing the double quotes with single quotes when the string contains a single quote is simply wrong of biopython as the string is no longer well formed.

dkoes commented 6 months ago

I've committed a workaround for biopython's broken cif files: https://3dmol.org/tests/auto/generate_test.cgi?test=cifprime

JavierSanchez-Utges commented 6 months ago

I do agree BioPython is making a mistake with the double quotes. Thanks so much for the workaround. I really appreciate it. All my example files are working fine now!
