dsehnal / LiteMol

A library/plugin for handling 3D structural molecular data (not only) in the browser.

PatternQuery: ResidueIdRange() with residue ids (rather than index)? #8

Closed sillitoe closed 7 years ago

sillitoe commented 7 years ago

(this is an issue with PatternQuery rather than LiteMol, but I can't see a separate repo and I'm guessing they overlap...)

I would like to load a restricted set of residues based on structural domain boundaries (where a structural domain can consist of 1 or more regions of protein structure).

The PatternQuery documentation has a couple of functions that look useful:

Select individual residues by id:

ResidueIds("14 A", "15 A i:A", ... )

Select range of residues by index:

ResidueIdRange('A', 50, 100)

[NB: should this be 'ResidueIndexRange'??]

Either way - in the latter case - I presume the "index" coordinate system is based on the sequential ordering of ATOM records (or mmCIF equivalent), starting from 1? If so, this seems to be more useful as an internal coordinate system rather than for an external user (unless you can access the id to index lookup).

It would be really useful to be able to select a range based on a start and stop residue id as well as index.

e.g. selecting all residues between residues '15' and '200, insert code: A' in chain B...

something like:

ResidueIdRange('B', '15', '200A' )  

or

ResidueIdRange('15 B', '200 B i:A' )

dsehnal commented 7 years ago

PatternQuery isn't open source at the moment (wasn't really my decision at the time, but eventually I think I will put the entire WebChemistry thing on GitHub as well) and is actually written in C#. The query language implementation in LiteMol only uses the ideas from PQ.

The queries currently available in LiteMol are here.

What you are asking for can be accomplished using the query

sequence('1' /* entity id */, 'B' /* asymId */, { seqNumber: 15 }, { seqNumber: 200, insCode: 'A' })

Thinking about it, this might cause some trouble, because there is sometimes a conflict between _atom_site.label_asym_id and _atom_site.auth_asym_id in mmCIF, so the signature of sequence for the chain should also have the option to be an object instead of just a string. I will fix this.


I am also planning to include code completion for the queries to LiteMol (another thing on the TODO list :) )


ResidueIdRange(chain, a, b) from PatternQuery takes a chain id, and the numbers determine an interval given by the sequence number of the residues (the _atom_site.auth_seq_id). So it would be equivalent to sequence('1', chain, { seqNumber: a }, { seqNumber: b }) with no ability to specify an insertion code.
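To make the interval semantics concrete, here is a minimal TypeScript sketch of an inclusive range test on residue ids that also honours insertion codes. The names `ResidueId`, `compareResidueIds` and `isInResidueRange` are hypothetical and are not part of LiteMol or PatternQuery. The sketch also assumes that a missing insertion code sorts before any letter and that letters sort alphabetically, which real structure files do not always guarantee.

```typescript
// Hypothetical types for illustration only; not LiteMol's or PatternQuery's API.
interface ResidueId {
    seqNumber: number;
    insCode?: string; // insertion code such as 'A'; absent means none
}

// Orders residue ids by sequence number first, then by insertion code.
// Assumption: a missing insertion code sorts before any letter.
function compareResidueIds(a: ResidueId, b: ResidueId): number {
    if (a.seqNumber !== b.seqNumber) return a.seqNumber - b.seqNumber;
    const x = a.insCode || '';
    const y = b.insCode || '';
    return x < y ? -1 : x > y ? 1 : 0;
}

// True when `id` lies in the inclusive interval [start, end].
function isInResidueRange(id: ResidueId, start: ResidueId, end: ResidueId): boolean {
    return compareResidueIds(start, id) <= 0 && compareResidueIds(id, end) <= 0;
}

// Residues 15 to 200 (insertion code A), as in the request above.
const rangeStart: ResidueId = { seqNumber: 15 };
const rangeEnd: ResidueId = { seqNumber: 200, insCode: 'A' };
const candidateResidues: ResidueId[] = [
    { seqNumber: 14 },
    { seqNumber: 15 },
    { seqNumber: 200 },
    { seqNumber: 200, insCode: 'A' },
    { seqNumber: 200, insCode: 'B' },
];
const residuesInRange = candidateResidues.filter(r => isInResidueRange(r, rangeStart, rangeEnd));
```

Comparing on the pair (sequence number, insertion code) keeps the test independent of how the residues are stored, at the cost of relying on the ordering assumption stated above.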


A limited set of queries is also available in the CoordinateServer (or the same thing running at PDBe, but not always the latest version) which is running LiteMol code using Node.js. The set of queries on CoordinateServer can be easily extended and you can somewhat easily run it on your own data (it will be open sourced soon as well).


Edit: Improving the queries in the "original" PatternQuery is also an option and I might do the improved version of the sequence there as well.

dsehnal commented 7 years ago

And sorry, I got a little carried away in the response and forgot to answer what you were actually asking about.

It should be easy to add support for the "index" based selection (at least using 0 based indices) as this is something that is included in the internal LiteMol representation. I will think about it a bit more and include it.
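As an illustration of how id-based and index-based selection relate, the following TypeScript sketch builds an id-to-index lookup over residues stored in file order and then selects a range by 0-based index. All names (`ResidueKey`, `residueKey`, `buildIndexLookup`, `selectByIdRange`) are hypothetical and do not reflect LiteMol's internal representation. The sketch assumes both end points are in the same chain and that the residues of a chain are stored contiguously.

```typescript
// Hypothetical residue identifier; not LiteMol's internal representation.
interface ResidueKey {
    chain: string;
    seqNumber: number;
    insCode?: string;
}

// Builds a string key such as "B:200:A" for use in a Map.
function residueKey(r: ResidueKey): string {
    return `${r.chain}:${r.seqNumber}:${r.insCode || ''}`;
}

// Maps each residue id to its 0-based position in file order.
function buildIndexLookup(residues: ResidueKey[]): Map<string, number> {
    const lookup = new Map<string, number>();
    residues.forEach((r, i) => lookup.set(residueKey(r), i));
    return lookup;
}

// Selects the inclusive range between two residue ids by translating
// them to indices. Returns an empty array if either id is unknown or
// the end id comes before the start id.
function selectByIdRange(
    residues: ResidueKey[],
    lookup: Map<string, number>,
    start: ResidueKey,
    end: ResidueKey
): ResidueKey[] {
    const i = lookup.get(residueKey(start));
    const j = lookup.get(residueKey(end));
    if (i === undefined || j === undefined || i > j) return [];
    return residues.slice(i, j + 1);
}
```

Because the range is resolved through positions in the file, residues with insertion codes are included in whatever order the file lists them, with no assumption about how insertion codes sort.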

sillitoe commented 7 years ago

Thanks.

The queries currently available in LiteMol are here.

That's a really useful link for documentation - are there plans to autogenerate API docs?

sequence('1' /* entity id */, 'B' /* asymId */, { seqNumber: 15 }, { seqNumber: 200, insCode: 'A' })

Excellent, thanks

dsehnal commented 7 years ago

Yes, there are plans for autogenerated docs.

I am hoping that writing JSdoc comments and running JSdoc on the generated JavaScript will work. Haven't found anything useful to generate docs automatically from TypeScript yet.

If this does not work, there will definitely be at least some documentation in the style of PatternQuery for the query language.

sillitoe commented 7 years ago

I know very little about this, but...

?

dsehnal commented 7 years ago

I did have a quick look at these, but tsdoc has had no activity for 3 years and the language has changed since.

I tried the demo (default theme page) for typedoc and it was not working, so I looked no further. But somehow I missed the typedoc.org page, which looks maintained, so I will give it a shot and see how it works.


On a "side note": the sequence annotation on mouse hover is already available in the Viewer app (accessed through the elements in the red rectangles).

The source code for that is here, in case you would like to include it in CATH ;)

[Image: annotation]

sillitoe commented 7 years ago

fwiw - looks like TypeDoc is maintained by the same people who wrote the TypeScript package for Atom (which seems to work well for me).

Thanks - I'm currently trying to see if I can get our superfamily superpositions to work (all non-redundant domains for a given superfamily). Figure if this plugin can display millions of atoms, it should be able to show ~100 structures on top of each other.

http://sillitoe.cathdb.info/superfamily?sfam_id=1.10.8.10

[NB: that's a test server - link probably won't work beyond today...]

Currently I'm loading previously superposed PDB files from my server - I figured it might be a better idea to select just the domain coordinates from PDBe server and apply the appropriate transformation...

dsehnal commented 7 years ago

Yes, 100s of structures should not be a problem in general.

It might be interesting to create a bundle from these structures and send them all in a single response, packed in BinaryCIF. Might be an interesting feature for the CoordinateServer actually. The query would look something like /1tqn,1cbs,1jj2/cartoons?encoding=bcif and would return a result with multiple data blocks, sending only the atoms required for cartoon representation.
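A small TypeScript sketch of how a client might assemble such a bundle request is given below. It follows the URL shape proposed in the comment above, which is an idea under discussion and not an existing CoordinateServer endpoint. The function name `buildBundleQueryUrl` and the base URL used in the example are hypothetical.

```typescript
// Builds a URL of the proposed form "<base>/<id1>,<id2>,.../<query>?encoding=<encoding>".
// Hypothetical helper: the multi-structure endpoint is only a proposal in this thread.
function buildBundleQueryUrl(
    baseUrl: string,
    pdbIds: string[],
    query: string,
    encoding: string
): string {
    if (pdbIds.length === 0) throw new Error('at least one PDB id is required');
    // Normalise ids to lower case and escape anything unsafe in a URL path.
    const ids = pdbIds.map(id => encodeURIComponent(id.toLowerCase())).join(',');
    // Drop trailing slashes from the base so the path has single separators.
    const base = baseUrl.replace(/\/+$/, '');
    return `${base}/${ids}/${encodeURIComponent(query)}?encoding=${encodeURIComponent(encoding)}`;
}

// Example with a placeholder host (not a real server address).
const bundleUrl = buildBundleQueryUrl(
    'https://example.org/coordinates/',
    ['1tqn', '1CBS', '1jj2'],
    'cartoons',
    'bcif'
);
```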

Loading the originals and applying the transforms should not be very hard either, I would add a transform that would take the original model and copy just the transformed positions, reusing all the other data, making it very memory efficient as well (I am not too keen on mutating data in the LiteMol state).
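The idea of copying only the transformed positions while reusing everything else can be sketched in TypeScript as follows. The `Model` shape and the `transformModel` function are hypothetical and much simpler than LiteMol's real data model. The sketch assumes a row-major 4x4 matrix whose last row is (0, 0, 0, 1).

```typescript
// Hypothetical, simplified model; LiteMol's real data structures differ.
interface Model {
    readonly id: string;
    readonly atomNames: ReadonlyArray<string>;
    readonly x: Float32Array;
    readonly y: Float32Array;
    readonly z: Float32Array;
}

// Returns a new model with transformed coordinates. The input model is
// not mutated, and non-positional data (here: atomNames) is shared by
// reference rather than copied.
// `m` is a row-major 4x4 matrix; its last row is assumed to be (0, 0, 0, 1).
function transformModel(model: Model, m: ReadonlyArray<number>): Model {
    if (m.length !== 16) throw new Error('expected a 4x4 matrix with 16 entries');
    const n = model.x.length;
    const x = new Float32Array(n);
    const y = new Float32Array(n);
    const z = new Float32Array(n);
    for (let i = 0; i < n; i++) {
        const px = model.x[i], py = model.y[i], pz = model.z[i];
        x[i] = m[0] * px + m[1] * py + m[2] * pz + m[3];
        y[i] = m[4] * px + m[5] * py + m[6] * pz + m[7];
        z[i] = m[8] * px + m[9] * py + m[10] * pz + m[11];
    }
    return { id: model.id, atomNames: model.atomNames, x: x, y: y, z: z };
}

// Example: rotate 90 degrees about the z axis, then shift by 10 along x.
const originalModel: Model = {
    id: '1tqn',
    atomNames: ['CA', 'CA'],
    x: new Float32Array([1, 0]),
    y: new Float32Array([0, 2]),
    z: new Float32Array([0, 3]),
};
const rotateAndShift = [
    0, -1, 0, 10,
    1,  0, 0,  0,
    0,  0, 1,  0,
    0,  0, 0,  1,
];
const movedModel = transformModel(originalModel, rotateAndShift);
```

Only the three coordinate arrays are newly allocated per transformed copy, which is what keeps this approach cheap when many structures are superposed.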

Thoughts?

dsehnal commented 7 years ago

Also looking at your app, I think I will add the ID of the highlighted molecule to the highlight label.

And I've also just remembered that there is a limitation of 255 structures for which the highlight will work simultaneously. If this is an issue, I have ways of fixing this.

sillitoe commented 7 years ago

Sorry, meetings.

Yup, this was meant to be a simple proof of concept - using the local data we already have available. Figured it will be good to use this as an excuse to get stuck into LiteMol (I think we may already be semi-official collaborators on other projects :)

Moving over to CoordinateServer sounds like it would potentially make the tool much faster and more portable (which would be great).

Realistically, it will probably take me a while to get my head around customising the dashboard, let alone applying matrix transforms/rotations (I enjoy working on front-end stuff, but I don't get much time to do it).

Maybe I should start up a separate GitHub project - could act as a "how to implement your own application" tutorial. Might make it easier for you to point me in the right direction when I'm floundering...

dsehnal commented 7 years ago

What I propose would not require new UI elements. And applying a matrix transform to a molecule is something that should be a part of the core anyway.

Let me make an example app that downloads a bunch of structures from the coordinate server and applies a bunch of semi-random transforms to them (I actually have an implementation of the quaternion superposition algorithm in TypeScript, so I will use something like the first 10 C-alphas for each structure to superpose them; it would be quite cool to do the domain superpositions directly on the client as well ... should not be that hard if it's just 100s of structures).

From your API, you can then just serve some JSON that contains a list of PDB ids and the corresponding transformation.
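One possible shape for that JSON, with a small validating parser, is sketched below in TypeScript. The field names `pdbId` and `transform`, and the choice of a flat 16-number 4x4 matrix, are assumptions made for illustration; the thread does not define a format.

```typescript
// Hypothetical payload entry: one PDB id plus a flat 4x4 transformation matrix.
interface SuperpositionEntry {
    pdbId: string;
    transform: number[]; // 16 numbers; the element order is up to the API to define
}

// Parses and validates a JSON array of entries, throwing on malformed input.
function parseSuperpositionList(json: string): SuperpositionEntry[] {
    const data: unknown = JSON.parse(json);
    if (!Array.isArray(data)) throw new Error('expected a JSON array');
    return data.map((entry: unknown, i: number) => {
        if (typeof entry !== 'object' || entry === null) {
            throw new Error(`entry ${i}: expected an object`);
        }
        const e = entry as { pdbId?: unknown; transform?: unknown };
        if (typeof e.pdbId !== 'string') {
            throw new Error(`entry ${i}: pdbId must be a string`);
        }
        const t = e.transform;
        if (!Array.isArray(t) || t.length !== 16 || !t.every(v => typeof v === 'number')) {
            throw new Error(`entry ${i}: transform must be an array of 16 numbers`);
        }
        return { pdbId: e.pdbId, transform: t as number[] };
    });
}

// Example payload with a single identity transform.
const examplePayload = JSON.stringify([
    { pdbId: '1tqn', transform: [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1] },
]);
const superpositionEntries = parseSuperpositionList(examplePayload);
```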


As a next step, I could then add the ability to query multiple structures at the same time using the CoordServer. This would actually be a very nice use case for it and BinaryCIF and would look very good in a publication.

sillitoe commented 7 years ago

Sounds fantastic, many thanks.

We should probably move this discussion to a new GH issue :)

(Sent by email in reply to David Sehnal's comment of Fri, 11 Nov 2016, 20:20.)

dsehnal commented 7 years ago

I've added the Transforms example. And yes, we should probably move this conversation elsewhere :)

sillitoe commented 7 years ago

Nice! Great work.

Useful to see the ligands in there too.

Will need to do some work to get our backend providing these transforms.

I'll create some GH issues for documentation...