Custom features! - Githubissues

brookehus commented 5 years ago

[WIP] - Watch this space!

To do list (just for myself)

[x] Figure out something to do with distances. Move to geometry; possibly do them all the same way, or keep the faster scipy option for numpy only? Check whether dmat already does the work of redundant distances? Possibly move this entire thing to geometry?
[x] Add a way to specify bonds and deal with them specifically
[x] Change names to Geometry*
[x] Add docs and more tests, duh

Part II (#61):

[x] Write tests for bond functionality
[x] Fix prior stuff (@nec4) and tests

Part III:

[x] Fix example notebook

brookehus commented 5 years ago

@coarse-graining/developers FYI this is going to be hairy - if you plan on making changes to ProteinBackboneStatistics or ProteinBackboneFeature (both of which may even be renamed) merge to this PR or just wait.

nec4 commented 5 years ago

Sounds good. I'll take a look!

brookehus commented 5 years ago

@coarse-graining/developers : ProteinBackboneStatistics and ProteinBackboneFeature are no longer going to be backbone only. What do you want to rename them? ProteinStatistics and ProteinFeature? Or something more general like GeometryStatistics and GeometryFeature?

nec4 commented 5 years ago

GeometryFeature and GeometryStats sound great to me!

brookehus commented 5 years ago

Ok. I've made a buuunch of changes. A key one is that I got rid of the bond constant dictionary. Now you just get a prior_statistics_dictionary where the constant is automatically calculated for everything you specify. If you want it for only certain things, then you could make a new GeometryStatistics object for only those things? Or you could post-process by only grabbing certain keys using return_indices? So I'm imagining the pipeline is like,

Get stats for everything using GeometryStatistics
Use return_indices to get indices for the priors you want
Slice the prior_statistics_array according to those indices to get only e.g. bond and angle constants
Feed into priors?

So this obviously breaks all the priors stuff. @nec4 can you give this a hard look and see if you think the new structure works with the priors?

brookehus commented 5 years ago

A key thing that I haven't done yet is worked much with or tested bonds. What I'm doing now is there are two new keyword arguments to the GeometryStatistics class: bond_inds (list of 2-element tuples) and adjacent_backbone_bonds (bool). You feed in stuff that is bonded which has to be contained in the features that are being calculated (trivial if you have get_all_distances=True, otherwise must be contained in custom_features). Then bonds are populated according to bond_inds as well as pairs that are adjacent along the backbone if adjacent_backbone_bonds=True (redundancies are removed). return_indices draws from bond_inds. This is untested right now.

nec4 commented 5 years ago

Everything here looks really good. However, it would be nice to be able to quickly instantiate a GeometryFeature layer off of a GeometryStatistics layer with possibly custom features in the part of the pipeline that precedes model training. Since GeometryFeature.__init__() takes an optional list of tuples argument feature_inds, would it be useful for GeometryStatistics to have some method that could return a simple list of all feature tuples (Currently there are just attributes for bonds/adjacent neighbors, etc)? The motivation for this comes from the idea that GeometryStatistics is used for preprocessing and aids in construction of _PriorLayer objects, and once that is done, the user can simply just grab all feature tuples and feed them directly to a GeometryFeature layer in a single line.

Of course, one could also just do something like:

feature_tuples = []
for desc in GeometryStatistics.order:
    feature_tuples += GeometryStatistics.descriptions[desc]
FeatureLayer = GeometryFeature(feature_inds=feature_tuples)

Bearing in mind that this approach also produces duplicate dihedral tuples (see the comment below), what do you think?

brookehus commented 5 years ago

I think I addressed or followed up to all of your comments!

brookehus commented 5 years ago

Can you update your review? @nec4 what do you think we have to do here besides "functional testing" via fixing the example nb?

nec4 commented 5 years ago

I can't think of anything off the top of my head - perhaps something will come up as we go through the example notebook. I have been playing with the tools using random data in a little notebook for a while now and I am happy with how they work. Will update the review.

brookehus commented 5 years ago

Ok, added a get_zscore_array method (this commit) and test (this commit).

The test only addresses the correspondence of get_zscore_array to get_prior_statistics, not the independent functioning of get_zscore_array.

coarse-graining / cgnet

Custom features! #58