[WIP] Creation of an abstract base class for test systems

jchodera commented 10 years ago

I've started working on the creation of an abstract base class for test systems. This will involve the addition of methods for retrieving analytically or numerically computed properties such as

potential energy mean and standard deviation
volume mean and standard deviation (for constant-pressure systems)

I have also renamed coordinates to positions to match OpenMM convention.

This is a work-in-progress and not yet ready to merge in.

jchodera commented 10 years ago

Currently, a System object and positions are created at test system object construction, and retrieved via the idiom

testsystem = TestSystem(...)
[system, positions] = [testsystem.system, testsystem.positions]

This feels awkward to me.

An alternative would be to add a create() method:

testsystem = TestSystem(...)
[system, positions] = testsystem.create()

This would also allow us to create multiple copies of the system by calling create() additional times, though I'm not sure if this would actually be useful.

Finally, we could simply make create() a static method so we can do this in one line, but then the parameters would need to be passed to create (and separately to any method where we comptue analytical properties):

[system, positions] = TestSystem.create(...)

jchodera commented 10 years ago

Also, I'm not sure if the unimplemented analytical property methods should return NotImplementedException or simply None. It might be easier just to return None.

kyleabeauchamp commented 10 years ago

So I personally don't have a big problem with directly accessing test_system.system and test_system.positions. If we want to have a docstring for those attributes, we can use an accessor decorator and include it in the base class--that allows to formalize test_system.system and test_system.positions as an interface.

I'm fine with just using None. It's a pretty natural fit. If we need an exception, we can swap it out later.

jchodera commented 10 years ago

@kyleabeauchamp Can you take a look at the base class and HarmonicOscillator class/example? Does this look like how we want to structure this, or should we roll temperature and pressure into the class initializer?

kyleabeauchamp commented 10 years ago

IMHO getSystem() should be get_system by pep8.

Also, I think this should probably be accessed as a @property, rather than a "get_xyz()" function. For example, see https://github.com/rmcgibbo/mdtraj/blob/master/MDTraj/trajectory.py

The point is by using @property, we can control access to properties, but users just see the variables--no function call required. For example:

In [23]: traj.n_atoms
Out[23]: 12

jchodera commented 10 years ago

Should the analytical properties also be @property attributes? If so, we will just have temperature and pressure passed into the constructor. This is probably useful anyway since we might want to construct system objects containing barostats, which require the pressure.

We can then get rid of the get_system and other accessors.

kyleabeauchamp commented 10 years ago

Yeah, that sounds good. The constructor will take and store all necessary inputs to describe the system--one object per thermodynamic state.

I finished looking at the code and everything looks good.

jchodera commented 10 years ago

Could we have the constructor take a ThermodynamicState instead, or does that just introduce too many cross-dependencies?

kyleabeauchamp commented 10 years ago

IMHO that's to be avoided--this is supposed to be a standalone project, so users might not have Yank or repex installed.

jchodera commented 10 years ago

I've reworked the abstract base class to use @property variables, but I'm not sure this is the way to go.

I'm also unsure of whether we should definitely be passing in temperature and pressure to the constructor of each of these objects for two reasons:

Testing ParallelTempering will be difficult because a single system is passed in and the thermodynamic states for various different temperatures are automatically generated
There will be a variable number of (optional) arguments for each test system, negating some of the value of having a base class.

I'll think about this more in the morning.

kyleabeauchamp commented 10 years ago

Optional kwargs in the superclass shouldnt be a major issue. On Nov 9, 2013 12:05 AM, "John Chodera" notifications@github.com wrote:

I've reworked the abstract base class to use @propertyhttps://github.com/propertyvariables, but I'm not sure this is the way to go.

I'm also unsure of whether we should definitely be passing in temperature and pressure to the constructor of each of these objects for two reasons:

Testing ParallelTempering will be difficult because a single system is passed in and the thermodynamic states for various different temperatures are automatically generated

There will be a variable number of (optional) arguments for each test system, negating some of the value of having a base class.

I'll think about this more in the morning.

— Reply to this email directly or view it on GitHubhttps://github.com/choderalab/openmm-testsystems/pull/5#issuecomment-28118983 .

jchodera commented 10 years ago

After some thought, this testsystems module will only really be useful for testing things like

Does my function actually run on OpenMM System objects of various types?
Do the OpenMM Integrator types produce the correct expectations?
Does the repex module produce the correct expectations? We'll probably need a separate test system module for testing alchemical free energy calculations in YANK specifically. That can go straight into the YANK repo.

Will rework this a bit to make sure this can be easily used for the three purposes above.

kyleabeauchamp commented 10 years ago

Right, but I still think it's useful to have a module of testsystems that can be shared--otherwise we would have really extreme duplication.

Regarding @property, I'm happy with either way. I guess I see what you're saying---it doesn't really make sense to use @property on the get_analytical_something() functions, as they really are functions and not properties. I guess @property made more sense on system and positions, but we might as well just use functions there to have a consistent api (get_positions()).

kyleabeauchamp commented 10 years ago

Rather get_positions()--the point would be to match get_analytical_something()...

kyleabeauchamp commented 10 years ago

PS: I'm happy with whatever you decide here. We can always go back and make adjustments later--I'm almost positive we will have to, given the scope of all the refactoring that's going on right.

jchodera commented 10 years ago

Agreed. It's best not to overthink this up front since things will change anyway. Let me check in a workable update in the next couple of hours and we can just move on.

jchodera commented 10 years ago

OK, I've majorly reworked how analytical properties are computed and have added several simple analytical properties to a few systems. I also added a little example in the __main__ section showing how test classes are retrieved and illustrating how analytical properties are computed.

This is suboptimal at the moment because we often want to only work with classes that make analytical properties available, but my analytical_properties @property only works when you have instantiated an object. Instead, we could change this to a @classmethod get_available_analytical_properties or something so that we can examine a class for properties before incurring the cost of instantiating it. That might make testing faster.

We will want to add an __init__.py to make sure we can do from testsystems import HarmonicOscillator and the like. Working with testsystems/test_systems.py is otherwise suboptimal, because we cannot do import testsystems and work with that module inside of testsystems.py for testing.

@kyleabeauchamp can you look this over?

jchodera commented 10 years ago

By the way, the comparison test of system energies across platforms shows the CPU platform has significant discrepancies for explicitly solvated systems:

AlanineDipeptideExplicit
                       Reference     -5892.685928 kcal/mol         0.000000 kcal/mol       202.443888 kcal/mol         0.000000 kcal/mol
                             CPU     -6065.950652 kcal/mol      -173.264724 kcal/mol       202.443530 kcal/mol         0.007724 kcal/mol
                                 WARNING: Potential energy error (-173.264724 kcal/mol) exceeds tolerance (0.060000 kcal/mol).  Test failed.
                            CUDA     -5892.682110 kcal/mol         0.003818 kcal/mol       202.443798 kcal/mol         0.007719 kcal/mol
SrcExplicit
                       Reference   -171304.840124 kcal/mol         0.000000 kcal/mol       247.981475 kcal/mol         0.000000 kcal/mol
                             CPU   -171993.268385 kcal/mol      -688.428261 kcal/mol       247.981358 kcal/mol         0.011733 kcal/mol
                                 WARNING: Potential energy error (-688.428261 kcal/mol) exceeds tolerance (0.060000 kcal/mol).  Test failed.
                            CUDA   -171304.578474 kcal/mol         0.261650 kcal/mol       247.981480 kcal/mol         0.011730 kcal/mol
                                 WARNING: Potential energy error (0.261650 kcal/mol) exceeds tolerance (0.060000 kcal/mol).  Test failed.
MethanolBox
                       Reference       297.113123 kcal/mol         0.000000 kcal/mol       180.720998 kcal/mol         0.000000 kcal/mol
                             CPU       264.206457 kcal/mol       -32.906667 kcal/mol       180.721086 kcal/mol         0.000370 kcal/mol
                                 WARNING: Potential energy error (-32.906667 kcal/mol) exceeds tolerance (0.060000 kcal/mol).  Test failed.
                            CUDA       297.110284 kcal/mol        -0.002840 kcal/mol       180.721246 kcal/mol         0.004272 kcal/mol

jchodera commented 10 years ago

I've also added a base class function to serialize test systems to XML to aid in debugging.

I think we're good to merge as soon as someone reviews this.

kyleabeauchamp commented 10 years ago

Looks good to me. Is this the permanent home of the CustomIntegrator stuff? And the generateMaxwellBoltzmann velocities?

If so, we could possibly change name of this repository to OpenMM_Utilities and just claim that this repo contains all the odds and ends that are required for Yank, Repex, and other ChoderaLab projects.

kyleabeauchamp commented 10 years ago

Regardless, I'm +1 to merge.

jchodera commented 10 years ago

I'm thinking this can be the (at least temporary) home of Python modules and tests that test aid in testing OpenMM or OpenMM-derived applications in various ways.

This should not necessarily be the home of CustomIntegrator classes (which should probably be moved to a separate openmm-integrators distribution) or generateMaxwellBoltzmann() (which should be deprecated in favor of Context.setVelocitiesToTemperature()).

jchodera commented 10 years ago

I think it's useful enough for now to merge, so I'm merging now.

jchodera commented 10 years ago

Is standard practice to delete the branch after a successful merge?

Any idea how to resync my local repository clone to the master branch after that?

kyleabeauchamp commented 10 years ago

So deleting is optional, but does help reduce the clutter.

To resync your local, you just need to

git fetch upstream
git merge upstream/master

kyleabeauchamp commented 10 years ago

Actually, you probably first want to git checkout master

jchodera commented 10 years ago

Thanks! For reference, I did:

git checkout master
git fetch upstream
git merge upstream/master

jchodera commented 10 years ago

Whoops! Apparently, that doesn't automatically merge the changes into my own master branch. Figuring out how to do that now.

kyleabeauchamp commented 10 years ago

Did you do that in the directory of your own repo? That should work AFAIK--I've done it > 10 times.

choderalab / YankTools

[WIP] Creation of an abstract base class for test systems #5