kyleabeauchamp closed this 10 years ago
What if we called this fah-tools? Or do you really want a separate repo for "munging"?
Tools is pretty general; right now, the code is just for munging. We can change the name if the scope of the code expands in the future.
This looks pretty good! The only thing I'd ask for is more documentation in the code about what the various "munging" steps do.
What about periodic image issues?
I suppose we'll have to add that later, as AFAIK we don't have PBC "whole" handling (making molecules whole across periodic boundaries) implemented in MDTraj. I'll look into that.
Right now, the key issue is automating the bunzip step, which currently takes nearly 15 seconds per WU and makes meaningful real-time analysis / reporting nearly impossible...
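As a rough illustration of automating the decompression step, here is a minimal sketch using Python's standard-library bz2 module. The helper name bunzip_wu, the ".bz2" suffix convention, and the skip-if-up-to-date check are assumptions for illustration, not part of FAHMunge. Streaming in chunks keeps memory use flat for large WU files, but it will not by itself make decompression faster; it only removes the manual step and avoids redoing finished work.

```python
import bz2
import os
import shutil


def bunzip_wu(src_path, dest_dir, chunk_size=1 << 20):
    """Decompress one bzip2-compressed work-unit file into dest_dir.

    Hypothetical helper: skips work if the decompressed file already exists
    and is at least as new as the compressed source.
    """
    base = os.path.basename(src_path)
    if base.endswith(".bz2"):
        base = base[: -len(".bz2")]
    dest_path = os.path.join(dest_dir, base)

    # Skip files that were already decompressed after the source last changed.
    if os.path.exists(dest_path) and os.path.getmtime(dest_path) >= os.path.getmtime(src_path):
        return dest_path

    os.makedirs(dest_dir, exist_ok=True)
    tmp_path = dest_path + ".partial"
    # Stream in chunks so large files are never held fully in memory.
    with bz2.open(src_path, "rb") as fin, open(tmp_path, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    # Rename only after a complete write, so readers never see a half-written file.
    os.replace(tmp_path, dest_path)
    return dest_path
```

Calling this from a loop over newly arrived WU files (or a small process pool) would let the decompression run unattended; the early return means re-running the loop is cheap.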
@schwancr I just adjusted the stripping function to keep the unitcell information in the protein HDF5 file, which should allow us to perform PBC processing downstream.
Yea that sounds like a good idea. Ideally mdtraj will be able to do this in the future, though it's not trivial to implement.
Has anyone looked at the PBC-whole code in gromacs or ambertools? It might actually not be that complex.
-Robert
In reply to Christian Schwantes (Mon, Sep 15, 2014 at 12:15 PM).
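For a sense of scale, the core of a "make molecules whole" step can be small when the box is orthorhombic and the bond graph is known: walk the bonds outward from an anchor atom and place each newly reached atom at its minimum-image position relative to the already-placed neighbor. This is a minimal sketch under those assumptions (orthorhombic box, a connected molecule, no bond longer than half a box edge); it is not the GROMACS or AmberTools implementation, and make_whole is a hypothetical name, not an MDTraj API. Triclinic cells and molecules that span a large fraction of the box are where simple recipes need more care.

```python
from collections import deque


def make_whole(coords, bonds, box):
    """Unwrap one molecule across periodic boundaries (orthorhombic box only).

    coords: list of (x, y, z) positions, possibly wrapped into the box
    bonds:  list of (i, j) atom-index pairs forming a connected graph
    box:    (Lx, Ly, Lz) box edge lengths
    Returns new positions in which every traversed bond has its minimum-image length.
    """
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    whole = [None] * n
    whole[0] = tuple(coords[0])  # atom 0 stays put and anchors the molecule
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if whole[j] is not None:
                continue
            # Shift atom j by whole box lengths so it sits nearest to atom i.
            placed = []
            for d in range(3):
                delta = coords[j][d] - whole[i][d]
                delta -= box[d] * round(delta / box[d])
                placed.append(whole[i][d] + delta)
            whole[j] = tuple(placed)
            queue.append(j)
    return whole
```

Because each atom is placed relative to a neighbor that is already whole, a chain wrapped across the box edge comes back as one contiguous piece in a single breadth-first pass.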
But it doesn't work that well. Their (GROMACS) recipe for doing it involves several calls to the same command-line tool, and even then they admit it doesn't work in all cases.
@kyleabeauchamp: what's the appropriate forum to discuss the provenance metadata storage (e.g. processed_filenames) and the directory structure we want to encourage for FAH projects and mixtape?
I'm not sure that storing extra attributes on the HDF5 files is the best way to go -- if we really want to do that, we should consider simply adding that field to the MDTraj HDF5 format spec. We could also do something more akin to the MSMBuilder 2 design, where a separate metadata file is stored which contains the provenance info. It might be nice, also, not to irreversibly tie this data munging step to the use of HDF5 files for the output.
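To make the separate-metadata-file option concrete, here is a minimal sketch of a JSON sidecar that records provenance (the list of already-processed input files) next to the output, independent of the output format. The sidecar file name, the JSON layout, and the helper names are hypothetical; this is neither the MDTraj HDF5 spec nor the MSMBuilder 2 metadata format, only an illustration of keeping provenance out of the trajectory file itself.

```python
import json
import os


def load_provenance(meta_path):
    """Return the provenance record, or an empty one if no sidecar exists yet."""
    if not os.path.exists(meta_path):
        return {"processed_filenames": []}
    with open(meta_path) as f:
        return json.load(f)


def record_processed(meta_path, filenames):
    """Append newly processed input files to the sidecar, skipping duplicates."""
    meta = load_provenance(meta_path)
    seen = set(meta["processed_filenames"])
    for name in filenames:
        if name not in seen:
            meta["processed_filenames"].append(name)
            seen.add(name)
    tmp_path = meta_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(meta, f, indent=2)
    # Swap the finished file into place so readers never see a truncated sidecar.
    os.replace(tmp_path, meta_path)
    return meta


def unprocessed(meta_path, candidates):
    """Filter candidates down to files not yet recorded in the sidecar."""
    done = set(load_provenance(meta_path)["processed_filenames"])
    return [c for c in candidates if c not in done]
```

Since nothing here touches the trajectory output, the same bookkeeping would work whether the munged data ends up in HDF5 or some other format.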
It would be helpful to get to some consensus on these design choices, especially as we start pushing mixtape for end users.
This is working well enough for now; we will discuss future iterations in issue #2.