choderalab / fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home
MIT License

Refactored `structures` module; removed reference hardcode #148

Closed dotsdl closed 2 years ago

dotsdl commented 3 years ago

Description

Refactored the `structures` module and addressed #99.

Previously we had to pass many arguments through most function calls, which meant duplicated docs and far too much wrangling just to make a few parameters available to a deeply nested call.

Now there are configuration fields for specifying the reference structure, and these are passed along with some other high-level config items to `SnapshotArtifactory`. We can then call `SnapshotArtifactory.generate_representative_snapshots` with only the input transformations, the output directory, the number of processors to use, and whether to overwrite.
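For readers skimming the thread, here is a minimal sketch of the pattern described above. It is not the actual fah-xchem code: the field names (`reference_structure_path`, `project_data_dir`), the `Transformation` stand-in, the `RUN{n}` output naming, and the method body are all assumptions for illustration. Only the class name, the method name, and the four call-time arguments come from the description.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class Transformation:
    """Hypothetical stand-in for a transformation record."""
    run_id: int


@dataclass
class SnapshotArtifactory:
    """High-level config is set once here instead of being threaded through every call."""
    # Hypothetical field names; the real config fields may differ.
    reference_structure_path: Path
    project_data_dir: Path

    def generate_representative_snapshots(
        self,
        transformations: List[Transformation],
        output_dir: Path,
        num_procs: int = 1,  # accepted but unused in this sketch
        overwrite: bool = False,
    ) -> List[Path]:
        """Return the output paths that would be written (placeholder logic only)."""
        planned = []
        for transformation in transformations:
            target = Path(output_dir) / f"RUN{transformation.run_id}"
            if target.exists() and not overwrite:
                continue  # keep existing output unless asked to overwrite
            planned.append(target)
        return planned


if __name__ == "__main__":
    factory = SnapshotArtifactory(Path("reference.pdb"), Path("data"))
    print(factory.generate_representative_snapshots([Transformation(0)], Path("snapshots")))
```

The point of the design is that the reference structure and other shared settings live on the object, so deep calls read them from `self` rather than receiving them as extra parameters.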

Todos

Notable points that this PR has either accomplished or will accomplish.

Status

codecov-commenter commented 3 years ago

Codecov Report

Merging #148 (98afb3a) into master (084da4e) will decrease coverage by 0.51%. The diff coverage is 6.43%.

dotsdl commented 3 years ago

@jchodera thank you!

I'll have to think on that. It's not clear to me at this time if this change gets us closer to supporting different reference structures for individual transformations. How do you envision specifying each reference for each transformation via the CLI?

jchodera commented 3 years ago

> I'll have to think on that. It's not clear to me at this time if this change gets us closer to supporting different reference structures for individual transformations. How do you envision specifying each reference for each transformation via the CLI?

(1) The reference structure used for alignment during dashboard analysis should be configurable in the fragalysis JSON file.

(2) The reference PDB structures used for setting up and running perses calculations will be embedded in the JSON transformations file that the dashboard consumes. The only relevant component of this is how the PDB reference topology file is selected.

I think this should be easy to fix later, provided we keep a separation between (1) and (2) as different filenames.
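One way to picture the separation between (1) and (2), as a rough sketch only: each reference is read from its own document and kept in its own field. The JSON keys `reference_structure` and `reference_pdb` are hypothetical and do not reflect the real fragalysis or transformations schemas.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceSources:
    alignment_reference: str  # (1) used to align structures during dashboard analysis
    topology_reference: str   # (2) PDB reference topology from the perses setup


def load_reference_sources(fragalysis_json: str, transformations_json: str) -> ReferenceSources:
    """Read each reference from its own document so the two stay independent."""
    fragalysis = json.loads(fragalysis_json)
    transformations = json.loads(transformations_json)
    return ReferenceSources(
        alignment_reference=fragalysis["reference_structure"],  # hypothetical key
        topology_reference=transformations["reference_pdb"],    # hypothetical key
    )
```

Keeping the two values in separate fields means a later change to per-transformation references would only touch the second one.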

dotsdl commented 2 years ago

Working on unit tests for this feature. This and other parts of the library would benefit from a complete project and data dir, perhaps from the minimal NEQ sprint that we used for the analysis results?

The corresponding directories come to about 3GiB of data, though. Perhaps not too bad if we package it up in another repo, perhaps fah-xchem-testdata?

dotsdl commented 2 years ago

@jchodera what do you think about this idea? A 3.2GiB repo, assuming GitHub will let me make it, won't break the choderalab org, will it?

dotsdl commented 2 years ago

> @jchodera what do you think about this idea? A 3.2GiB repo, assuming GitHub will let me make it, won't break the choderalab org, will it?

This idea won't work: the data elements are too big. I attempted it with fah-xchem-testdata, but the initial push failed:

```
$ git push origin master
Enumerating objects: 8090, done.
Counting objects: 100% (8090/8090), done.
Delta compression using up to 8 threads
Compressing objects: 100% (8035/8035), done.
remote: fatal: pack exceeds maximum allowed size
error: remote unpack failed: index-pack abnormal exit
To github.com:choderalab/fah-xchem-testdata.git
 ! [remote rejected] master -> master (failed)
error: failed to push some refs to 'github.com:choderalab/fah-xchem-testdata.git'
```