choderalab / fahmunge

Tools for Munging Folding@Home datasets
MIT License

Folder Structure for Source and Munged under siegetank #6

Open jhprinz opened 10 years ago

jhprinz commented 10 years ago

I got a little lost in the typical FAH hierarchical structure and project organization. What is the default or preferred folder structure for FAH and siegetank projects?

In the siegetank API tutorial, the synced folder structure is:

target_folder/
    <stream0.id>_data/
        <part_0.frame>/
            frames.xtc
        <part_1.frame>/
        <part_2.frame>/
        ...
    <stream1.id>_data/
    <stream2.id>_data/
    ...
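To make the tutorial layout above concrete, here is a minimal sketch that walks a synced `target_folder` and groups part directories by stream. The function name `list_stream_parts` is hypothetical, and the sketch assumes that each part directory is named by an integer frame number; neither is confirmed by the siegetank documentation.

```python
from pathlib import Path


def list_stream_parts(target_folder):
    """Map each stream id to its part directories, ordered by frame number.

    Assumes the layout <stream.id>_data/<part.frame>/frames.xtc, with each
    part directory named by an integer frame number (an assumption).
    """
    parts = {}
    for stream_dir in sorted(Path(target_folder).glob("*_data")):
        if not stream_dir.is_dir():
            continue
        # Strip the "_data" suffix to recover the stream id.
        stream_id = stream_dir.name[: -len("_data")]
        frame_dirs = [
            p for p in stream_dir.iterdir() if p.is_dir() and p.name.isdigit()
        ]
        # Sort numerically so that part 10 comes after part 2.
        parts[stream_id] = sorted(frame_dirs, key=lambda p: int(p.name))
    return parts
```

Sorting numerically rather than lexically matters once a stream has ten or more parts.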

On top of this I found the idea of RUNS, which seems to make sense for FAH, but since siegetank is very flexible the meaning may have changed. I assume that either

or

So, I propose for the siegetank-synced (unprocessed) folder structure

<target.short_name>_<target.id>_/
    RUNS/
        RUN0_<run0.short_name>/
            STREAM0_<stream0.id>/
                <part_0.first_frame>/
                    frames.xtc
                    ...
                <part_1.first_frame>/
                <part_2.first_frame>/
                ...
            STREAM1_<stream1.id>/
            STREAM2_<stream2.id>/
    ...

where target.short_name represents the project name, as in FAH projects. This way it is similar to the FAH order RUN##/CLONE##/ but contains useful extra information such as the stream UUID.
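As a rough illustration of the proposed synced layout, the sketch below builds the directory for one part of one stream. The function `synced_part_dir` and all of its parameter names are hypothetical; they simply mirror the placeholders in the tree above and are not part of the siegetank API.

```python
from pathlib import PurePosixPath


def synced_part_dir(target_short_name, target_id, run_index, run_short_name,
                    stream_index, stream_id, first_frame):
    """Build the proposed synced (unprocessed) directory for one part.

    All arguments are illustrative stand-ins for the placeholders in the
    proposal, not attributes of a real siegetank object.
    """
    # The trailing underscore is kept exactly as written in the proposal.
    target_dir = f"{target_short_name}_{target_id}_"
    return (
        PurePosixPath(target_dir)
        / "RUNS"
        / f"RUN{run_index}_{run_short_name}"
        / f"STREAM{stream_index}_{stream_id}"
        / str(first_frame)
    )


# The frames themselves would then live in this directory as frames.xtc.
part_dir = synced_part_dir("src", "t1", 0, "apo", 1, "s-uuid", 500)
frames_file = part_dir / "frames.xtc"
```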

Then for the munged folder in /data/choderalab/fah/munged/

<target.id>_<target.short_name>/
    all-atoms/
        run0-stream0_<stream0.id>.h5
        run0-stream1_<stream1.id>.h5
        ...
        run1-stream0_<stream0.id>.h5
        ...
    no-solvent/
        run0-stream0_<stream0.id>.h5
        run0-stream1_<stream1.id>.h5
        ...
        run1-stream0_<stream0.id>.h5
        ...
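The munged layout above could be generated with a small helper like the following. This is only a sketch: `munged_path` and its parameters are hypothetical names, and the only values taken from the proposal are the root directory, the two subfolder names, and the file name pattern.

```python
from pathlib import PurePosixPath

# Root directory as given in the proposal.
MUNGED_ROOT = PurePosixPath("/data/choderalab/fah/munged")
SUBSETS = ("all-atoms", "no-solvent")


def munged_path(target_id, target_short_name, run_index, stream_index,
                stream_id, subset="all-atoms"):
    """Return the proposed .h5 path for one (run, stream) trajectory."""
    if subset not in SUBSETS:
        raise ValueError(f"subset must be one of {SUBSETS}, got {subset!r}")
    filename = f"run{run_index}-stream{stream_index}_{stream_id}.h5"
    return MUNGED_ROOT / f"{target_id}_{target_short_name}" / subset / filename
```

Keeping the same file name under both `all-atoms/` and `no-solvent/` means a trajectory can be matched across the two subsets by name alone.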

We can also

kyleabeauchamp commented 10 years ago

I think:

Project = Target

Each pair of (run, clone) is a single stream. I don't think there will be an automatic staging process on ST.

Some of these questions cannot be fully resolved until ST implements more features from FAH (E.g. points), as that will play a role in how things are set up and organized.
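One way to picture the "(run, clone) pair = one stream" mapping is a dictionary keyed by the pair. The sketch below is purely illustrative: `assign_streams` is a hypothetical helper, and the row-major assignment order (all clones of run 0 first, then run 1, and so on) is an assumption rather than anything ST prescribes.

```python
def assign_streams(stream_ids, n_clones):
    """Assign stream ids to (run, clone) pairs in row-major order.

    The ordering is an assumption made for illustration only.
    """
    if n_clones <= 0:
        raise ValueError("n_clones must be positive")
    # divmod(i, n_clones) yields (run index, clone index) for the i-th stream.
    return {divmod(i, n_clones): sid for i, sid in enumerate(stream_ids)}


mapping = assign_streams(["a", "b", "c", "d"], n_clones=2)
# mapping[(1, 0)] is the stream for RUN1/CLONE0
```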

jhprinz commented 10 years ago

I agree:

Target should be a Project, and it already contains basic information such as a description. Simulations/streams are then attached, and these are not organized in any way. This means we can (for now) impose an organization without interference. The problem is that internally this might become a little messy, for example with all files for RUNS/CLONES ending up in one folder.

I see that this might change if the organization of ST changes.

So, for now I would keep the RUN / STREAM ordering.

What is the actual idea of RUNS in FAH? Were these meant for several iterations, for the test phase, etc.?

kyleabeauchamp commented 10 years ago

In FAH, RUNS correspond to different starting conformations. CLONES refer to different velocities.

kyleabeauchamp commented 10 years ago

Also, in FAH, one is generally supposed to ensure that the different RUNS have the same number of atoms / topology / etc. Otherwise, the points will vary between the RUNS.

jhprinz commented 10 years ago

Luckily we do not have these restrictions. All streams can be totally different, which means we have to be more careful about staying organized.

What are points in FAH?

kyleabeauchamp commented 10 years ago

Whenever possible, we may still want to enforce these restrictions, because consistency with FAH is important.

FAH workunits award points to donors. It is the currency for doing our computations.

jhprinz commented 10 years ago

Okay, "points". I was thinking of points as in checkpoints...

Wasn't the idea of siegetank to be more flexible? It seemed quite useful, but we don't want to break compatibility. That would do more harm than good...

kyleabeauchamp commented 10 years ago

My point is that eventually siegetank is going to be plugged into FAH, so we need to adopt procedures that will be compatible with FAH operation.

jhprinz commented 10 years ago

Okay, then we should really wait until there are more features in ST. For now I will start building something that we can use and adapt later. Changes should be easy to make.

kyleabeauchamp commented 10 years ago

I agree. I was just saying that we should avoid creating excessive heterogeneity within different streams of a single target, as that's "allowed but undesirable" within the current ST API.

All I'm saying is don't use a single target to simulate both HP35 and src kinase, as that may cause issues down the road.
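A simple guard against this kind of heterogeneity is to check that all streams of a target share the same atom count. The sketch below assumes stream metadata is available as `(target_id, stream_id, n_atoms)` tuples; that record shape and the function name `find_heterogeneous_targets` are hypothetical and not part of the ST API.

```python
from collections import defaultdict


def find_heterogeneous_targets(streams):
    """Return {target_id: sorted atom counts} for targets whose streams differ.

    `streams` is an iterable of (target_id, stream_id, n_atoms) tuples,
    a record shape assumed here for illustration.
    """
    counts = defaultdict(set)
    for target_id, _stream_id, n_atoms in streams:
        counts[target_id].add(n_atoms)
    # A target is flagged when its streams have more than one atom count.
    return {t: sorted(c) for t, c in counts.items() if len(c) > 1}


streams = [
    ("t1", "s0", 100),
    ("t1", "s1", 100),
    ("t2", "s0", 100),
    ("t2", "s1", 250),  # different system mixed into the same target
]
flagged = find_heterogeneous_targets(streams)
```

Atom count is only a coarse proxy; a stricter check could compare full topologies.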

VijayPande commented 10 years ago

Siegetank is plugged into FAH via the latest client

Thanks,

Vijay


On Oct 5, 2014, at 3:14 PM, kyleabeauchamp notifications@github.com wrote:

My point is that eventually siegetank is going to be plugged into FAH, so we need to adopt procedures that will be compatible with FAH operation.


jchodera commented 10 years ago

Oh! Is the latest client being rolled out already?

I think everyone in the lab is excited for how much easier it is to programmatically set up and manage ST jobs.

VijayPande commented 10 years ago

PS: The latest client is still under testing. We can push Joe and Yutong to get it out.

Thanks, Vijay
