shaoqx opened 6 months ago
@chrisjurich Any thoughts?
Here are my respective thoughts:
Don't really have any major thoughts on this one. This is more of an ARMer problem.
I think the issue of maintaining a large number of directories comes down to standardizing the layout of a `work_dir`, as we often call it in a function. Doing this would make tracking directories easy: whenever the `work_dir` attribute is filled, we can have a standardized "dot" file such as `<work_dir>/.log_file` or `<work_dir>/.log.json`. I am pretty neutral on the format we use for the logging file storage, but I think `.json` is a fine choice. Below is a rough example of the data object format I would like to use:
```json
{
    "status": "",          # completed, failure, etc.
    "error_message": "",
    "executed_commands": ""
}
```
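A minimal sketch of how the proposed dot file could be written and read back (the `write_log`/`read_log` helper names are hypothetical; only the `<work_dir>/.log.json` path and the field names follow the proposal above):

```python
import json
from pathlib import Path

def write_log(work_dir, status, error_message="", executed_commands=None):
    """Write the standardized dot file into work_dir (hypothetical helper)."""
    log = {
        "status": status,                # e.g. "completed", "failure"
        "error_message": error_message,
        "executed_commands": executed_commands or [],
    }
    Path(work_dir, ".log.json").write_text(json.dumps(log, indent=2))

def read_log(work_dir):
    """Load the log back; callers can branch on log['status']."""
    return json.loads(Path(work_dir, ".log.json").read_text())
```

Any code that fills the `work_dir` attribute would call `write_log` as its last step, so downstream tooling only ever has to inspect the dot file.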
Results I am not too concerned about. They tend to be pretty easy to collect and are also pretty fast. If the proposed data object layout from the second bullet point is well designed, it will be possible to base our analysis on the state of the `work_dir`, using the results recorded in the `work_dir/.log_file` file (or whatever the data layout ends up being).
Sounds pretty good!
What do you think of the formal version of the shrapnel workflows? The challenge is: say we want to model 1000 different enzyme variants, how do we approach it most efficiently and robustly? Here are some sub-challenges:
How to use as many resources as possible. We need to submit a job for each of them, and each of those jobs will also submit its own MD or QM jobs. (My thoughts: ask the user to write a workflow script that takes each enzyme variant in JSON/pickle format, and then have a CLI tool make all the subdirs and use ARMer to submit those subdirs in a job-array manner. Problem: I hope there could be a better way to more strictly frame the user's workflow script. Probably use an ABC?)
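The ABC idea above could look roughly like this (a sketch only; the class name `ShrapnelWorkflow`, its methods, and the per-variant file layout are all assumptions, not existing API):

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path

class ShrapnelWorkflow(ABC):
    """Hypothetical ABC the user's workflow script must subclass, so the
    CLI tool knows exactly what to run inside each sub-directory."""

    @abstractmethod
    def run(self, variant, work_dir):
        """Run the workflow for one enzyme variant inside work_dir."""

    def setup_dirs(self, variants_json, root):
        """Make one sub-directory per variant, each holding its own
        variant.json; ARMer would then submit these as a job array."""
        variants = json.loads(Path(variants_json).read_text())
        dirs = []
        for i, variant in enumerate(variants):
            wd = Path(root) / f"variant_{i}"
            wd.mkdir(parents=True, exist_ok=True)
            (wd / "variant.json").write_text(json.dumps(variant))
            dirs.append(str(wd))
        return dirs
```

Forcing the user to implement `run()` gives the CLI tool a fixed entry point per sub-directory, which is what makes the job-array submission uniform.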
How to maintain the 1000 work dirs. Say 5 out of 1000 failed: how do you tell which ones failed, and how do you automatically restore them? (We can ultimately build an auto checking-and-fixing part into the CLI tool, but the cases of failure are so diverse that such a part is hard to code. We could also make another CLI tool that assists the user in debugging, resubmitting, or applying other operations.)
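The "which ones failed" half at least becomes a simple scan if every work dir carries the standardized `.log.json` proposed earlier (a sketch under that assumption; the `find_failed` helper is hypothetical):

```python
import json
from pathlib import Path

def find_failed(root):
    """Return work_dirs under root whose .log.json does not report
    'completed'. A missing log file is treated as a failure too,
    since the job may have died before writing it."""
    failed = []
    for wd in sorted(Path(root).iterdir()):
        if not wd.is_dir():
            continue
        log_file = wd / ".log.json"
        if not log_file.exists():
            failed.append(str(wd))
            continue
        if json.loads(log_file.read_text()).get("status") != "completed":
            failed.append(str(wd))
    return failed
```

Automatic *restoration* is the hard part, as noted; this only narrows 1000 dirs down to the handful worth inspecting or resubmitting.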
How to collect the results. This is fairly easy, though.