Open sebastianpech opened 4 years ago
You mentioned a central database and attaching metadata to files/folders.
I suppose one could make it an option (or do it manually) to write metadata in a simple format such as .json along with the files (same name, or just in the same folder if every simulation gets its own folder).
Then one could probably make collect_results read just the metadata files, i.e. by restricting the suffix to .json. That would create a *central database* that could hold whatever you want it to (limited by what you care to put into the metadata files).
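To make the idea concrete, here is a minimal sketch in plain Julia of writing a metadata file next to a result file and then finding all such files by suffix. The helper names (`to_json`, `write_metadata`, `metadata_files`) are made up for illustration and are not part of DrWatson; the tiny JSON writer only handles flat dictionaries of strings and finite numbers, and a real project would more likely use a JSON package.

```julia
# Hypothetical helpers for illustration only; none of these names come from DrWatson.

# Serialize a flat Dict of strings/numbers/bools as a JSON object.
# Assumption: no nesting, finite numbers, and only simple string content.
json_value(x::AbstractString) = "\"" * escape_string(x) * "\""
json_value(x::Real) = string(x)

function to_json(meta::AbstractDict)
    entries = sort!(collect(meta); by = p -> string(first(p)))
    body = join(("$(json_value(string(k))): $(json_value(v))" for (k, v) in entries), ", ")
    return "{" * body * "}"
end

# Write the metadata next to a result file: "result.dat" -> "result.json".
function write_metadata(resultfile::AbstractString, meta::AbstractDict)
    metafile = first(splitext(resultfile)) * ".json"
    write(metafile, to_json(meta))
    return metafile
end

# Walk a directory tree and return the paths of all files with the given suffix.
function metadata_files(root::AbstractString; suffix = ".json")
    found = String[]
    for (dir, _, files) in walkdir(root)
        for f in files
            endswith(f, suffix) && push!(found, joinpath(dir, f))
        end
    end
    return sort!(found)
end
```

The list returned by `metadata_files` is what a suffix-restricted collection step would then read and merge into one table.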
Hi Sebastian, thanks for this post. I'm starting with DrWatson and I think your suggestion here of using a @with_kw struct will suit my project better than using a Dict. One question I have, though: how easy is it to adapt the standard DrWatson workflow from Dicts to structs? In particular, I am running many simulations in parallel using pmap, and I found it very convenient to create a Dict of arrays (arrays for those parameters that I am varying over simulations) and then use the DrWatson function dict_list(). What would be the best way to set up something similar with structs? Thanks!
I think you should open up a new Issue, as a feature request, that asks for the equivalent of the dict_list function for structs. I believe it will be something really easy to do!
EDIT: Just do what Jonas said, much smarter :P
Hi @mbruna,
if you want to pass structs to your simulations as the parameters, I think it could be easiest to do this:
julia> using DrWatson

julia> Base.@kwdef struct MyParams
           α::Float64 = 1.0
           β::Int = 42
       end
MyParams

julia> params = dict_list(Dict(
           :α => [2.0, 3.0],
           :β => @onlyif(:α == 3, 4)
       )) .|> (p->MyParams(; pairs(p)...))
2-element Vector{MyParams}:
 MyParams(2.0, 42)
 MyParams(3.0, 4)
So essentially you use everything as normal, and only this bit at the end,
.|> (p->MyParams(; pairs(p)...))
creates a struct by passing the dictionary fields to the constructor.
Hi @mbruna, so the post is a bit outdated and I switched to a better approach that uses dicts only for the application of dict_list. I created a separate package (https://github.com/sebastianpech/DrWatsonSim.jl) for those long-running simulations. At the moment it is in parts pretty much tailored to how I run my simulations, however, it can easily be adapted and generalized. So you can look into that.
With regards to passing structs, I'm basically doing it like @JonasIsensee suggested.
Thanks @JonasIsensee, @sebastianpech, @Datseris for your prompt replies. That's exactly what I was after, thanks.
So as we are currently discussing various DrWatson workflows, I decided to explain mine for long-running simulations.
Some comments in advance
The problem
Currently I'm running simulations on a remote machine (not a cluster), which I also use for development, so everything happens in an ssh session. This means that when I start a job, e.g. by
julia run.jl SENB101
where SENB101 defines the parameter set to be used for the simulation, I need to keep the session open to keep the job running.
Therefore, to have a persistent running session (also in parallel), I use tmux on the remote system for spawning jobs. Effectively, this boils down to creating a new window in an existing tmux session running the above command. I can also start multiple simulations in parallel like so:
run.sh creates a new tmux session SENB, loops over the names defined by SENB{206..209} and creates a new window for each of them, running the simulation process.
During the simulation multiple files are created. (This is just an overview of the ones important for the discussion.) Simulations usually run for ~12 h.
folder
folder
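The tmux launch described above could be sketched roughly as follows. This is not the original run.sh, which is not shown here; it is a plain Julia sketch that only builds and prints the commands, assuming a session called SENB and the run.jl entry point mentioned above.

```julia
# Names corresponding to the shell brace expansion SENB{206..209}.
names = ["SENB$i" for i in 206:209]
session = "SENB"
target = session * ":"   # "SENB:" addresses the session itself as the target

# One detached session, then one window per parameter set running the simulation.
cmds = Cmd[`tmux new-session -d -s $session`]
for name in names
    push!(cmds, `tmux new-window -t $target -n $name "julia run.jl $name"`)
end

# Print instead of executing; swap println for run on a machine with tmux installed.
foreach(println, cmds)
```

Each window keeps running after the ssh session is closed, which is the point of going through tmux.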
Parameter definition
I use Parameters.jl for defining the simulation parameters; one definition looks e.g. like this:
The main advantages for me over using Base.@kwdef are the @assert call and @pack and @unpack. Given one of the latest PRs (https://github.com/JuliaDynamics/DrWatson.jl/pull/148), this is not so important anymore.
The main advantages for me over using Dicts include dispatching on the parameter type, e.g. SENB. Meaning I have functions run(p::SENB), run(p::DCB), ... which start the simulation with the given parameter p but do different steps during initialization, e.g. a specific definition of boundary conditions.
All parameters that I use in the simulation are stored in a dict in the file scripts/config.jl like so:
This file is included in every script with include(scriptsdir("config.jl")).
Tying parameter configurations to the folder structure
DrWatson's savename allows creating a name from a parameter configuration, like
SENB_Gc1=1_Gc2=0.09_Gc3=0.36_ft1=60_ft2=4.2_ft3=4.5_inp=Moura2008-Fine_mat=Moura2008_normalize_penalty=false_p=2_pmodel=T_threshold_intensity=50_α=5_β=2_ξ=2
I use this name for naming the above folder, log-file and pvd-file. This way every file is related to the parameter configuration, either by its name or by the folder it is stored in. In Finder this looks like this:
For analysing the results by code, I don't really mind the folder structure. I can create paths from the parameter configuration. However, finding the one file with Gc1=1 and pmodel=T is tough. So for this situation it's much easier to have a folder structure like
Therefore, I always prepend the parameter set identifier to the output directory, so the command using savename looks e.g. like this:
This way I can easily find the one file I'm looking for, while still knowing which parameter configuration created it. Given the discussion in https://github.com/JuliaDynamics/DrWatson.jl/issues/151, the folder structure could be a lot cleaner by storing the metadata in a central database instead of the filename.
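As a rough illustration of prepending the identifier, here is a sketch in plain Julia. `param_name` is a hypothetical stand-in for a savename-style helper (DrWatson's real savename has more options and its own formatting rules), the parameter values are a made-up subset, and using the identifier as a parent folder is only one possible reading of the layout.

```julia
# Hypothetical stand-in for a savename-style helper: "prefix_key=value_key=value...",
# with keys sorted so the same configuration always gives the same name.
function param_name(prefix::AbstractString, params::AbstractDict; sep = "_")
    sorted_keys = sort!(collect(keys(params)); by = string)
    parts = ["$k=$(params[k])" for k in sorted_keys]
    return join([prefix; parts], sep)
end

params = Dict(:Gc1 => 1, :pmodel => "T", :p => 2)   # made-up subset of the parameters
id = "SENB101"                                       # parameter set identifier

# Assumption: the identifier becomes a parent folder of the long, descriptive name.
outdir = joinpath("data", id, param_name("SENB", params))
println(outdir)
```

With this layout the short identifier is enough to find a run by hand, while the long name still records the full configuration.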
Features I can't use
Because I have no single result file, I can't use
produce_or_load
collect_results
Features I can use
datadir, scriptsdir
@quickactivate :Projectname, allowing loading of default configurations
Features that I would like to have
just dumping them here, not thought through