stillyslalom opened this issue 2 years ago.
That sounds very familiar to me. You might want to take a look at https://github.com/sebastianpech/DrWatsonSim.jl. It's kind of a spin-off of DrWatson and supports storing metadata, which I use instead of `savename`. Besides that, I guess it's almost impossible to find a project-independent approach for capturing your whole workflow.
What I do is export all parameter configurations for all my simulations to a note-taking app and add further annotations and documentation there. This works quite well and can mostly be automated. Since DrWatsonSim works with simulation IDs instead of unique savenames, I can always find my notes and simulation results by this ID.
@stillyslalom, can you try this out and let us know how it works for you? @sebastianpech, I suggest that we start by writing a "real world example" in the DrWatson docs that briefly showcases your workflow. Based on that, perhaps we can think about integrating DrWatsonSim directly into DrWatson, if possible? It seems many other people have asked for something similar, and it always comes back to something like your approach...
@Datseris Yes, seems so. I'll open a PR and polish the code a bit. I think I kept it general enough to integrate into DrWatson. And it's all opt-in, so no existing workflows will break: the metadata directory is only created the first time a metadata-related function is called, so the folder structure also remains unchanged.
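The ID-based storage and the lazily created metadata folder described above can be sketched with the standard library alone. This is a minimal sketch, not DrWatsonSim's actual code: `metadata_dir`, `next_id`, `store_params` and `load_params` are hypothetical names, the folder name `metadata` is an assumption, and Julia's `Serialization` stands in for BSON/JLD2.

```julia
using Serialization  # standard library; stands in for BSON/JLD2 in this sketch

# Hypothetical: the metadata folder is created only when a metadata function
# is first called, so projects that never use it keep their folder structure.
function metadata_dir(root::AbstractString)
    dir = joinpath(root, "metadata")
    mkpath(dir)  # no-op if the folder already exists
    return dir
end

# Hypothetical: the next free ID is one past the largest stored ID.
function next_id(root::AbstractString)
    ids = Int[]
    for f in readdir(metadata_dir(root))
        name, ext = splitext(f)
        id = tryparse(Int, name)
        ext == ".jls" && id !== nothing && push!(ids, id)
    end
    return isempty(ids) ? 1 : maximum(ids) + 1
end

# Store a parameter Dict under a fresh ID and return that ID.
function store_params(root::AbstractString, params::AbstractDict)
    id = next_id(root)
    serialize(joinpath(metadata_dir(root), "$(id).jls"), params)
    return id
end

# Look the parameters up again by ID alone, no savename needed.
load_params(root::AbstractString, id::Integer) =
    deserialize(joinpath(metadata_dir(root), "$(id).jls"))

# Usage
root = mktempdir()
id = store_params(root, Dict("N" => 100, "model" => "ising"))
load_params(root, id)["N"]
```

Because results and notes are keyed by the integer ID, the parameters can be arbitrarily large without affecting file names.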
Oh, and @Datseris, which file format do we prefer? The version on the master branch uses BSON for storing metadata, but I also have a separate branch using JLD2. I haven't merged it yet because I didn't want to update all my old project folders, but since JLD2 is already a dependency of DrWatson, I will make the switch.
Yeah, since 2.0 DrWatson clearly states a preference for JLD2. Do note, though, that JLD2 cannot save functions. Do you need this?
Aha! Now I remember why I haven't switched yet: yes, I store all kinds of stuff in the BSON files. However, that's not a requirement for the metadata functions to work, and I think it's easy to work around. I will add a note to the docs.
We have a similar difficulty in Agents.jl, where we can't store the function parts of the model, only the parameters that are not functions. However, JLD2 can store function-like objects, e.g. via singleton dispatch:
```julia
struct Object end
(o::Object)(x, y) = x + y
```

Instances of `Object` can be stored and, when loaded, will behave like functions.
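The same pattern also works for a callable struct that carries data, which is closer to a model parameter. A minimal sketch: `Scaled` and the `"force"` key are hypothetical, and the standard library's `Serialization` is used instead of JLD2 so that it runs without extra packages (note that `Serialization` can store ordinary functions too, so this only illustrates the pattern, not the JLD2 limitation).

```julia
using Serialization  # standard library, used here only to mimic a save/load round trip

# A callable struct whose stored state is plain data (one Float64 field).
struct Scaled
    k::Float64
end

# The method lives in code, not in the file: it must be defined
# again in whatever session loads the object.
(s::Scaled)(x) = s.k * x

params = Dict("force" => Scaled(2.5), "steps" => 10)

# Round trip through an in-memory buffer instead of a file.
io = IOBuffer()
serialize(io, params)
seekstart(io)
loaded = deserialize(io)

loaded["force"](4.0)  # behaves like a function: returns 10.0
```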
Interesting syntax. Never seen it. Cool. Good to know.
Isn't it possible to make the system backend-independent? One could simply choose the save backend via a keyword or an environment variable.
Sure, that's a good option.
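A backend-independent save could look roughly like the following. This is a hypothetical sketch, not an existing DrWatson API: `save_metadata`, `load_metadata` and the `METADATA_BACKEND` environment variable are made-up names, and the two backends (TOML and `Serialization`, both Julia standard libraries) stand in for whatever formats would actually be supported, such as JLD2 or BSON.

```julia
using Serialization, TOML  # both are Julia standard libraries

# Hypothetical default: read the backend from an environment variable
# at call time, falling back to Serialization.
default_backend() = Symbol(get(ENV, "METADATA_BACKEND", "serialization"))

function save_metadata(path::AbstractString, d::AbstractDict;
                       backend::Symbol = default_backend())
    if backend === :toml
        # human-readable, but limited to TOML's types (strings, numbers, arrays, tables, ...)
        open(io -> TOML.print(io, d), path, "w")
    elseif backend === :serialization
        # stores arbitrary Julia objects, but is only readable from Julia
        serialize(path, d)
    else
        throw(ArgumentError("unknown metadata backend: $backend"))
    end
    return path
end

function load_metadata(path::AbstractString; backend::Symbol = default_backend())
    backend === :toml && return TOML.parsefile(path)
    backend === :serialization && return deserialize(path)
    throw(ArgumentError("unknown metadata backend: $backend"))
end

# Usage: an explicit keyword overrides the environment default.
dir = mktempdir()
p = save_metadata(joinpath(dir, "sim1.toml"), Dict("N" => 100); backend = :toml)
load_metadata(p; backend = :toml)["N"]
```

Making the keyword default come from the environment lets a whole project switch formats without touching call sites, while individual calls can still opt out.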
Hi, quick note: this only works when `Object` and its method are already defined in the new session.
I just tested this:

```julia
using JLD2  # `save`/`load` (the FileIO interface) work with .jld2 files after this

foo(x) = x^2
save("test.jld2", Dict("fun" => foo))
load("test.jld2")["fun"](2)  # gives 4
```

and it works. @Datseris, what did you mean by storing functions? Do you mean actually storing them so that they can be used without defining them again in the session that loads them?
I just ran through the 1000+ metadata files I have stored, and I can convert all of them from BSON to JLD2.
@JonasIsensee, please comment here: as far as I know, JLD2 cannot save functions (or, to put it better, it is advised not to save functions with it, though I no longer remember by whom).
Is it possible to use `toml` or `yml` files for storing metadata?
Before I knew DrWatson, I did my simulation work by storing the configurations, the metadata, and the results (obtained by analyzing the simulation data on the fly) in separate `yml` files. I really like the `produce_or_load` and `collect_results` functions, so I want to give DrWatson a try :)
@liuyxpp, to me it feels like using YAML would lead to artificial limitations. I mean, why? Why use this format instead of a native Julia format? In your use case, every piece of metadata you save might be either a number or a string, but why not have the possibility to save arbitrary Julia types as metadata? That's exactly why one should go with JLD2 or BSON.
Yes, only strings and numbers are stored. I use `yml` partly because I sometimes want to check the files by eye. The other reason is that my simulation work involves several programs, some of which are written in C++, but I am working on rewriting them in Julia.
I get your point now, and a native Julia format will be fine for me once I have finished porting my simulation programs.
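For the "check by eye" case, Julia's standard library ships a TOML parser and printer, so string and number parameters can be round-tripped without extra packages. A minimal sketch with made-up parameter names:

```julia
using TOML  # standard library (Julia 1.6 and later)

# Made-up parameters: only strings and numbers, as in the use case above.
params = Dict("L" => 64, "temperature" => 2.269, "model" => "ising")

# Write a human-readable file with one `key = value` line per entry, sorted by key.
path = joinpath(mktempdir(), "params.toml")
open(io -> TOML.print(io, params; sorted = true), path, "w")
print(read(path, String))

# Reading it back gives an equal Dict{String, Any}.
TOML.parsefile(path) == params
```

Types outside TOML's data model (e.g. arbitrary structs) are not written directly; as far as I know `TOML.print` accepts an optional conversion function as its first argument for such cases, which is exactly the kind of extra step a native Julia format avoids.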
@liuyxpp I get your point. I usually additionally convert the parameters dict into a string representation and store this separately to quickly see the parameters I used for each simulation. You could, for example, auto-generate this string representation and store it in an additional file in each simulation directory:
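```julia
using DrWatsonSim  # assumed to provide in_simulation_mode, Metadata and simdir

if in_simulation_mode()
    m = Metadata(simdir())  # load the metadata of the running simulation
    # e.g. store a plain-text copy of the parameters next to the results
    write(simdir("params.txt"), string(m["parameters"]))
end
```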
`simdir` works just like the directory functions in DrWatson (such as `datadir`): if you are in an active simulation, `simdir("params.toml")` resolves to `[absolute path to simulation directory]/params.toml`.
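The "additional file with a string representation" idea can also be written independently of DrWatsonSim. `write_params_summary` is a hypothetical helper that writes one `key = value` line per parameter, sorted by key; inside an active simulation one would pass `simdir()` as the directory, whereas here a temporary directory is used so the sketch runs on its own.

```julia
# Hypothetical helper: one `key = value` line per parameter, sorted by key,
# so the file can be inspected by eye (e.g. with `cat` or in an editor).
function write_params_summary(dir::AbstractString, params::AbstractDict;
                              filename::AbstractString = "params.txt")
    path = joinpath(dir, filename)
    open(path, "w") do io
        for k in sort!(collect(keys(params)); by = string)
            println(io, k, " = ", repr(params[k]))  # repr keeps strings quoted
        end
    end
    return path
end

# Usage
dir = mktempdir()  # inside an active simulation this would be simdir()
p = write_params_summary(dir, Dict("N" => 100, "dt" => 0.01, "solver" => "RK4"))
print(read(p, String))
```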
I now realize this is useful for me as well. I'm saving my simulations directly into a master dataframe, but now I want to add a new column and everything breaks down. Then I realized we have this super awesome `collect_results` function. However, my input data are simply too large and complex for me to use `savename` to derive a unique file name for each simulation, so this is where the hashing of DrWatsonSim would be very useful! @sebastianpech, have you made any progress on getting this PR started?
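The reason a new column need not break an aggregated table is that rows lacking a key can simply be filled with `missing`. A stdlib-only sketch of that idea: `collect_columns` is hypothetical, returns a plain `Dict` of column vectors rather than a DataFrame, and is not how `collect_results` is implemented.

```julia
# Hypothetical aggregation: turn a vector of result dictionaries into columns
# (a Dict of vectors); keys absent from a result become `missing`.
function collect_columns(results::Vector{<:AbstractDict})
    allkeys = sort!(unique(k for r in results for k in keys(r)); by = string)
    return Dict(k => [get(r, k, missing) for r in results] for k in allkeys)
end

# Usage: a later run adds a column that earlier runs do not have.
run1 = Dict("N" => 10, "energy" => -1.2)
run2 = Dict("N" => 20, "energy" => -1.5, "magnetization" => 0.3)
cols = collect_columns([run1, run2])
cols["magnetization"]  # first entry is missing, second is 0.3
```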
@Datseris I'm a little bit low on time at the moment. I'll try to get it started over the weekend.
no stress!
Bumping to ask: did this ever get implemented in DrWatson proper? It sounds very useful to me. I can also try using `DrWatsonSim` on its own, of course, which I may do in the meantime.
In PR https://github.com/JuliaDynamics/DrWatson.jl/pull/366 I am providing an intermediate solution that calls `hash` on the configuration container passed to `produce_or_load`, to provide a unique string for more complicated input configurations.
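The general idea of deriving a file name from the configuration itself can be sketched as follows. Instead of Base `hash` (which the PR uses), this sketch digests a canonical, key-sorted string with SHA-256 from the standard library, because `hash` values are not guaranteed to stay the same across Julia versions. `config_digest` and the `sim_` prefix are hypothetical, and the sketch assumes `repr` of each value is deterministic, which holds for numbers, strings and arrays of them but not necessarily for custom types.

```julia
using SHA  # standard library

# Hypothetical: build a canonical string (keys sorted, values via repr) so that
# equal configurations always give the same text, then digest it with SHA-256.
function config_digest(config::AbstractDict; len::Int = 16)
    ks = sort!(collect(keys(config)); by = string)
    canonical = join(("$(k)=$(repr(config[k]))" for k in ks), ",")
    return bytes2hex(sha256(canonical))[1:len]
end

# Usage: insertion order does not matter, only the contents do.
a = Dict("N" => 100, "mesh" => collect(1:3), "solver" => "RK4")
b = Dict("solver" => "RK4", "N" => 100, "mesh" => [1, 2, 3])
config_digest(a) == config_digest(b)
filename = "sim_" * config_digest(a) * ".jld2"
```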
I've been using DrWatson.jl to organize my preprocessing code/analysis for an ensemble of experiments performed in a large, complex facility. I've struggled to find an ergonomic workflow that captures the entire pipeline. Issues include:
- … `savename` format without exceeding filesystem length limits
- … `produce_or_load` data-processing pipeline that assumes hands-off, end-to-end Julia code

This issue is partly a reminder to myself to write suitable documentation once I arrive at a good workflow.
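For the filesystem length limit in the first item, one option is to keep a readable, savename-like name when it fits and fall back to a digest otherwise. A minimal sketch: `readable_name` and `safe_name` are hypothetical and only imitate the `key=value` style of `savename`, and the 255-byte limit is an assumption (it is the per-name limit on many common filesystems, but check yours).

```julia
using SHA  # standard library

const MAX_NAME_BYTES = 255  # assumed per-filename limit; check your filesystem

# Hypothetical savename-like string: sorted `key=value` pairs joined by '_'.
readable_name(config::AbstractDict) =
    join(("$(k)=$(config[k])" for k in sort!(collect(keys(config)); by = string)), "_")

# Use the readable name when it fits, otherwise a short digest of it.
function safe_name(config::AbstractDict; ext::AbstractString = ".jld2")
    name = readable_name(config)
    if ncodeunits(name) + ncodeunits(ext) > MAX_NAME_BYTES
        name = "hash_" * bytes2hex(sha256(name))[1:16]
    end
    return name * ext
end

# Usage
safe_name(Dict("N" => 100, "model" => "ising"))  # short enough to stay readable
safe_name(Dict("mesh" => collect(1:200)))        # too long, falls back to a digest
```

Keeping the readable form for small configurations preserves the main benefit of `savename`, while the digest only kicks in for the configurations that would otherwise fail to save.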