JuliaDynamics / DrWatson.jl

The perfect sidekick to your scientific inquiries
https://juliadynamics.github.io/DrWatson.jl/stable/

savename and path-argument #164

Open JonasIsensee opened 4 years ago

JonasIsensee commented 4 years ago

I came across what someone might consider a bug.

julia> p = Dict(:sourcefile => "path/to/my/sourcefile")
Dict{Symbol,String} with 1 entry:
  :sourcefile => "path/to/my/sourcefile"

julia> savename(p)
"sourcefile=path/to/my/sourcefile"

julia> produce_or_load(p, p -> p)
┌ Warning: Using the standard Julia project.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/project_setup.jl:30
[ Info: File sourcefile=path/to/my/sourcefile.bson does not exist. Producing it now...
┌ Warning: The directory ('/home/jonas/.julia/environments/v1.4') is not a Git repository, returning `nothing` instead of the commit ID.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/saving_tools.jl:48
┌ Warning: The directory ('/home/jonas/.julia/environments/v1.4') is not a Git repository, returning `nothing` instead of a patch.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/saving_tools.jl:95
[ Info: File sourcefile=path/to/my/sourcefile.bson saved.
(Dict(:sourcefile => "path/to/my/sourcefile"), "sourcefile=path/to/my/sourcefile.bson")

shell> tree
.
└── sourcefile=path
    └── to
        └── my
            └── sourcefile.bson

3 directories, 1 file

What do you think? Should we add a warning when path delimiters are part of savename or should we escape them?

Another thought: Would anyone be interested in a savename option to output a hash instead of the normal behaviour? That would help when the string becomes longer than the OS allows.

tamasgal commented 4 years ago

I guess we should do more checks in savename() in general, since there are more characters that can cause problems. On Windows in particular the list is quite large (<>/\?*|": etc.).
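A minimal sketch of the two options discussed so far (warn, or replace the offending characters). The names `PROBLEM_CHARS`, `check_filename` and `sanitize_filename` are hypothetical and not part of DrWatson; the character list is just the one quoted above and is not claimed to be exhaustive.

```julia
# Characters mentioned in this thread as problematic (assumption: not exhaustive).
const PROBLEM_CHARS = ['<', '>', '/', '\\', '?', '*', '|', '"', ':']

# Hypothetical helper: return the problematic characters found in `name`
# and emit a warning if there are any.
function check_filename(name::AbstractString)
    bad = unique(filter(in(PROBLEM_CHARS), collect(name)))
    isempty(bad) || @warn "Name contains characters that may be invalid in a filename" name bad
    return bad
end

# Hypothetical helper: replace every problematic character with `replacement`.
sanitize_filename(name::AbstractString; replacement::AbstractString = "-") =
    replace(name, PROBLEM_CHARS => replacement)

name = "sourcefile=path/to/my/sourcefile"
check_filename(name)     # warns and returns ['/']
sanitize_filename(name)  # "sourcefile=path-to-my-sourcefile"
```

Note that replacing characters is lossy: two different paths can map to the same sanitized name, which is one argument for only warning.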

What do you mean by a hash? Something like a base64 encoding or so?

Datseris commented 4 years ago

> Would anyone be interested in a savename option to output a hash instead of the normal behaviour? That would help when the string becomes longer than the OS allows.

I think this is a good idea, but probably better to do it as a separate function. savename is already so heavy...

> Should we add a warning when path delimiters are part of savename or should we escape them?

How does this work with the actual name of the file? Isn't it impossible to save a file that contains / or \ in its name? At least on Windows? I'd say go for the warning.

JonasIsensee commented 4 years ago

> How does this work on the actual name of the file? Isn't it impossible to save a file that contains / or \ in their name? At least in windows? I'd say go for the warning.

Well, somehow I ended up with the directory tree below. By the way, this is also why I wanted to use some kind of hash instead.

.
├── Ithreshup100k=5_b=0.16_liftlockdowndelay=14_localdir=
│   └── data
│       └── username
│           └── epiproject
│               └── dataset_20200525
│                   ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
│                   └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
├── Ithreshup100k=5_b=0.2_liftlockdowndelay=14_localdir=
│   └── data
│       └── username
│           └── epiproject
│               └── dataset_20200525
│                   ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
│                   └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
└── Ithreshup=5_b=0.24_liftlockdowndelay=14_localdir=
    └── data
        └── username
            └── epiproject
                └── dataset_20200525
                    ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
                    └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson

Datseris commented 4 years ago

But do you still get the same thing if you escape the slashes?

JonasIsensee commented 4 years ago

> I guess we should in general do more checks in savename() since there are more characters which can cause problems. Especially on windows the list is quite large (<>/\?*|": etc.).
>
> What do you mean by a hash? Something like a base64 encoding or so?

Hm, I looked at base64, but my impression is that the filenames would not get any shorter. Base.hash looks more promising, even if it is not reversible.

> but do you still get same thing if you escape slashes?

How would you like me to escape them?

Doing // does not work, and \/ also errors.

Datseris commented 4 years ago

> or should we escape them?

You suggested that they can be escaped :P I never thought it was possible :P

Datseris commented 4 years ago

> Base.hash looks more promising even if it is not reversible.

I've tested a bit, and Base.hash(savename(...)) gives the same hash for the same input string. It is not invertible, but it is deterministic, which is one of the main purposes of savename. What I wonder is whether these hashes change from Julia version to Julia version.
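If stability across Julia versions is the concern, one option is a cryptographic digest in place of Base.hash, since (as far as I know) Base.hash values are not documented as stable between releases. The sketch below swaps in SHA-256 from the SHA standard library; `hashname` is a hypothetical helper, and truncating the digest to 16 hex characters is an arbitrary choice that trades length for collision risk.

```julia
using SHA  # Julia standard library

# Hypothetical helper: a short, filename-safe hex digest of a string.
# SHA-256 is defined independently of Julia, so the result should not
# depend on the Julia version, unlike (possibly) Base.hash.
hashname(s::AbstractString; len::Integer = 16) = bytes2hex(sha256(s))[1:len]

name = "sourcefile=path/to/my/sourcefile"
id = hashname(name)        # 16 hex characters, no path delimiters
id == hashname(name)       # true: deterministic for the same input
```

Like Base.hash, this is not invertible, so the parameters would have to be stored somewhere else if they need to be recovered.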

JonasIsensee commented 4 years ago

I don't know about that. Also, I was mostly thinking of using Base.hash(c) instead of savename(c). There is no point in still risking string rounding issues (0.154 != 0.153, but "0.15" == "0.15") when we're not using the string as a filename anyway.
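The rounding concern can be illustrated with plain Julia. This sketch does not call savename; `label` is a hypothetical stand-in that rounds to two digits purely to mirror the numbers in the comment above, which is not necessarily how savename rounds.

```julia
a = Dict(:b => 0.154)
b = Dict(:b => 0.153)

# Hypothetical stand-in for a name built from rounded values.
label(d) = "b=" * string(round(d[:b]; digits = 2))

label(a) == label(b)  # true: both are "b=0.15", so the two runs would share a file
hash(a) == hash(b)    # false: hashing the container keeps the parameters distinct
```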

EDIT: If you are just using savename by itself, then of course you can just exchange it for Base.hash but then you can't use produce_or_load anymore. (Which was my application)

tamasgal commented 4 years ago

Btw. alternatively we can also think about a meta file, which would save the information in an external file. savename could provide a hash and a JSON file could hold the parameters. Just thinking out loud...
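A rough sketch of the meta-file idea, under several assumptions: it uses TOML instead of JSON only because TOML ships with Julia (1.6 and later) and keeps the example self-contained; `save_with_sidecar` is a hypothetical name; and the file name is a truncated SHA-256 digest of the serialized parameters rather than anything savename produces.

```julia
using SHA, TOML  # both are standard libraries (TOML since Julia 1.6)

# Hypothetical helper: write the parameters to `<hash>.toml` inside `dir`
# and return the hash together with the path of the metadata file.
function save_with_sidecar(dir::AbstractString, params::AbstractDict)
    # TOML tables use string keys, so convert e.g. Symbol keys first.
    strparams = Dict(string(k) => v for (k, v) in params)
    # Sort the keys so that equal parameter sets serialize to identical text.
    text = sprint(io -> TOML.print(io, strparams; sorted = true))
    id = bytes2hex(sha256(text))[1:16]
    metafile = joinpath(dir, id * ".toml")
    write(metafile, text)
    return id, metafile
end

dir = mktempdir()
id, metafile = save_with_sidecar(dir, Dict(:sourcefile => "path/to/my/sourcefile", :b => 0.16))
TOML.parsefile(metafile)  # recovers the parameters, including the path with slashes
```

The data file itself could then be saved as `id * ".bson"` next to the metadata file, so the name stays short and free of path delimiters while the parameters remain readable.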

JonasIsensee commented 4 years ago

That might be an option.

Have a look at https://github.com/invenia/JLSO.jl ! I just found this and I think this definitely needs a shoutout in the docs. It includes a project and manifest as metadata with any file so later you can just activate a file environment.

Datseris commented 4 years ago

Wait, can't this replace BSON.jl entirely...?

sebastianpech commented 4 years ago

Well, it says it uses BSON for storing the metadata, so you can't get rid of BSON entirely. However, maybe it's more stable regarding custom types. The Julia serializer works very well, though the docs say it only works reliably with the same Julia version. Maybe storing the state of the current install helps with that problem.

sebastianpech commented 4 years ago

https://docs.julialang.org/en/v1/stdlib/Serialization/

> In general, this process will not work if the reading and writing are done by different versions of Julia, or an instance of Julia with a different system image.

sebastianpech commented 4 years ago

I kinda missed that discussion, but the idea with the metadata is basically what my metadata implementation is about. Regarding hashing, I found that this

https://github.com/sebastianpech/DrWatsonSim.jl/blob/820d26eaea671a798788cf360e112b316202d14c/src/Metadata.jl#L50-L51

works quite well.