JuliaLang / julia


Null bytes in repl_history on shared home directories #36707

Open lhupe opened 4 years ago

lhupe commented 4 years ago

I’m currently using Julia on a network of Linux computers with shared home directories (via NFS). I’ve noticed that when I use REPLs on two machines simultaneously (which I do more or less every day), the repl_history.jl file gets corrupted by null bytes.

This is apparently independent of what I do during the simultaneous REPL sessions; I can reproduce the problem by closing the REPLs immediately after opening them. As soon as two sessions are running at the same time on different hosts, the history file gets corrupted.

Example

This is an example sequence of commands to illustrate the problem. I have found the problem to be independent of the order in which the REPLs are opened or closed: as soon as two are running simultaneously, it occurs.

On the first machine, open a REPL and run

julia> println("I have now started one REPL")

Then, open a REPL on a second machine and run

julia> println("and another one on a different machine")

Go back to the first machine and execute

julia> println("I will now close the first REPL"); exit()

thus closing the first REPL.

Now close the second REPL with

julia> println("and now the second one"); exit()

The resulting REPL history file (as displayed by vim running on the second machine) will be

# time: 2020-07-17 10:05:00 CEST
# mode: julia
        println("and another one on a different machine")
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@# time: 2020-07-17 10:05:32 CEST
# mode: julia
        println("and now the second one"); exit()

(where the ^@ represent the null bytes)
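To check whether a history file is affected, one can count the null bytes in it and, if needed, write a cleaned copy. The following is only a minimal sketch: the helper names `count_nul_bytes` and `strip_nul_bytes` are made up for illustration, and the demo runs on a temporary file that mimics the corrupted layout rather than on a real repl_history.jl.

```julia
# Hypothetical helper: count the NUL (0x00) bytes in a file.
count_nul_bytes(path::AbstractString) = count(==(0x00), read(path))

# Hypothetical helper: write a copy of `src` without NUL bytes to `dst`.
function strip_nul_bytes(src::AbstractString, dst::AbstractString)
    write(dst, filter(!=(0x00), read(src)))
    return dst
end

# Demo: build a small file with a run of NUL bytes between two entries.
demo = tempname()
open(demo, "w") do io
    write(io, "# mode: julia\n\tprintln(\"first\")\n")
    write(io, zeros(UInt8, 8))            # eight NUL bytes
    write(io, "# mode: julia\n\tprintln(\"second\")\n")
end

println(count_nul_bytes(demo))            # NUL bytes before cleaning
cleaned = strip_nul_bytes(demo, tempname())
println(count_nul_bytes(cleaned))         # NUL bytes after cleaning
```

Writing the cleaned data to a separate file rather than in place keeps the original around for inspection.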

JonasIsensee commented 3 years ago

Oh, this isn't actually new.

#7176

JonasIsensee commented 3 years ago

This was fixed for precompile caches in #36416. Can we do the same here?

#37015

StefanKarpinski commented 3 years ago

Solutions designed for small, atomically generated files like precompile caches are not ideal here, for a few reasons. History files tend to get quite large, and they are generated incrementally. Rewriting the entire history file to a temporary file and then replacing the original on every REPL entry does not seem like a great idea. It would also mean that instead of corruption we would see history loss: if two Julia processes try to write a history entry at the same time, they both make a copy, add their new entry, and then replace the original with their copy. Whichever move happens last "wins", and the history entry from the other one is lost.
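The lost-entry scenario can be reproduced in a single process by interleaving the steps by hand. This is only an illustrative sketch: `replace_file` is a hypothetical helper, the two "processes" A and B are simulated sequentially, and the file lives in a temporary directory rather than being a real history file.

```julia
# Hypothetical helper: replace `path` with `contents` via a temp file and a rename.
function replace_file(path::AbstractString, contents::AbstractString)
    tmp, io = mktemp(dirname(path))   # temp file in the same directory as the target
    write(io, contents)
    close(io)
    mv(tmp, path; force=true)         # replace the original with the copy
end

dir = mktempdir()
histfile = joinpath(dir, "repl_history.jl")
write(histfile, "entry 0\n")

# Both simulated processes read the file before either has written back.
copy_a = read(histfile, String)
copy_b = read(histfile, String)

replace_file(histfile, copy_a * "entry from A\n")
replace_file(histfile, copy_b * "entry from B\n")   # the last move wins

result = read(histfile, String)
print(result)
```

The final file is intact, but the entry from A is gone, because B's copy was made before A's entry existed.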

File locking would be much better: if one Julia process is currently updating the history file, let others wait until it is done. No copying of the file is required and no entries will be lost. The only problem is that file locking is hard and not portable, and APIs like flock don't work on distributed file systems, which is exactly where this problem shows up. The best approach is probably to make https://github.com/vtjnash/Pidfile.jl a stdlib and then use that.
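A rough sketch of what a pidfile-guarded append could look like follows. It assumes Julia 1.9 or later, where the Pidfile functionality is available as `FileWatching.Pidfile`; with the standalone package the import would be `using Pidfile` instead. The helper `append_history_entry`, the `.pid` lock-file name, and the `stale_age` value are assumptions made for illustration, not how the REPL actually does or will do it, and the sketch does not show how this behaves on any particular NFS setup.

```julia
using FileWatching.Pidfile: mkpidlock

# Hypothetical helper: append one entry to `histfile` while holding a pidfile lock.
function append_history_entry(histfile::AbstractString, entry::AbstractString)
    lockfile = string(histfile, ".pid")   # assumed lock-file name
    # Other processes calling this wait here until the lock is released.
    mkpidlock(lockfile; stale_age=10) do
        open(histfile, "a") do io
            write(io, entry)
        end
    end
    return nothing
end

# Demo in a temporary directory.
dir = mktempdir()
histfile = joinpath(dir, "repl_history.jl")
append_history_entry(histfile, "# mode: julia\n\tprintln(\"one\")\n")
append_history_entry(histfile, "# mode: julia\n\tprintln(\"two\")\n")
contents = read(histfile, String)
print(contents)
```

Because each writer appends only while it holds the lock, no copy of the file is made and earlier entries are not overwritten.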