StanJulia / CmdStan.jl

CmdStan.jl v6 provides an alternative, older Julia wrapper to Stan's `cmdstan` executable. CmdStan.jl will be deprecated in 2022.

Switch to absolute path #100

Closed Byrth closed 3 years ago

Byrth commented 3 years ago

I have had a rather pernicious problem with CmdStan that I was unable to consistently replicate. I've known it has something to do with threading, but was unable to figure out how it was thread-unsafe. When I was testing stuff today, I hit upon a way to replicate it and think I figured out the problem.

cd() changes the working directory for the whole process, i.e. for all of its threads, which means that running multiple CmdStan.jl models on different threads within one process results in somewhat random behavior, it seems.

The particular behavior I have is:

# p1 is the number of models to fit
# old_model is an old Stanmodel
estimates = Vector(undef, p1)
Threads.@threads for i in 1:p1
    new_model = deepcopy(old_model)

    # keep drawing candidate paths until we find one that does not exist yet
    pdir = pwd()
    while ispath(pdir)
        pdir = tempname()
    end

    new_model.pdir = pdir
    new_model.tmpdir = joinpath(pdir, "tmp")

    mkpath(new_model.tmpdir)

    estimates[i] = stan(new_model)

    rm(pdir; force=true, recursive=true)
end

With the current 6.0.8, this errors some of the time (p1 > 5 or so) even when fitting an identical model: the package cd()s around and ends up attempting to read files through a relative path that is no longer accurate, because another thread has moved the process into its own directory.
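A side note on the loop above: the ispath()/tempname() check-then-create pattern has an in-principle race between threads, since a drawn name is not reserved until something is created at it. Base's mktempdir() creates a unique directory atomically. A minimal sketch of that alternative (cleanup=false is an assumption here, because the loop body rm()'s the directory itself):

# Atomically create a fresh, uniquely named temp directory per task;
# cleanup=false because the loop removes it explicitly at the end.
pdir = mktempdir(; cleanup=false)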

This PR switches to using an absolute path. Line 99 of stancode.jl now grabs the absolute path of the passed model.tmpdir and uses it to make a good portion of the code (other than that line and the make section) working-directory independent. It isn't a full fix, really, but I haven't been able to reproduce the error with these changes. I don't think a full fix is possible until CmdStan itself becomes working-directory independent.

The long and the short of it is that I added another _file field to Stanmodel that is tmpdir + name, and it is now referenced in most of the places where the code previously used name alone.
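A minimal sketch of the idea (the type and field names below are illustrative, not the exact PR code): resolve tmpdir to an absolute path once and derive all file references from it, so a later cd() on another thread cannot invalidate them.

# Illustrative only; the actual change lives in Stanmodel/stancode.jl.
struct PathInfo
    tmpdir::String   # absolute temp directory
    file::String     # absolute path prefix: tmpdir + model name
end

# abspath() joins a relative path against pwd() and normalizes it,
# without ever calling cd().
function model_paths(tmpdir::AbstractString, name::AbstractString)
    adir = abspath(tmpdir)
    PathInfo(adir, joinpath(adir, name))
end

paths = model_paths("tmp", "bernoulli")
# paths.file stays valid even if another thread later cd()s the process.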

Tests pass locally (on Linux).

goedman commented 3 years ago

Thank you @Byrth for reporting this and preparing the PR.

I'll study it over the next few days to make sure I understand this problem. I rarely work with CmdStan anymore and have switched to StanSample, but would like to see if this applies there as well.

Recently I removed pmap from StanSample, as pmap can create problems on clusters, but the management of file paths is slightly different there.

It would definitely be surprising (to me) if cd() worked across threads, but your case is pretty strong!

goedman commented 3 years ago

@Byrth

A few questions.

Are you just trying to get it to run using the threads model? I can definitely reproduce the error below:

An error occurred while compiling the Stan program.

Please check your Stan program in variable 'bernoulli' and the contents of /var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_RPuyW7/tmp/bernoulli_build.log.
Note that Stan does not handle blanks in path names.
ERROR: LoadError: TaskFailedException:
Return code = -3

Or is your application more about fitting a few models, each with different observational data?

Rob

goedman commented 3 years ago

Just to make sure we're testing the same MWE, I see the following error:

make: *** No rule to make target `/var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_c6P1gJ/tmp/bernoulli'.  Stop.

using the following MWE:

using CmdStan, Distributed

ProjDir = @__DIR__
cd(ProjDir)

bernoullimodel = "
data { 
  int<lower=1> N; 
  int<lower=0,upper=1> y[N];
} 
parameters {
  real<lower=0,upper=1> theta;
} 
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
";

observeddata = Dict("N" => 10, "y" => [0, 1, 0, 1, 0, 0, 0, 0, 0, 1])

tmpdir = ProjDir * "/tmp"
sm = Stanmodel(name="bernoulli", model=bernoullimodel;
  #tmpdir=tmpdir
);

println("\nThreads loop\n")
p1 = 1
# p1 is the number of models to fit
estimates = Vector(undef, p1)
Threads.@threads for i in 1:p1
    new_model = deepcopy(sm)
    println(new_model.tmpdir)

    pdir = pwd()
    while ispath(pdir)
        pdir = tempname()
    end

    new_model.pdir = pdir
    new_model.tmpdir = joinpath(pdir, "tmp")

    mkpath(new_model.tmpdir)

    estimates[i] = stan(new_model)

    rm(pdir; force=true, recursive=true)
end

This fails even with p1 = 1.

goedman commented 3 years ago

@Byrth

Without your PR, on my system

using CmdStan, Distributed

ProjDir = @__DIR__
cd(ProjDir)

bernoullimodel = "
data { 
  int<lower=1> N; 
  int<lower=0,upper=1> y[N];
} 
parameters {
  real<lower=0,upper=1> theta;
} 
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
";

observeddata = Dict("N" => 10, "y" => [0, 1, 0, 1, 0, 0, 0, 0, 0, 1])

sm = Stanmodel(name="bernoulli", model=bernoullimodel, output_format=:namedtuple);

println("\nThreads loop\n")
p1 = 15
# p1 is the number of models to fit
estimates = Vector(undef, p1)
Threads.@threads for i in 1:p1
    new_model = deepcopy(sm)

    pdir = pwd()
    while ispath(pdir)
        pdir = tempname()
    end

    new_model.pdir = pdir
    new_model.tmpdir = joinpath(pdir, "tmp")

    mkpath(new_model.tmpdir)

    #estimates[i] = stan(new_model)
    rc, samples, cnames = stan(sm, observeddata, new_model.pdir);
    if rc == 0
      estimates[i] = samples
    end

    rm(pdir; force=true, recursive=true)
end

works fine. I pass in the correct dir in the call to stan().

goedman commented 3 years ago

At this point, unfortunately, I am not really convinced about this approach. Duplicating Stanmodels and updating some of the paths might not fit well with how CmdStan.jl was originally designed (~8 years ago). I've merged your changes into a branch test_threads, but your partial MWE still fails.

Of course, I'm not quite sure what you are trying to achieve, what your models look like, how you run julia (julia -p auto?), whether you set the number of threads using an ENV var, etc. My experience with the current setup is that you either create the Stanmodels in different temp directories or name the models differently. I added 2 examples of slightly cleaner setups to examples/Cluster (particularly cluster6.jl) that support a few models, each with multiple observed data sets.

I believe pmap within pmap is currently not safe in Julia 1.5; I had to revert that in StanSample.jl when the total number of threads (jobs * chains) exceeded the number of cores/processors.

I do agree that a redesign of the Julia cmdstan interface using threads may be in order, but that's currently not high on my priority list.

It is interesting, though, that over the last year or so Stan has officially released cmdstanr and cmdstanpy, which argues to some extent against making CmdStan directory independent.

Byrth commented 3 years ago

Sorry for the delay.

Use case: I am running CmdStan as part of a modeling process that measures out-of-sample predictive accuracy for multiple modeling strategies. To this end, I partition the data and fit multiple copies of the same model, which is how I ended up using Threads.@threads with CmdStan.

An example of the cd-ing problem can be seen using:

println("hither : ",Threads.threadid()," : ",pwd()) # hither : 1 : /tmp
Threads.@spawn cd( () -> ( println("tither : ",Threads.threadid(), " : ", pwd()); sleep(3)), "..") # tither : 2 : /
sleep(1);println("hither : ",Threads.threadid(), " : ", pwd());sleep(3) # hither : 1 : /
println("hither : ",Threads.threadid()," : ",pwd()) # hither : 1 : /tmp

So cd()-ing on one thread moves all threads around. In the original code, there were 3 cd blocks.

Using an adaptation of your example and JULIA_NUM_THREADS of 8, I get errors:

using CmdStan#, Distributed
using StatsBase: sample

ProjDir = "/root"

bernoullimodel = "
data { 
  int<lower=1> N; 
  int<lower=0,upper=1> y[N];
} 
parameters {
  real<lower=0,upper=1> theta;
} 
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
";
n = 10;
observeddata = Dict("N" => n, "y" => sample([0,1],n))

sm = Stanmodel(name="bernoulli", model=bernoullimodel, output_format=:namedtuple);

println("\nThreads loop\n")
p1 = 15 # p1 is the number of models to fit
estimates = Vector(undef, p1)
Threads.@threads for i in 1:p1
    pdir = pwd()
    while ispath(pdir)
        pdir = tempname()
    end

    new_model = deepcopy(sm)
    new_model.pdir = pdir
    new_model.tmpdir = joinpath(pdir, "tmp")

    mkpath(new_model.tmpdir)

    CmdStan.update_model_file(joinpath(new_model.tmpdir, "$(new_model.name).stan"), strip(new_model.model))

    rc, samples, cnames = stan(new_model, observeddata, new_model.pdir);
    if rc == 0
      estimates[i] = samples
    end

    rm(pdir; force=true, recursive=true)
end
ERROR: TaskFailedException:
Return code = -5

I anticipated needing to add more data to see errors, which is why I included StatsBase, but it ended up being irrelevant (I hope).

With the test_threads branch, it passes.

goedman commented 3 years ago

Hmmm, on macOS (9 cores) I see very different behavior using the test_threads branch.

I see 3 problems:

-------- 1 ----------

With p1 == 1 all is fine.

The first problem shows up with p1 == 2; very often I get:

julia> include("/Users/rob/.julia/dev/CmdStan/test/test_threads.jl");

File /Users/rob/.julia/dev/CmdStan/test/tmp/bernoulli.stan will be updated.

Threads loop

/var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_bFNxS3
/var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_5LbFaL

File /var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_5LbFaL/tmp/bernoulli.stan will be updated.

File /var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_5LbFaL/tmp/bernoulli.stan will be updated.

ERROR: LoadError: TaskFailedException:
ArgumentError: cannot parse "0.-8.73949" as Float64
Stacktrace:
 [1] _parse_failure(::Type{T} where T, ::SubString{String}, ::Int64, ::Int64) at ./parse.jl:370 (repeats 2 times)
 [2] #tryparse_internal#364 at ./parse.jl:366 [inlined]
 [3] tryparse_internal at ./parse.jl:364 [inlined]
 [4] #parse#365 at ./parse.jl:376 [inlined]
 [5] parse at ./parse.jl:376 [inlined]
 [6] _broadcast_getindex_evalf at ./broadcast.jl:648 [inlined]
 [7] _broadcast_getindex at ./broadcast.jl:631 [inlined]
 [8] getindex at ./broadcast.jl:575 [inlined]
 [9] macro expansion at ./broadcast.jl:932 [inlined]
 [10] macro expansion at ./simdloop.jl:77 [inlined]
 [11] copyto! at ./broadcast.jl:931 [inlined]
 [12] copyto! at ./broadcast.jl:886 [inlined]
 [13] copy at ./broadcast.jl:862 [inlined]
 [14] materialize at ./broadcast.jl:837 [inlined]
 [15] read_samples(::Stanmodel, ::Bool, ::Bool) at /Users/rob/.julia/packages/CmdStan/c7vKK/src/utilities/read_samples.jl:94
 [16] read_samples at /Users/rob/.julia/packages/CmdStan/c7vKK/src/utilities/read_samples.jl:22 [inlined]

I've often seen parse errors like that when one thread is reading a file while another is writing to it. Higher values of p1 hardly ever pass; I've seen p1 == 4 pass once or twice.

-------- 2 ---------------

The second problem is that I (mostly) need to disable summary generation by cmdstan in the call to stan():

    CmdStan.update_model_file(joinpath(new_model.tmpdir, "$(new_model.name).stan"), strip(new_model.model))

    rc, samples, cnames = stan(new_model, observeddata, new_model.pdir;
      summary=false
    );

    if rc == 0
      estimates[i] = samples.theta
    end

    rm(pdir; force=true, recursive=true)

I think this is probably because the stansummary pipeline also needs fixed paths; likely a fourth place where cd-ing happens. When I do get the stansummary, the results look identical.
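A hedged sketch of what a working-directory-independent stansummary invocation might look like (the CSV file name and the CMDSTAN_HOME lookup are assumptions, not the current CmdStan.jl code):

# Assumed: CMDSTAN_HOME is set and model.tmpdir holds the sample CSV files.
stansummary = joinpath(ENV["CMDSTAN_HOME"], "bin", "stansummary")
csvfile = joinpath(abspath(model.tmpdir), "bernoulli_samples_1.csv")  # hypothetical name
outfile = joinpath(abspath(model.tmpdir), "bernoulli_summary.txt")
# Absolute input and output paths, so neither depends on pwd().
run(pipeline(`$stansummary $csvfile`; stdout=outfile))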

-------- 3 ---------------

The third problem: in the few cases where it works with very low p1 values, the samples.theta results are identical:

julia> include("/Users/rob/.julia/dev/CmdStan/test/test_threads.jl");

File /Users/rob/.julia/dev/CmdStan/test/tmp/bernoulli.stan will be updated.

Threads loop

/var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_snIstU
/var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_gZ81eE

File /var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_gZ81eE/tmp/bernoulli.stan will be updated.

File /var/folders/l7/pr04h0650q5dvqttnvs8s2c00000gn/T/jl_gZ81eE/tmp/bernoulli.stan will be updated.

2-element Vector{Any}:
 [0.597624 0.807509 0.800475 0.75028; 0.724324 0.846703 0.776478 0.855542; … ; 0.55596 0.713048 0.695281 0.803884; 0.621433 0.498778 0.695281 0.625117]
 [0.597624 0.807509 0.800475 0.75028; 0.724324 0.846703 0.776478 0.855542; … ; 0.55596 0.713048 0.695281 0.803884; 0.621433 0.498778 0.695281 0.625117]

With Threads.nthreads() == 1 all works fine (sequentially, of course). Running julia -p auto with that setting also never uses more than 1 thread and works fine.

In the end I started to play around with separating stanc compilation and sampling, to no avail:

using CmdStan
#using Distributed
#using Statistics
using StatsBase: sample

bernoullimodel = "
data { 
  int<lower=1> N; 
  int<lower=0,upper=1> y[N];
} 
parameters {
  real<lower=0,upper=1> theta;
} 
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
";
n = 10;
observeddata = Dict("N" => n, "y" => sample([0,1],n))

# Keep tmpdir local for easy access.
ProjDir = @__DIR__
cd(ProjDir)
tmpdir = ProjDir * "/tmp"
isdir(tmpdir) && rm(tmpdir; force=true, recursive=true)
mkdir(tmpdir)

println("\nThreads loop\n")
p1 = 4 # p1 is the number of models to fit

new_model = Vector{Stanmodel}(undef, p1)
for i in 1:p1
  newtmpdir = tmpdir * "/$i"
  !isdir(newtmpdir) && mkdir(newtmpdir)
  new_model[i] = Stanmodel(name="bernoulli_$i", model=bernoullimodel;
    tmpdir=newtmpdir, output_format=:namedtuple);
end

estimates = Vector(undef, p1)
Threads.@threads for i in 1:p1

  rc, samples, cnames = stan(new_model[i], observeddata);

  if rc == 0
    estimates[i] = samples.theta
  end

end

estimates |> display

If the test_threads branch works on your machine, that is a possibility. I would adapt examples/Cluster/cluster6.jl, I guess.

Byrth commented 3 years ago

I think I accidentally deleted my branch or something when it was merged (darned cleaning habits). Those are all errors that I see without my changes. Perhaps it reverted to #master for you? For me, it kept trying to use my Julia cache because the branch didn't exist anymore, or something.

--1-- is caused by two processes logging to the same files, as you indicate.

--2-- is caused by the summary stuff, but I am fairly sure I got them all.

--3-- is caused by the directory moves switching two threads into the same directory before they start reading, which results in them reading the same samples; they should be the samples of the last-spawned process.

These are all cd() collisions in the sampling cd block. I can take a look at the example later tonight.

goedman commented 3 years ago

Yes, when you merge, GitHub closes the PR. You'll find it under closed PRs. Or you can select the test_threads branch of CmdStan.jl on GitHub.

I pretty much had the same thought, but I compared the test_threads branch I'm using with the PR. I'll check it again!

goedman commented 3 years ago

BREAKTHROUGH! I might have missed the ] up command after selecting the test_threads branch. The weird part is that it did do a recompile after restarting Julia. Let me test some more, but this looks a lot better!

Byrth commented 3 years ago

Lovely! I spent a solid hour this morning in a similar situation debugging #master. Not sure what happened.

goedman commented 3 years ago

Every time a question like this comes up I feel like I'm chasing my tail. But you always learn something new.

On my system, the current test_threads branch on GitHub works fine with test_threads.jl and examples/Threads/threads_01.jl after a restart of Julia. If I rerun the scripts, they fail.

One file that seems to pop up (or not?) in random spots is out.csv, the output produced by the stansummary executable. With summary=false commented out, it seems to run slightly better, but it still fails on the 2nd or 3rd repeat, and out.csv is not in tmpdir.

Somehow, when it fails on reruns, the first tmpdir created never seems to contain data files etc. I can't pin it down; vaguely, it reminds me of the problems I saw when Julia 1.3 was released with the new threading model. But I'm testing on 1.5 and 1.6-DEV.

On your system, can you run either of these scripts multiple times without a Julia restart?

Byrth commented 3 years ago

Running include("test_threads.jl") once on the test_threads branch in a fresh session (from CmdStan.jl/test) does not throw errors, but the second run throws an error, because the first run moved my working directory to CMDSTAN_HOME and for some reason did not move it back. If I cd() my way back to CmdStan.jl/test, I can run it again successfully, but am again left in the CMDSTAN_HOME directory. threads_01.jl shows the same behavior.

I don't understand why that would be, but I can look around for cd() commands later tonight if work does not require me.

It occurs to me that the only thing from the cd(CmdStanDir) block that really needs to run in the different working directory is the run(pipeline()) command. It must be the slowest command in the block by far, so narrowing the cd() window wouldn't change timing much, but it's an idea anyway; see the sketch below.
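A sketch of that idea, assuming CMDSTAN_HOME is set: the dir keyword of Cmd runs just the child process in a different working directory, leaving the parent process (and all of its threads) untouched.

target = "/abs/path/to/tmp/bernoulli"   # hypothetical make target
# Only the spawned make sees CMDSTAN_HOME as its cwd; no cd() in the parent.
make_cmd = Cmd(`make $target`; dir=ENV["CMDSTAN_HOME"])
run(pipeline(make_cmd; stdout="bernoulli_build.log", stderr="bernoulli_build.log"))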

I am doing this in a CentOS7-based docker container.

goedman commented 3 years ago

Yes, today I again spent several hours testing and testing. The current version of threads_02.jl sometimes behaves OK, but not reliably so. I also think that in the test_threads branch the models are always recompiled.

I think the choices I made in CmdStan.jl about when to compile, when to sample, and when to read the .csv files were so-so.

StanSample.jl behaves much better (the equivalent setup is in Examples_Threads/threads_01.jl of Stan.jl, which is very close to the updated threads_02.jl in test_threads).

Byrth commented 3 years ago

Tonight I tried some more and was able to replicate your finding (unreliable completion, more likely on iteration 2+ than on the first run). I found one thing I overlooked initially: tmpmodelname. It still depends on model.tmpdir (a relative path) instead of switching to path_prefix (an absolute path).

However, even after changing that and scoping down the cd(), I was still having problems. I think my final conclusion here is simply that libuv isn't up to the task, a suspicion at least partially supported by this error:

ERROR: TaskFailedException:
IOError: unlink: no such file or directory (ENOENT)
Stacktrace:
 [1] uv_error at ./libuv.jl:97 [inlined]
 [2] unlink(::String) at ./file.jl:888
 [3] rm(::String; force::Bool, recursive::Bool) at ./file.jl:268
 [4] rm at ./file.jl:260 [inlined]
 [5] stan(::Stanmodel, ::Dict{String,Any}, ::String; init::Type{T} where T, summary::Bool, diagnostics::Bool, CmdStanDir::String, file_run_log::Bool, file_make_log::Bool) at /root/CmdStan.jl/src/main/stancode.jl:103
 [6] stan(::Stanmodel, ::Dict{String,Any}, ::String) at /root/CmdStan.jl/src/main/stancode.jl:86
 [7] macro expansion at /root/CmdStan.jl/test/test_threads.jl:48 [inlined]
 [8] (::var"#171#threadsfor_fun#16"{UnitRange{Int64}})(::Bool) at ./threadingconstructs.jl:81
 [9] (::var"#171#threadsfor_fun#16"{UnitRange{Int64}})() at ./threadingconstructs.jl:48

which references this line in [5]:

isfile("$(path_prefix)_make.log") && rm("$(path_prefix)_make.log")

path_prefix at this point is an absolute path, so there's no real justification for that error.

I am beginning to believe that this is just a limitation of Julia's filesystem integration via libuv. Even when I start the fit from CMDSTAN_HOME, so there is no explicit directory movement at all, I still get the above error.

libuv is generally not thread safe, but the Julia team seems to have overcome that at least to the extent that they avoid deadlocks. Perhaps they stopped short of making sure the behavior is consistent, though.

Byrth commented 3 years ago

OK, I cannot replicate this with base Julia, so I retract my accusation, but I still have no explanation for the above error.

I replaced the run(`make ...`) command with cp("existing compiled bernoulli", "newtarget"), leaving the cd() wrapper alone, and still got errors.

Then I replaced the absolute_path = cd(pwd, model.tmpdir) calls with a more verbose version that doesn't rely on cd() and may be platform dependent (both in stancode.jl and stan_summary.jl):

absolute_tempdir_path = if model.tmpdir[1] == '/'  # already absolute (POSIX-style check)
    model.tmpdir
else
    joinpath(splitpath(pwd())..., splitpath(model.tmpdir)...)
end

No errors since then with >10 include("test_threads.jl") calls.
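For what it's worth, Base's abspath() seems to do the same thing portably, joining a relative path against pwd() and normalizing without ever cd()-ing, so it might be a drop-in replacement:

# Platform-independent equivalent using Base; no cd() involved.
absolute_tempdir_path = abspath(model.tmpdir)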

goedman commented 3 years ago

But, unless this exercise is just for learning (always useful), your problem, if I understand it correctly, is not difficult to program without threads. Either the cluster example or switching to StanSample would work.

From my point of view, CmdStan.jl is slowly falling behind; the CmdStan cluster example and projects don't integrate smoothly.

Byrth commented 3 years ago

Yes. I believe this might be more trouble than it is worth, particularly with CmdStan.jl on the path to deprecation.

I am currently in a bit of a time crunch, but I will try to switch to the newer StanJulia ecosystem when I can make time for it; until then, I just won't do multithreaded model fits.

Thanks for entertaining this idea!

goedman commented 3 years ago

Indeed, your last proposed change for absolute_tempdir_path seems to have solved it!

Over the next few weeks I'll look into what's happening with out.csv and why the Stan language program is marked as being updated.

Until then I'll leave the test_threads branch as is.