StanJulia / StanSample.jl

WIP: Wrapper package for the sample method in Stan's cmdstan executable.
MIT License
18 stars 4 forks source link

Better support for BridgeStan #69

Closed goedman closed 1 year ago

goedman commented 1 year ago

Hi Seth ( @sethaxen ),

Can you let me know if Brian's answer indeed solves this issue for you?

In your threaded approach, do you create a StanModel (for a particular chain ID) and sample that specific StanModel in each thread?

After the release I struggled with updating the example notebook (e.g. in Stan.jl in Notebook-Examples/BridgeStan/BridgeStan.jl and the test script (in StanSample.jl in test/test_bridgestan/test_bridgestan.jl). I was ok with that as an initial simple example but wasn't convinced about more serious usage.

Unfortunately I couldn't get the setup you proposed last year to work anymore. It certainly worked on my system, but CI testing did not (allow me to?) copy the stanc compiler (binary) into StanSample/deps/data.

While testing it on my system, it felt as if BridgeStan now is fairly big, pulling in a large chunk of Stan.

In the setup I use currently, I set the make flags in a file, as in cmdstan (CXX and STAN_THREADS), but I believe that can also be done when you create the StanModel. I need to go back to update that.

Anyway, please let me know if you see better ways to support the new BridgeStan.jl in StanSample.jl.

Rob

sethaxen commented 1 year ago

Can you let me know if Brian's answer indeed solves https://github.com/roualdes/bridgestan/issues/67#issue-1537691761 issue for you?

Yes, that solution seems to work for me.

Here's some working code:

I've locally tested on my system that sampling is reproducible, i.e. the threads are non-interacting.

In the setup I use currently, I set the make flags in a file, as in cmdstan (CXX and STAN_THREADS), but I believe that can also be done when you create the StanModel. I need to go back to update that.

Currently I'm just specifying the compile flag locally, but if I can do it at time of creation of the model, that would be better.

Anyway, please let me know if you see better ways to support the new BridgeStan.jl in StanSample.jl.

RE how BridgeStan is included in StanSample, it's really not ideal. e.g. right now BridgeStan specifies a struct field as const, which isn't supported on Julia versions older than v1.8. So if the user has bridgestan cloned locally and StanSample detects it within an older Julia session, StanSample fails to load. The normal way to handle this is with package version bounds, but including BridgeStan in this way makes that impossible.

Ideally BridgeStan.jl would be its own registered package. Like Stan.jl, it would require that the user clones the bridgestan repo themselves before use. Or perhaps the latest bridgestan release could be specified as an artifact. Then it could be automatically downloaded for the user. One would need to check that the tarball attached to the release correctly includes the submodules. This is the approach I took with https://github.com/sethaxen/PosteriorDB.jl

goedman commented 1 year ago

Hi Seth @.***),

Thanks for the examples.

The examples now use:

    smb = BS.StanModel(;
        stan_file = sm.output_base * ".stan",
        stanc_args=["--warn-pedantic --O1"],
            make_args=["CXX=clang++", "STAN_THREADS=true"],
        data = sm.output_base * "_data_$(chain_id).json")

Anyway, please let me know if you see better ways to support the new BridgeStan.jl in StanSample.jl.

RE how BridgeStan is included in StanSample, it's really not ideal. e.g. right now BridgeStan specifies a struct field as const, which isn't supported on Julia versions older than v1.8. So if the user has bridgestan cloned locally and StanSample detects it within an older Julia session, StanSample fails to load. The normal way to handle this is with package version bounds, but including BridgeStan in this way makes that impossible.

I’ll fix this in StanSample.jl v7.0.2 and drop support for BridgeStan if the minor version of Julia < 8.

Ideally BridgeStan.jl would be its own registered package. Like Stan.jl, it would require that the user clones the bridgestan repo themselves before use. Or perhaps the latest bridgestan release could be specified as an artifact. Then it could be automatically downloaded for the user. One would need to check that the tarball attached to the release correctly includes the submodules. This is the approach I took with https://github.com/sethaxen/PosteriorDB.jl

Yes, agreed.

Rob

sethaxen commented 1 year ago

There's an open PR now to register BridgeStan, which will be merged in 3 days: https://github.com/JuliaRegistries/General/pull/76783. BridgeStan now by default lazily downloads the bridgestan repo when first needed, has no dependencies outside the standard library, and is compatible with Julia v1.6.

goedman commented 1 year ago

@sethaxen Thanks Seth!

The last couple of days I have been using the latest code merges to the BridgeStan repo. Haven't played around with the lazy download feature.

Once merged in the registry I plan to make a few more changes to StanSample.jl to 1) check the parent directory where cmdstan has been downloaded, 2) check the ENV for a valid BRIDGESTAN value and 3) download the bridgestan repo.

That should hopefully do the trick.

sethaxen commented 1 year ago

Once merged in the registry I plan to make a few more changes to StanSample.jl to 1) check the parent directory where cmdstan has been downloaded, 2) check the ENV for a valid BRIDGESTAN value and 3) download the bridgestan repo.

One of the nice things about bridgestan's new release mechanism is BridgeStan.jl's bridgestan artifact always points to a release of the source made concurrently with that version of the package, so they cannot possibly be out of sync, which makes it preferable to downloading the source. Nevertheless, to support customization, BridgeStan.jl by default checks the BRIDGESTAN environment variable before checking the artifact.

So my suggestion is to simply document that users should set BRIDGESTAN if they really want to use a specific downloaded copy of the repo, pointing to the BridgeStan.jl docs for reference, and don't implement options 1), 2), or 3). They're not necessary, since BridgeStan handles this, and they potentially make things unstable.

goedman commented 1 year ago

Hi Seth,

Sounds like good advice!

I've updated StanSample.jl accordingly as the environment variable will cover what I might occasionally need for testing.

Will do some more testing on CI and see what's the best way to release it (which I kind of need for Pluto).