Docs are in dire need of update

TuringLang / docs

Documentation and tutorials for the Turing language

https://turinglang.org/docs/tutorials/docs-00-getting-started/

MIT License


Docs are in dire need of update #484

Open torfjelde opened 1 year ago

torfjelde commented 1 year ago

The documentation, in particular the part in Turing.jl itself, is in dire need of an update given the number of features and improvements we've made over the past year. In particular, the tutorials have lots and lots of room for improvement.

A few things that come to mind immediately are the following.

User-facing side:

[ ] Turing.predict for predicting based on a given chain.
[ ] DynamicPPL.generated_quantities, similar to Stan's generated quantities block, which allows you to, effectively, capture the return-values of the model (i.e. the stuff in return ...) conditioned on a chain.
[ ] condition and decondition. There are now two ways to indicate whether a variable is to be considered an observation: passing the variable as an argument (the "old" way), or using condition / | (the "new" way). The latter is, arguably, more intuitive, in addition to being much easier to work with programmatically.
[ ] @submodel. A macro that allows you to use models within models. Makes it very easy to write modular models.
[ ] logprior, loglikelihood, and logjoint. Easy-to-use methods for evaluating the model in different ways.
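
For reference, a minimal sketch of how the items above might fit together. The toy model `demo` and the values used are made up for illustration, and the calls reflect the Turing.jl / DynamicPPL API as it stood around the time of this issue (some names may have been renamed or deprecated since), so treat it as a starting point rather than the canonical usage.

```julia
using Turing

# Hypothetical toy model: both `m` and `x` are declared with `~`,
# so neither is an observation until we condition on it.
@model function demo()
    m ~ Normal(0, 1)
    x ~ Normal(m, 1)
    return m + x
end

model = demo()

# The "new" way: mark `x` as observed after the model is built.
# `condition(model; x = 1.5)` is the function form of the same thing.
conditioned = model | (x = 1.5,)

# Undo the conditioning, so `x` is a random variable again.
unconditioned = decondition(conditioned)

# Evaluate the model at a given value of the remaining parameter.
θ = (m = 0.3,)
lp = logprior(conditioned, θ)       # log p(m)
ll = loglikelihood(conditioned, θ)  # log p(x = 1.5 | m)
lj = logjoint(conditioned, θ)       # log p(m, x = 1.5)

# Draw posterior samples of `m`.
chain = sample(conditioned, NUTS(), 200; progress=false)

# Return values of the model (here `m + x`) for each draw in the chain.
returns = generated_quantities(conditioned, chain)

# Predict `x` for each draw of `m`, using the model where `x` is not observed.
predictions = predict(unconditioned, chain)
```

The point of the `|` form is that the same `demo()` can be reused for sampling, for prediction and for evaluating log densities without rewriting the model.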

Developer-side:

[ ] Implementation of the LogDensityProblems.jl interface for a @model.
[ ] DynamicPPL.TestUtils. This is a sub-module of DynamicPPL that can be quite useful if one is developing features for Turing.
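
As a rough illustration of the LogDensityProblems.jl item, the sketch below wraps a model in a `LogDensityFunction` and queries it through the generic interface. The toy model is hypothetical, and the single-argument constructor and its defaults (evaluating the log joint in the original parameter space) are assumptions based on DynamicPPL as of roughly this period; check the current docstring before relying on them.

```julia
using Turing
using LogDensityProblems

# Hypothetical toy model with one unconstrained parameter `m`.
@model function demo()
    m ~ Normal(0, 1)
    x ~ Normal(m, 1)
end

model = demo() | (x = 1.5,)

# Wrap the model so it can be used through the LogDensityProblems.jl interface.
ℓ = Turing.DynamicPPL.LogDensityFunction(model)

# Number of parameters the wrapped density expects (only `m` here).
d = LogDensityProblems.dimension(ℓ)

# Log density at m = 0.3, passed as a flat vector.
lp = LogDensityProblems.logdensity(ℓ, [0.3])
```

Anything that consumes a LogDensityProblems.jl object (for example an external sampler) can then work with the model without knowing about Turing.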

We will add more to the list as we go on, but for now this is a good starting point.

dlakelan commented 1 year ago

Hi there. I'm really interested in helping to improve the documentation. I pretty much reach for Turing as my first (and usually only) line for Bayesian inference, and I have developed a bunch of tutorials and things that use Turing, which are sort of semi-publicly available. I even occasionally get some time I could imagine doing the work :sweat_smile:

In terms of topics I really like the list so far. I can't remember, are there macros that allow you to tell the model to treat something like an observation? Something like:

foo = myfunction(my_observations)
foo ~ MyDistribution(my_parameter)

that will currently (I think) replace foo with draws from MyDistribution rather than conditioning the model on the transformed observation.

torfjelde commented 1 year ago

> I can't remember, are there macros that allow you to tell the model to treat something like an observation? Something like:

I've probably made some at some point, which is maybe where you've come across it. I'll see if I can dig it out. But that is not "officially" supported so I don't think that should go in the docs for now :confused:

dlakelan commented 1 year ago

Hi @torfjelde, let me try to get just the basics down first :-) Which repo has the docs for the website turing.ml? It looks like it's the Turing.jl repo, but I want to make sure.

And then, how does that get built into the website? And what's a good workflow for "testing" the docs? If I wanted to write text, or add sections, and then build and check it on my local Linux workstation (Debian) what do I need to install, and how do I make it happen?

Thanks.

torfjelde commented 1 year ago

The source of the docs is found here: https://github.com/TuringLang/turinglang.github.io. You can find instructions on how to get it up and running locally in the README :) Let me know if something isn't clear!

And then the library docs, i.e. everything under https://turinglang.org/library/, are found in the corresponding package.

dlakelan commented 11 months ago

Haven't forgotten this project! Though I'm not excited about installing Jekyll and such on my desktop machine, I'm still up for working on some of the documentation. I've been working on writing some other stuff, and just got to the point where I wanted to USE TURING again, so now it's fresh in my mind.

Some other thoughts: There's not an easy way to get from the Turing documentation website to a place where you can find out all the different sampler algorithms that are available, what the constructors are for them, and a little about how they work.

I've got a problem which isn't playing nice with autodiff and I decided to try using alternative samplers to those that require derivatives, and it was frustrating to try to pick an algorithm and figure out what arguments were required. For example MH() is basically never going to work in reality because a proposal from the prior is very rarely going to be a good proposal. What you want is diffusive MH which I guess at some point was called RWMH() but that no longer exists? Anyway, that whole ball of wax could use some attention.
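
To make the MH point concrete, here is a small sketch contrasting the default `MH()` (which proposes from the prior) with a random-walk variant obtained by passing a proposal covariance matrix. The `gdemo` model and the covariance values are only illustrative, and the matrix constructor is taken from the `MH` docstring as of around this time, so it may differ in other Turing versions.

```julia
using Turing

# Illustrative two-parameter model.
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
end

model = gdemo(1.5, 2.0)

# Default MH: every proposal is an independent draw from the prior.
prior_chain = sample(model, MH(), 500; progress=false)

# Random-walk MH: proposals are multivariate normal steps around the
# current state (in the transformed, unconstrained space), with the
# given covariance matrix.
rw_chain = sample(model, MH([0.25 0.05; 0.05 0.50]), 500; progress=false)
```

The covariance matrix has one row and column per model parameter, so it has to be resized whenever the model changes, which is one of the things a sampler overview page could spell out.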

dlakelan commented 11 months ago

Some other thoughts I've had:

1) How do you get a list of model variables? How can you determine if a variable is discrete or continuous?
2) How does @addlogprob! interact with models that use condition/decondition? I guess we could use priors for things that are intended to be data, but then condition on them having a particular value? This is actually a bigger ball of wax than just addlogprob!
3) How does someone write a new sampler?
4) Is there a system for Tempering? I'd like to be able to run in parallel two separate chains for two different but related models, and have them occasionally try to swap states between them.
5) We need more examples of how to work with MCMCChains objects, extracting certain sub-variables, extracting only certain samples, is there a way to sample randomly from a chain to get a single "row"? like sample(mychain,1)?
6) In the tutorials, pointers to good diagnostic plots and diagnostic stats packages etc. For example maybe ArviZ? or something else, some simple examples of how to use them.
7) Possibly break up the documentation into sections relevant to the 4 types of documentation as described here: https://www.writethedocs.org/videos/eu/2017/the-four-kinds-of-documentation-and-why-you-need-to-understand-what-they-are-daniele-procida/

Right now we have along the top of the website: "Get Started", "Library API", "Tutorials". Then if you click the main headings are "Using turing" "For Developers" "Tutorials" and "Contributing".

I'd like to see...

Get Started (a goal oriented how-to guide to installing Turing, writing a simple model, sampling from the model, plotting results from the sample)

Learning oriented tutorials: A big list of examples, we've got this pretty much

Understanding Oriented Discussion: Discuss how Turing relates to some of its related packages, DynamicPPL, AdvancedHMC, AdvancedMH, MCMCChains, Distributions, and what functionality comes from what pieces of this puzzle. How to extend things, like writing your own sampler, writing your own specialized Distribution, writing your own specialized diagnostics for Chains, writing plot routines that take a chain... Also, what is a Turing model? What "fields" does it have and what would you legitimately do with them (for example suppose you wanted to write a "pretty printer" for a model?)

I realize this is more than just "let's shove some new material into the current documentation structure" but I do think it's what's needed to make hacking with Turing itself more accessible outside the core developer team. If I knew more about this stuff for example I probably would be experimenting with new sampling methods, such as tempering, and piecewise deterministic processes.
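
On point 5 in the list above, something like the following is the kind of MCMCChains example I have in mind. The chain here is built from random numbers with made-up parameter names just so the snippet stands alone; a chain returned by Turing's sample should index the same way. The random "row" is picked by plain array indexing rather than any dedicated MCMCChains function.

```julia
using MCMCChains

# Fake chain: 100 iterations, 3 parameters, 1 chain (names are hypothetical).
vals = randn(100, 3, 1)
chain = Chains(vals, [:mu, :sigma, :tau])

# All draws of a single parameter (iterations × chains).
mu_draws = chain[:mu]

# A new Chains object with only some of the parameters.
sub = chain[[:mu, :sigma]]

# A new Chains object with only the first ten iterations.
first_ten = chain[1:10, :, :]

# Flatten to a plain matrix (draws × parameters) and pick one random row.
draws = Array(chain)
row = draws[rand(1:size(draws, 1)), :]
```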