StatisticalRethinkingJulia / StatisticalRethinking.jl

Julia package with selected functions in the R package `rethinking`. Used in the SR2... projects.
MIT License
385 stars, 32 forks

Reexporting of packages #121

Closed Shmuma closed 2 years ago

Shmuma commented 3 years ago

I've noticed that StatisticalRethinking.jl reexports many packages that are not used in its methods. For example, BSpline has no references in the package but is exported.

This is a bit confusing because it adds "fake dependencies" to the package and causes problems when importing both StatisticalRethinking and the original package that was reexported.

To illustrate, if we do this:

using StatisticalRethinking
using Interpolations

methods(interpolate)

This produces the following warning:

WARNING: both StatisticalRethinking and Interpolations export "interpolate"; uses of it in module Main must be qualified
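A minimal sketch of the mechanism and one way around it, using two toy modules as stand-ins. The module names `ReexportingPkg` and `OriginalPkg` are hypothetical and do not reproduce the real packages' internals; the sketch only assumes that both modules export a function with the same name.

```julia
# Toy stand-ins for the two packages (hypothetical names, not the real ones).
module ReexportingPkg
export interpolate
interpolate(x) = "ReexportingPkg.interpolate"
end

module OriginalPkg
export interpolate
interpolate(x) = "OriginalPkg.interpolate"
end

# `import` binds only the module name, so none of its exports enter Main.
import .ReexportingPkg
# `using` brings the exported names of the module we actually want into scope.
using .OriginalPkg

# The unqualified name now refers unambiguously to OriginalPkg's function.
unqualified = interpolate(1)
# The other module's function is still reachable through a qualified name.
qualified = ReexportingPkg.interpolate(1)

println(unqualified)
println(qualified)
```

With `using` on both modules instead, the unqualified `interpolate` would be ambiguous, which is the situation the warning describes.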

The situation is similar with other reexported modules. It is very surprising that importing StatisticalRethinking automatically brings the full set of DataFrames methods into our namespace. What if I don't want to use data frames? From my perspective, users should have control over the functionality they get by importing the modules they are going to use, so this "extra surprise bonus" is not very polite behaviour on the part of StatisticalRethinking.

goedman commented 3 years ago

Hi Max,

This is a good point. As a (too?) broad generalization, this shows a characteristic difference between folks choosing the Turing route and folks simply trying to switch to Julia from R/Stan. The latter group has been my target audience until now, and as such I tried to hide complexities, such as finding which library to use, from those users.

I've (barely) started to think about/address/work on this in the StatisticalRethinking v4 branch. The Stan side of SR4 will be based on AxisKeys.jl chains and decouple plot choices from SR4 (e.g. there will be separate packages StatisticalRethinkingPlots and StatisticalRethinkingMakie).

I'm traveling the next couple of days, but will at least be checking in during the evenings.

This is a very useful discussion and fresh views like yours help a lot!

Shmuma commented 3 years ago

Hi! Thanks for the explanation.

From my side, I have much simpler goals: I'm currently reimplementing the SR book's code samples in Julia with Turing. This is mostly for self-education, but I also think the result might be useful for popularising Julia. So far I've done 4 chapters (https://shmuma.github.io/rethinking-2ed-julia/) and find the Julia code much shorter and more aesthetically pleasing than Python + NumPyro (https://fehiepsi.github.io/rethinking-numpyro/).

As I don't need to maintain compatibility with Stan and previous versions, my suggestions and vision might be biased towards oversimplification, sorry in advance. But, on the other hand, a third opinion might always be useful.

So far, I have the impression that StatisticalRethinking.jl might be a useful tool for somebody who has read the book and wants to apply its methods to practical problems using the tricks learned. So it's more like a high-level set of utilities needed for Bayesian methods (a summary of a DataFrame = precis(), sampling from a dataframe or chain data, etc.) rather than a concrete set of methods and algorithms. In principle, over time, those functions could be contributed to more appropriate packages (for example, sampling from a dataframe might become part of DataFrames) and eventually removed from StatisticalRethinking to reduce code duplication. In any case, ideas have a tendency to migrate and spread.

During my journey through the book, I'm going to open issues for the problems I notice and submit PRs :) Thanks for the package, it is very useful.

goedman commented 3 years ago

Very interesting to understand where you are coming from and where you would like to go.

My "problems" started in chapters 5 & 6 (causal vs. correlated) and chapters 7 & 8 (model comparison). In R this was all available, but not in Julia. With the recent release of ParetoSmooth.jl, some of the model comparison stuff has gotten easier to implement. Hence SR4 will switch from my own StatsModelComparisons.jl to ParetoSmooth.jl.

For the material introduced in chapters 5 & 6 I developed StructuralCausalModels.jl, but that is at best a bandage until a real Julia package is available that is on par with e.g. R's dagitty. R's dagitty and ggm packages have been developed and improved over many years.

In my experience, folks that really want to benefit from Turing do have to delve a bit deeper into Julia than the target audience I was originally aiming for. But I gladly took the work by Karajan to explore an early version of SR with Turing and had a similar experience to yours: once you've written more Julia code, "it just feels right". Unfortunately I lack the cycles (and to some extent the knowledge) to further that work.

I would be more than happy to give you free rein to take StatisticalRethinkingTuring wherever you think it should go, and to collaborate on what consequences this might have for StatisticalRethinking.jl. Or you could simply decide to turn StatisticalRethinkingTuring from a project into a package and minimize its dependency on SR.

My priorities are, in addition to SR4, to finish SR chapters 10 to 16.

goedman commented 2 years ago

With StatisticalRethinking v4 now available, I believe the reexporting issue has been addressed. Thanks for giving me a gentle nudge!

The dependencies on other packages will also be reduced further over the next couple of weeks. If a dependency is not needed for SR to compile, it can be added to the project environments instead; if it is needed only for testing, it can be moved to the [extras] section in Project.toml.

I'll close this issue for now.