arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0
1.56k stars 388 forks source link

Make `from_dict` more flexible, and add `from_pytree` #2291

Closed ColCarroll closed 3 months ago

ColCarroll commented 7 months ago

This also makes from_dict work for nested dictionaries. I joined the keys with double underscores because using a . would break attribute access (like, it would still work, but you'd use idata.posterior['var.x.y'], instead of idata.posterior.var__x__y)

Hopefully this is a no-op for most code, but it should make arviz work automatically with namedtuples or dataclasses.


:books: Documentation preview :books:: https://arviz--2291.org.readthedocs.build/en/2291/

ColCarroll commented 7 months ago

I'll add that this adds a requirement on dm-tree. I'd rather have used jax.tree_utils, but that requires all of jax (~1.4MB) instead of just dm-tree (~35kb).

ahartikainen commented 7 months ago

Will this affect python 3.12 use? Probably not(?)

ColCarroll commented 7 months ago

It looks like maybe: https://github.com/google-deepmind/tree/issues/109

I gave https://github.com/google-deepmind/tree/pull/111 a try.

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 96.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 86.80%. Comparing base (3fc5962) to head (bd03a88). Report is 2 commits behind head on main.

Files Patch % Lines
arviz/data/converters.py 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2291 +/- ## ======================================= Coverage 86.79% 86.80% ======================================= Files 123 123 Lines 12722 12743 +21 ======================================= + Hits 11042 11061 +19 - Misses 1680 1682 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ColCarroll commented 7 months ago

I tried it out, and it fails when trying to install dm-tree on python 3.12, because it requires CMake. Once you conda install CMake and install xcode on OSX, you can install from source, but that's probably too big of a lift for users, and we should wait on this until dm-tree builds a wheel.

Given expected workloads in arviz, we might just write our own flatten_with_path function in pure python. I'll think about that if there's no motion on the other PR.

ColCarroll commented 5 months ago

Update: The PR was merged for dm-tree, but we need to wait on a pypi release.

ColCarroll commented 4 months ago

Hey! They sneakily added PyPI wheels for 3.12 to dm-tree. I think this is now ready to go.

review-notebook-app[bot] commented 4 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

ColCarroll commented 3 months ago

I think this is ready again.

ahartikainen commented 3 months ago

if @OriolAbril approves I think we can merge

OriolAbril commented 3 months ago

I want to make a patch release before merging, maybe a couple doc changes, then I'll merge.

ColCarroll commented 3 months ago

Sounds good!

bayeux will probably pin to the next release that has this in it, since this example produces a dictionary of parameters that the current az.from_dict fails to parse correctly.

OriolAbril commented 3 months ago

I think it should be done now. Docs used to look like this:

Captura de pantalla de 2024-03-13 19-42-15

Now they are this: https://arviz--2291.org.readthedocs.build/en/2291/api/generated/arviz.dict_to_dataset.html

ColCarroll commented 3 months ago

LGTM!

On Wed, Mar 13, 2024, 3:16 PM Oriol Abril-Pla @.***> wrote:

I think it should be done now. Docs used to look like this:

Captura.de.pantalla.de.2024-03-13.19-42-15.png (view on web) https://github.com/arviz-devs/arviz/assets/23738400/35f4b904-ac37-4805-bb52-37dcf6439f90

Now they are this: https://arviz--2291.org.readthedocs.build/en/2291/api/generated/arviz.dict_to_dataset.html

— Reply to this email directly, view it on GitHub https://github.com/arviz-devs/arviz/pull/2291#issuecomment-1995455446, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARQOEH3YJBX5ZB3GZQF5ILYYCQXHAVCNFSM6AAAAAA7LNJR62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJVGQ2TKNBUGY . You are receiving this because you authored the thread.Message ID: @.***>

papoteur-mga commented 1 month ago

Hello, This is not a cool dependency for Linux distro. Building dm-tree from source needs to download at build time pybind11. However, distros prohibit any download at build time, thus the build fails. The build does not use a library provided by the distro.

ColCarroll commented 1 month ago

Hi @papoteur-mga -- thanks for chiming in, but I'm not sure what this means!

papoteur-mga commented 1 month ago

Hi @papoteur-mga -- thanks for chiming in, but I'm not sure what this means!

* Which Linux distro?

Mageia

* Why is a linux distro building Python libraries from source?

Python packages are included in the distro to be easily installed, in particular as dependency of any application. The most of the packages are tagged as being open source. These packages should then be built to be sure there are really open source. This is also a way to minimize the risk that faulty code is introduced.

* Is `dm-tree` impossible to use there?

* Is this really a `dm-tree` issue?

Yes, but I finally found a way by patching the cmake configuration files in dm-tree to use shared libraries. https://svnweb.mageia.org/packages/cauldron/python-dm-tree/current/SOURCES/0001-use-system-libraries.patch?view=markup

* Is there a different way we could achieve this functionality. I know we could depend on `jax` and use `jax.tree`, but that seems like a heavier dependency.

I have no idea about that.

ColCarroll commented 1 month ago

Thank you for the response! Is it right to say you're here to advocate for mageia users, and not (specifically) as a user of statistical diagnostics?

I have a particular interest in the functionality that dm-tree provides, since it allows for running diagnostics in some of my own work with bayesian neural nets.

It sounds like maybe this is all set from the patch you added? If it isn't, I can try to explore using some other library to achieve my goals (but I suspect it may introduce more/different problems!)