PSIAIMS / CAMIS

https://psiaims.github.io/CAMIS/
Apache License 2.0

Survey Statistics - Example/Comparison (Python) #281

Closed by michaelwalshe 3 months ago

michaelwalshe commented 3 months ago

Closes #185

Adds a new Python example for survey statistics, and updates the existing R vs SAS comparison to also compare with Python.

I believe (from looking at the stats table) that this is the first R/SAS/Python comparison, so any notes on the structure of the comparison or how it should be formatted would be appreciated! I was originally going to do a matrix of R/SAS/Python matches for each stat compared, but that seemed verbose, so I have just added a column to the comparison table. However, for comparisons with more differences and fewer sections, a different structure may be better.

Additionally, I've included a Python snippet that prints the Python and package version info in a note at the bottom of the Python and Comparison docs. sessioninfo::session_info() can report this information, but it depends on {reticulate}, which we're not using.
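For readers who want a picture of what such a version note might look like, here is a minimal sketch. It is not the snippet from this PR: the helper name `version_report` is hypothetical, and `pandas` is only an assumed example package alongside samplics. It relies on the standard-library `platform` and `importlib.metadata` modules.

```python
import platform
from importlib import metadata


def version_report(packages):
    """Return one line for the interpreter plus one line per requested package."""
    lines = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            # metadata.version looks up the installed distribution's version string.
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing the whole note.
            lines.append(f"{name} (not installed)")
    return lines
```

Calling `print("\n".join(version_report(["samplics", "pandas"])))` at the bottom of a document would then list whatever versions are installed in the rendering environment.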

Also, just to note that I encountered some bugs in samplics which prevented me from making certain comparisons. I've opened issues for these, but looking at the timeline of other fixes I think it makes sense to publish this as-is and return to it if there are any changes.

michaelwalshe commented 3 months ago

Will take a look at the failing build later today. It looks like the Python package versions don't quite work with the Python version used by the deployment.

statasaurus commented 3 months ago

I found the issue! The Python part of the renv.lock file got messed up, which was making everything fail. If you update from main it should work.

michaelwalshe commented 3 months ago

Hi @statasaurus - I merged the updated main branch into my local feature branch, but hit a couple of snags. The version of samplics I was using (like most of the recent versions, which include several new features) requires Python >=3.10. From some checks, all other packages and comparisons were independent of the Python version, so I updated Python in the renv.lock and in the GH actions.
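As an illustration of the Python >=3.10 constraint mentioned above, a guard like the following could be run before rendering. This is only a sketch under assumptions: the function name `python_is_supported` and the constant `MIN_PYTHON` are hypothetical and not part of this PR.

```python
import sys

# From the thread: recent samplics versions require Python >=3.10.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        # Default to the interpreter running this code.
        version_info = sys.version_info
    # Compare only major and minor, ignoring the patch level.
    return tuple(version_info[:2]) >= tuple(minimum)
```

A build script could call `python_is_supported()` and stop early with a clear message, rather than failing later during package installation.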

I also updated the "Name" field in the renv.lock. It had been auto-updated with an absolute path to the actual environment used, so I removed it, and it then auto-updated to use an environment local to the project (in renv/python/virtualenvs/....).

I then ran into some dependency issues with the GH actions, as some of the packages in the auto-generated requirements.txt are Windows-only (from checking the deps, these are the ones required by Jupyter, which is used to render Python-only Quarto files). I fixed this by adding a check to install them only on Windows, but I'm not sure this will stick when the requirements are auto-updated again. One fix for this could be to force even the Python qmd files to render using knitr rather than Jupyter, removing the Jupyter dependency.
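One way to express "only install on Windows" in a requirements file is a PEP 508 environment marker such as `sys_platform == "win32"`. The sketch below shows how such markers could be re-applied after the file is regenerated. It is an assumption-laden illustration, not the check added in this PR: the helper `add_windows_markers` is hypothetical, and the package names and versions in the usage note are made-up examples.

```python
import re

# PEP 508 environment marker that restricts a requirement to Windows.
WINDOWS_MARKER = 'sys_platform == "win32"'


def add_windows_markers(requirements_text, windows_only):
    """Append a Windows-only marker to the named packages in a requirements file."""
    wanted = {name.lower() for name in windows_only}
    out = []
    for line in requirements_text.splitlines():
        stripped = line.strip()
        # Leave blanks, comments, and lines that already carry a marker untouched.
        if not stripped or stripped.startswith("#") or ";" in stripped:
            out.append(line)
            continue
        # The package name is everything before the first version specifier or extra.
        name = re.split(r"[=<>!~\[ ]", stripped, maxsplit=1)[0].lower()
        if name in wanted:
            out.append(f"{stripped} ; {WINDOWS_MARKER}")
        else:
            out.append(line)
    return "\n".join(out)
```

For example, `add_windows_markers("pandas==2.2.0\npywin32==306", ["pywin32"])` leaves the first line alone and turns the second into `pywin32==306 ; sys_platform == "win32"`, which pip skips on non-Windows platforms. Running a step like this after each regeneration is one possible way to keep the markers from being lost.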

Finally, reticulate wasn't being tracked as a dependency even though it's required, so I added it to the renv.lock.

😵 Renv + Python + multi-platform support is definitely not easy!

statasaurus commented 3 months ago

OMG I was hoping it was going to be easy!! Thank you for doing all that!