earthlab / matplotcheck

A python package for checking and testing matplotlib plots. We use this for autograding student assignments but there are many other potential use cases including package testing (for packages with plots)!
https://matplotcheck.readthedocs.io
BSD 3-Clause "New" or "Revised" License
18 stars 8 forks source link

Scatter Plot w Fit Lines Vignette #228

Closed nkorinek closed 3 years ago

nkorinek commented 4 years ago

Vignette to demonstrate line testing capabilities in the base package in matplotcheck.

This vignette seems a bit sparse. I left out assert_xydata as it was demonstrrated in the plot_testing_basics vignette, but I could add it in here as a reminder if we wanted to! Also: I could add in x/y lims assertions in this vignette if we wanted to. I wasn't sure how focused we wanted this vignette to be, so I stuck to the basics and I have tests ready to add if we decide we want more!

codecov[bot] commented 4 years ago

Codecov Report

Merging #228 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #228   +/-   ##
=======================================
  Coverage   83.06%   83.06%           
=======================================
  Files          21       21           
  Lines        1943     1943           
=======================================
  Hits         1614     1614           
  Misses        329      329           

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3226ba8...3226ba8. Read the comment docs.

nkorinek commented 4 years ago

@lwasser so this vignette is failing due to my use of seaborn to plot the regression line. Seaborn is not part of the CI environment. Do you want me to figure out a way to plot this without seaborn, or would we be able to add it to the environment? I know we've talked about not adding anymore packages to this environment, but thought I'd see what your opinion was. Thanks!

lwasser commented 4 years ago

@nkorinek ahhhhh we can add seaborn to our docs envt. i'm not sure if you can do it here.

https://github.com/earthlab/matplotcheck/blob/master/readthedocs.yml

the easiest would be to use something else. is that an option tho? some tools like geopandas do have a yml file just for read the docs.

lwasser commented 4 years ago

@nkorinek let's keep this simple. let's add seaborn as a dev-requirements item in the list of tools

jlpalomino commented 4 years ago

For our documentation, we have verbally discussed changes that will be completed by nkorinek, including the addition of more explanation about the object type that is being tested (e.g. a story about what kind of data is being used and what the test is actually looking for) both at the top of the notebook and in the comments, and a new section at the bottom of the vignette for additional options when using a Jupyter Notebook implementation.

nkorinek commented 4 years ago

@jlpalomino I'm hunting down a few issues on this branch, I'll ping you when they're ironed out

nkorinek commented 4 years ago

@jlpalomino I fixed the bugs I was seeing. CI is failing due to an unrelated issue and this vignette could use a review now!

nkorinek commented 4 years ago

@lwasser this vignette has been updated with the changes suggested by @jlpalomino

nkorinek commented 4 years ago

@jlpalomino fixes are in!

nkorinek commented 4 years ago

@jlpalomino title changed!

nkorinek commented 4 years ago

Update: this now relies on #274 to be merged first

nkorinek commented 4 years ago

@lwasser updated this vignette as per our discussion earlier today. Two important things to note:

  1. I ran into a bug with assert_xydata() that I've never seen before. It was failing two identical numpy arrays saying that there shapes didn't match. One has a shape of (15, 1) while the other had a shape of (15,). So they were the same array, it's just that one was nested inside the other. I fixed this by adding .flatten() to the end of the checks to make all of the data checks one dimensional. This will still fail data that doesn't match in length, but fixes this issue. Hope that's ok! Just a note: these arrays should never be more than one dimensional since it's checking the individual columns of a dataframe.

  2. I know we discussed check-coverage a lot, however, I found an instance where I think the check should be automatic. Not to say we should take the argument out of the users hands, but when you want to check a regression line and a onetoone line at the same time, it causes issues. You can't set the check_coverage to false, as it won't check the coverage of the regression line. However, if you leave it as true than the onetoone line will make the test fail. SO. I think it should be set to False automatically if the line is a onetoone, but we let users still decide if they want to check coverage on the regression line. Let me know if that makes sense!