Open bob-carpenter opened 1 year ago
Merging #10 (6d32f46) into main (a8e4f60) will decrease coverage by 0.89%. The diff coverage is 89.47%.
```diff
@@            Coverage Diff            @@
##             main      #10     +/-  ##
=========================================
- Coverage   94.19%   93.30%   -0.89%
=========================================
  Files           9        9
  Lines         224      269      +45
=========================================
+ Hits          211      251      +40
- Misses         13       18       +5
```
Impacted Files | Coverage Δ | |
---|---|---|
bayes_kit/rwm.py | 91.66% <ø> (ø) | |
bayes_kit/ess.py | 90.74% <83.33%> (ø) | |
bayes_kit/ensemble.py | 90.00% <89.36%> (-10.00%) | :arrow_down: |
bayes_kit/\_\_init\_\_.py | 100.00% <100.00%> (ø) | |
bayes_kit/rhat.py | 100.00% <100.00%> (ø) | |
@WardBrian: This one's not quite ready for re-review until I get test coverage up to 100%. Is there a way to see what the coverage is? I already know that I'm not testing things like all the paths through RNG creation or initialization yet. I want to get this class for ensemble sampling squared away and then I'll go back and get all the other ones up to form with doc and tests.
For test coverage, I would look here: https://app.codecov.io/gh/flatironinstitute/bayes-kit/blob/affine-invariant-walker-v1/bayes_kit/ensemble.py
That will get updated every time CI is run on this branch.
Agreed it’s good to finish things. I’m a bit hesitant about going all in on docstrings right now, especially if we think the names of things or the structure of APIs could still evolve. Docstrings are much more annoying to keep updated than types and tests, in my experience, since even Sphinx won’t warn you if a docstring doesn’t match the function’s actual arguments.
> I’m a bit hesitant about going all in on docstrings right now
I'm in favor of getting the details into doc sooner rather than later, for several reasons in decreasing order of importance:

1. I find it very helpful to document a function thoroughly to help work through boundary and exception conditions.
2. It's useful for users who want to try the API.
3. It signals to users and potential devs that we're serious about the software.
4. It's just as quick for me to add thorough doc as to think each time about how little doc I can get away with.
I agree that there's debt to pay on thorough doc if the code changes. If the code changes but the doc doesn't, that's even worse than no doc. I don't imagine this is going to be so volatile that we won't be able to keep up, but time will tell.
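To illustrate the level of doc I have in mind, here's a sketch of a numpydoc-style docstring for a hypothetical argument-validation helper. Both the name `validate_num_walkers` and the `2 * dim + 2` rule are made up for the example; they're not from the actual bayes-kit code.

```python
def validate_num_walkers(num_walkers: int, dim: int) -> int:
    """Validate the ensemble size for a stretch-move sampler.

    Parameters
    ----------
    num_walkers : int
        Number of walkers in the ensemble. Must be at least
        ``2 * dim + 2`` (an illustrative rule for this example) so the
        complementary walkers can span the parameter space.
    dim : int
        Dimension of the parameter space. Must be positive.

    Returns
    -------
    int
        The validated ``num_walkers``, unchanged.

    Raises
    ------
    ValueError
        If ``dim < 1`` or ``num_walkers < 2 * dim + 2``.
    """
    if dim < 1:
        raise ValueError(f"dim must be >= 1, got {dim}")
    if num_walkers < 2 * dim + 2:
        raise ValueError(
            f"num_walkers must be >= 2 * dim + 2 = {2 * dim + 2}, "
            f"got {num_walkers}"
        )
    return num_walkers
```

Writing the `Raises` section is exactly where I find myself forced to think through the boundary conditions.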
I brought this up to date with the existing branch. All tests are passing and mypy strict passes.
None of our samplers are tested up to the standard that @jsoules is suggesting in code review for this PR. If @WardBrian and @roualdes think that all the samplers should be tested to this level, I think we should start with our existing implementations as they're all more important than this one.
Given that this is the PR where I fixed all the mypy errors, I'm stuck. I have the ESS/R-hat thing ready to go, but @WardBrian doesn't want me to put that in until it passes mypy. But without it, we can't do anything practical.
I do think it'd make sense to test the other internal functions, which I haven't done yet. And if we do need to test the internal algorithm details, then it'd make sense to break functions out of the internals that we can test independently.
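For example, the stretch-move pieces could be pulled out as pure functions along these lines (names and signatures are illustrative, not the actual bayes-kit internals):

```python
import numpy as np


def sample_z(a: float, rng: np.random.Generator) -> float:
    """Draw the stretch-move scale z ~ g(z) proportional to 1/sqrt(z)
    on [1/a, a], using the inverse CDF: z = ((a - 1) u + 1)^2 / a."""
    u = rng.uniform()
    return ((a - 1.0) * u + 1.0) ** 2 / a


def stretch_proposal(x_j: np.ndarray, x_k: np.ndarray, z: float) -> np.ndarray:
    """Propose y = x_k + z * (x_j - x_k) for walker j, given a
    complementary walker k and scale z."""
    return x_k + z * (x_j - x_k)
```

Pure functions like these can be unit tested directly, e.g. checking that `sample_z` always lands in `[1/a, a]` and that `z = 1` leaves the walker unchanged, without running the whole sampler.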
I’m not very picky about fixing the type errors before anything is merged. In particular, we already have versions merged to main which are both wrong and have type errors, so any marginal improvement is fine by me as long as we eventually end up in a nice state.
I think this is a good start for testing. Perfect? No, and that's ok. In line with Brian's comment, I think this PR is a good improvement over no affine invariant ensemble sampler and better than an untested implementation.
More specific tests on, say, `init` or `seed` or `num_walkers` wouldn't be bad, and I wouldn't turn any such tests down, but I'm certainly more concerned about the general validity of the implementation.
I imagine a more strenuous test, one that might necessitate changing the default value of `a`, could be beneficial for validating this sampler, though it certainly won't lead to a faster test suite. For instance, testing exclusively on 1D standard normal models does not sufficiently validate this sampler, in my opinion. So maybe another test on, say, a 100D normal, where decreasing `a` might improve sampling efficiency, could help clear up whether the sampler works more generally and simultaneously exercise the argument `a`.
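A sketch of what such a validity test could look like: a self-contained sequential variant of the Goodman & Weare (2010) stretch move, run on a standard normal (a modest 5D here to keep the example fast; the idea scales to higher dimension) and checked against the known mean and variance. Everything below is written from scratch for illustration rather than taken from the bayes-kit implementation.

```python
import numpy as np


def stretch_step(logp, walkers, a, rng):
    """One sweep of sequential stretch moves over the ensemble.

    For each walker j, pick a complementary walker k != j from the
    current ensemble, draw z ~ g(z) on [1/a, a], propose
    y = x_k + z * (x_j - x_k), and accept with probability
    min(1, z^(d-1) * p(y) / p(x_j)). Mutates `walkers` in place.
    """
    n, d = walkers.shape
    for j in range(n):
        k = rng.integers(n - 1)
        if k >= j:
            k += 1  # skip j itself
        z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a
        proposal = walkers[k] + z * (walkers[j] - walkers[k])
        log_accept = (d - 1) * np.log(z) + logp(proposal) - logp(walkers[j])
        if np.log(rng.uniform()) < log_accept:
            walkers[j] = proposal
    return walkers


if __name__ == "__main__":
    dim, n_walkers, a = 5, 32, 2.0
    logp = lambda x: -0.5 * np.dot(x, x)  # standard normal log density
    rng = np.random.default_rng(1234)
    walkers = rng.normal(size=(n_walkers, dim))
    kept = []
    for step in range(500):
        walkers = stretch_step(logp, walkers, a, rng)
        if step >= 100:  # discard burn-in
            kept.append(walkers.copy())
    draws = np.concatenate(kept)
    # For a standard normal, means should be near 0 and variances near 1.
    print(draws.mean(axis=0))
    print(draws.var(axis=0))
```

A test along these lines could then be repeated for a couple of values of `a` to exercise that argument, with tolerances chosen from the Monte Carlo standard error rather than eyeballed.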
I'll start a new issue and make some suggestions for testing in general and maybe there can be some work along the lines of my suggestion in future PRs.
P.S. @jsoules I absolutely love the yellow billed magpie photo you've got.
Given @WardBrian's comments about typing, I'll try to get the ESS updates in first, which will in turn make it easier to write a generic testing framework, which will in turn make it trivial to write individual sampler tests.
Add affine invariant ensemble sampler.
Cleanup typing.
0 mypy errors with all tests passing, with