Closed - jkittner closed this pull request 1 month ago
Ah yeah - your tests are broken. You can't do that. Something changed in the other repo. Also, a network connection during tests is not something you can always rely on. This will always be flaky...
I would simply add the test files to this repo. Or, since they are test-specific, even hardcode them into the test itself...
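For example, something along these lines (a rough sketch; the reference numbers below are placeholders rather than real validation data, and I'm assuming `pet_steady`'s usual `tdb`, `tr`, `v`, `rh`, `met`, `clo` arguments):

```python
# sketch: a few reference rows copied into the test module instead of being
# downloaded at test time (values are placeholders, not real validation data)
import pytest

from pythermalcomfort.models import pet_steady

REFERENCE_ROWS = [
    # (tdb, tr, v, rh, met, clo, expected_pet) - placeholder numbers, replace
    # with the real rows from the validation tables
    (20.0, 20.0, 0.1, 50.0, 1.37, 0.5, 18.0),
    (30.0, 30.0, 0.1, 50.0, 1.37, 0.5, 30.0),
]


def test_pet_steady_reference_rows():
    for tdb, tr, v, rh, met, clo, expected in REFERENCE_ROWS:
        result = pet_steady(tdb=tdb, tr=tr, v=v, rh=rh, met=met, clo=clo)
        assert result == pytest.approx(expected, abs=0.1)
```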
@jkittner thank you so much for working on this and for submitting this pull request.
I apologise in advance since I did not clearly specify this in the documentation; however, I would like to ask you to submit your pull request to the `development` branch and not to `master`. Could you please do that?
Feel free to also update the documentation if you have time.
There is a group of students who are actively working on pythermalcomfort and we are implementing some big changes to how tests are run; this is the reason why some of the tests may be failing.
I completely understand your concern about not having to rely on the network to run the tests. I am happy to hear suggestions on how we can fix that. However, we want to move all the validation tables to a new repository, which is also going to contain the validation tables for jsthermalcomfort and comf. Currently, we are requesting these data each time we run a test; however, we may want to consider another approach such as using submodules, but I am not very familiar with that.
okay, I rebased and made the PR against the development branch.
If you want a centralized location for the test data because you're using it for multiple repos, I'd also suggest a submodule, which you can pin at a specific commit SHA so it won't break your tests the moment you change something in the data repo.
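As a rough sketch of what the tests could then do (the directory and file names here are hypothetical), the data would be read from the checked-out submodule instead of being requested over the network:

```python
# sketch: read validation tables from a data submodule checked out inside the
# repo (e.g. via `git submodule update --init`); paths and file names are made up
import json
from pathlib import Path

import pytest

# hypothetical location of the pinned validation-data submodule
VALIDATION_DIR = Path(__file__).parent / "validation-data"


@pytest.fixture(scope="session")
def pet_validation_table():
    table_path = VALIDATION_DIR / "pet_steady.json"
    if not table_path.is_file():
        pytest.skip("validation-data submodule is not checked out")
    with table_path.open() as f:
        return json.load(f)


def test_validation_table_loads(pet_validation_table):
    assert pet_validation_table  # the pinned table has at least one entry
```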
since removing the `round` did give me such a performance gain, I had a look at where else it is used: `git grep "round(" | grep -v tests | wc -l` returned 95 entries (probably better to use `pytest.approx` in tests)
I am not a huge fan of that tbh - rounding over and over again will make you slowly lose precision. I think it's up to the user to decide when to round (likely only the final result)
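A short sketch of the `pytest.approx` idea, comparing the unrounded return value against a reference with a tolerance (assuming `p_sat` lives in `pythermalcomfort.psychrometrics` and returns Pa; the tolerance is just an example):

```python
import pytest

from pythermalcomfort.psychrometrics import p_sat


def test_p_sat_at_20_degc():
    # saturation vapour pressure of water at 20 degC is roughly 2339 Pa;
    # compare with a tolerance instead of relying on rounding inside p_sat
    assert p_sat(20) == pytest.approx(2339, rel=1e-2)
```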
thank you for creating a pull request to `development`.
> If you want a centralized location for the test data because you're using it for multiple repos, I'd also suggest a submodule, which you can pin at a specific commit SHA so it won't break your tests the moment you change something in the data repo.
great, the students will not have time to fix this, but this is something I can try to fix in the future. I am going to open a new issue so I do not forget.
Thank you for the suggestion. If you have time we can also try to fix that and perhaps consider removing the `round` function; we will, however, have to make sure that this is not going to break the tests.
The original goal was to be able to pass arrays to the function, i.e. to allow using this function with pandas without a `lambda` wrapper. For the test cases this would look like this:
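Roughly along these lines (a sketch; the column names and the `met`/`clo` values are made up, and I'm assuming `pet_steady`'s usual `tdb`, `tr`, `v`, `rh`, `met`, `clo` arguments):

```python
import pandas as pd

from pythermalcomfort.models import pet_steady

df = pd.DataFrame(
    {
        "tdb": [20.0, 25.0, 30.0],
        "tr": [20.0, 25.0, 30.0],
        "rh": [50.0, 50.0, 50.0],
    }
)

# no df.apply(lambda row: ...) wrapper needed - the columns are passed directly
df["pet"] = pet_steady(tdb=df["tdb"], tr=df["tr"], v=0.1, rh=df["rh"], met=1.37, clo=0.5)

# one PET value per row
assert len(df["pet"]) == len(df)
```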
`np.vectorize` now also allows it to take n-dimensional arrays - this can be especially useful for raster data. For some random data this would look like this:
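For example (a sketch with made-up random inputs, same argument-name assumption as above):

```python
import numpy as np

from pythermalcomfort.models import pet_steady

rng = np.random.default_rng(42)

# a random 10x10 "raster" of air temperature, mean radiant temperature and RH
tdb = rng.uniform(18, 35, size=(10, 10))
tr = tdb + rng.uniform(0, 3, size=(10, 10))
rh = rng.uniform(30, 70, size=(10, 10))

pet_grid = pet_steady(tdb=tdb, tr=tr, v=0.1, rh=rh, met=1.37, clo=0.5)
print(pet_grid.shape)  # (10, 10)
```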
Unfortunately the `pet_steady` function is notoriously slow. A benchmark showed these results:

Adding the vectorization caused single values to become even slower, so I took a few steps to improve performance. Mostly arrays benefit from this now, but single values are also 1.13x faster. 2D arrays are now 1.81x (10x10) or 1.73x (100x100) faster.
The biggest performance gain was achieved by optimizing the `p_sat` (saturation vapor pressure) function, which is called 156 times per function call. Simply moving the constants out of the function already gave some gain. Then cutting the number of `log` calculations in half also improved performance. However, the biggest improvement was removing the rounding - which imo is also something I would not expect a function to do implicitly.
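To illustrate the constant-hoisting and round-removal part, a sketch of what such a helper could look like afterwards (this is only an illustration of the pattern with ASHRAE-style coefficients, not the actual pythermalcomfort code, and the `log` reduction is not shown):

```python
import math

# coefficients hoisted to module level so they are not re-created on every call
# (ASHRAE-style saturation vapour pressure over liquid water, T in K, result in Pa)
_C1 = -5800.2206
_C2 = 1.3914993
_C3 = -0.048640239
_C4 = 4.1764768e-05
_C5 = -1.4452093e-08
_C6 = 6.5459673


def p_sat_sketch(tdb):
    """Saturation vapour pressure [Pa] at dry-bulb temperature tdb [degC] (sketch)."""
    t_k = tdb + 273.15
    # no trailing round() - the caller decides when (and whether) to round
    return math.exp(_C1 / t_k + _C2 + _C3 * t_k + _C4 * t_k**2 + _C5 * t_k**3 + _C6 * math.log(t_k))
```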
There are still a few more things that could be done (e.g. moving some of the inner functions out of the function). In the end the bottleneck will always be the `fsolve` - replacing it will be the biggest improvement before rewriting the entire function to natively work on np.arrays (no more `np.vectorize` needed).

I did try some `numba` jit-compiling, but pure numpy always outperformed it by far.

I added tests for the array cases and shortened the tests a bit by using `pytest.mark.parametrize`.
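A sketch of what such a parametrized array test could look like (shapes and values are made up, not the actual tests from this PR):

```python
import numpy as np
import pytest

from pythermalcomfort.models import pet_steady


@pytest.mark.parametrize("shape", [(3,), (10, 10)])
def test_pet_steady_preserves_array_shape(shape):
    tdb = np.full(shape, 25.0)
    tr = np.full(shape, 25.0)
    rh = np.full(shape, 50.0)

    result = pet_steady(tdb=tdb, tr=tr, v=0.1, rh=rh, met=1.37, clo=0.5)

    assert result.shape == shape
    # identical inputs everywhere, so every element must be the same value
    np.testing.assert_allclose(result, result.flat[0])
```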
The behavior of `p_sat` and derivatives changed slightly, due to the rounding no longer being executed. But if anything, that's more correct now.

This PR is a first step - I might look deeper into this at a later time, further optimizing it so it might be usable for larger rasters in the end.
Happy to hear your thoughts on this!