Closed: SoundSpinning closed this 4 years ago
Thank you for the PR!
Can we add something in the README with regard to the test_notebooks directory for potential future developers? Another useful thing might be a link for future developers to the conda write-up in the issue thread.
All good. I am away this week, will get onto it when I get back. If you have other ideas, send them on and I'll have a look.
No problem mate, thank you for your work!
With regard to the README.ipynb as well, I like to clear all the outputs from the cells after it's been run, so as not to bloat the README with unnecessary tables.
Also, for any GIF files, I like to run GIFmicro across them to make them as small as possible.
For the point around ipywidgets and tqdm, this is working as intended in my opinion. If the user wishes to use the progress bar, it's upon them to install tqdm in the current environment, and the same applies for ipywidgets.
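Roughly the optional-dependency pattern I mean, as a generic sketch (the helper name is hypothetical, not the actual pandas_alive internals):

```python
def get_progress_bar():
    """Hypothetical helper: return tqdm if available, else raise a helpful error."""
    try:
        # tqdm.auto picks the notebook widget bar when ipywidgets is installed,
        # otherwise it falls back to the plain terminal bar
        from tqdm.auto import tqdm
    except ImportError as err:
        raise ImportError(
            "Progress bars need tqdm (and ipywidgets in notebooks): "
            "pip install tqdm ipywidgets"
        ) from err
    return tqdm
```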
Other than that, from the testing I've done locally, I'll be happy to merge once we clean up the README.ipynb, minify the GIFs and add those notes in.
I'll be looking at your notes and tidying up my PR this week. In the meantime, I have a couple of general comments that I didn't have time to mention last week:
1.- Do you really need the limitation when using interpolate() (interpolate_period=True) in Pandas to stick only to DatetimeIndex type indexes? Shouldn't the interpolation be available if users want it for any numerical index, as long as it is in ascending order? This typically will be any time-domain array. I think this will make the package a bit more general, rather than focused on date-based datasets (there's a small sketch of what I mean after this list).
2.- Isn't the default steps_per_period=10 a bit steep, making the examples take a while to generate? Would a value of maybe 3 be enough, and then make the package more attractive to new users?
3.- I think what you've done with bubble charts having x_data_label & y_data_label to set the x vs y axes could be expanded to line and scatter plots. This again would make the package more general, as it is quite a common plotting pattern in df.plot(x=, y=) or df.plot.scatter(x=, y=), which is used frequently with engineering datasets (see the short example after this list).
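Here's a minimal sketch of the interpolation point in 1, using plain pandas on an ascending numeric index; the column name and values are just illustrative:

```python
import numpy as np
import pandas as pd

# Ascending numeric index (e.g. time in seconds), not a DatetimeIndex
df = pd.DataFrame(
    {"level": [0.0, np.nan, np.nan, 3.0, np.nan, 5.0]},
    index=[0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
)

# Interpolate against the index values rather than the row position
print(df.interpolate(method="index"))
```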
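And a short example of the x/y column selection pattern from 3, with plain pandas plotting (column names are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"time_s": [0, 1, 2, 3], "level": [0.0, 0.8, 1.5, 1.9], "flow": [2.0, 1.6, 1.1, 0.7]}
)

# Choose which columns drive each axis, as is common with engineering datasets
df.plot(x="time_s", y="level")
df.plot.scatter(x="time_s", y="flow")
```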
Hello mate, I hope you're good. Sorry for the long gap, life and work got in the way... Over the last few days I was able to get back into this properly. Below are the proposed fixes and additions in this PR:
1.- At long last I cracked the puzzle with the ongoing problem in animate_multiple_plots when re-using a line or scatter chart previously saved in the same notebook, where the corresponding subplots would show a static figure from a previous single animation. This was a bit tricky because I think it was a combination of two issues:
a) The main reason was the use of .append() for the x & y data in the loops for line and scatter plots. The last frame would be stored and then appended again the next time round, hence the full chart appeared joined to the start point.
b) FuncAnimation without an init_func= draws the zero frame twice by default, and may recover old frames at the start when re-running the same cells in a notebook. You will see a clearing() function for this, called after each movie save, so contents are cleared but axes titles and settings still get passed through properly (there's a small sketch of the init_func approach after this list).
2.- I noticed you were careful with individual plots to create Figure() instances, however on multiple plots and geoplots I think it was still creating figure() ones. So I made animate_multiple_plots and geoplots create a Figure() instance instead of a figure() one. The difference in performance between the two is striking, and I didn't know this: when I changed all figures in my test notebook to figure(), all animations made matplotlib take twice as long to generate. No idea why. I added some notes on this in the README and in my notebook (the distinction is sketched after this list).
3.- Fixed fig= for single animations not being passed through properly as a custom figure, at least from my testing.
4.- Several performance improvements to speed up loops in animations:
a) In line/scatter charts I made the data update in place, rather than removing/regenerating the lines/points (see the set_data sketch after this list).
b) fill_under_line_color= was slowing the loops down significantly. Matplotlib does not update the fills, but adds a new one per iteration. I set this up only in the 1st iteration of the loop, and the animations sped up by about 4 times (also sketched after this list).
c) When using fixed_max=True I made it calculate all the values only once, rather than again on each iteration.
d) I made the default for steps_per_period= 5 instead of 10. The README runs much faster and the movies look OK to me. Hoping it attracts new users if the examples don't take too long.
5.- Fixed a problem with line and scatter plots being chopped off near the vertical ylim values. I added a little tolerance to these when required.
6.- Added a colorbar to bubble plots when a Pandas df column is passed as the colour. Currently only for individual animations. Also added options to control the colorbar scale limits with vmin & vmax; if None, they are calculated automatically (a short sketch is after this list).
7.- Added the option add_legend= for line and scatter animations, for single & multiple plots. Default is True.
8.- Added the option enable_progress_bar= to animate_multiple_plots. Default is False.
9.- Added a new folder ./examples/test_notebooks/ with my notebook, and for future collaborators to add new ones.
10.- Removed the limitation for the interpolation (interpolate_period=) to only work with a DateTime index. It should now work with any numeric df index, at least from my tests.
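To illustrate points 1 and 4a, here's a minimal stand-alone matplotlib sketch of the init_func + in-place data update pattern; the variable names are made up and this is not the exact pandas_alive code:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
(line,) = ax.plot([], [])
ax.set_xlim(x.min(), x.max())
ax.set_ylim(y.min(), y.max())

def init():
    # An explicit init avoids FuncAnimation re-drawing the zero frame,
    # and stops stale artists leaking in when cells are re-run
    line.set_data([], [])
    return (line,)

def update(i):
    # Update the existing Line2D in place instead of appending to lists
    # or re-plotting, so nothing accumulates between frames or runs
    line.set_data(x[: i + 1], y[: i + 1])
    return (line,)

anim = FuncAnimation(fig, update, frames=len(x), init_func=init, blit=True)
```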
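On point 2, the difference between the two constructors, roughly: plt.figure() registers the figure with the global pyplot state machine, while matplotlib.figure.Figure() creates a stand-alone figure that pyplot does not track (the twice-as-slow observation is from my local testing only):

```python
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# pyplot-managed figure: tracked by the global pyplot state machine
fig_pyplot = plt.figure()

# Stand-alone Figure: not registered with pyplot at all
fig_standalone = Figure()
FigureCanvasAgg(fig_standalone)          # attach a canvas so it can render
ax = fig_standalone.add_subplot(111)
ax.plot([0, 1, 2], [0, 1, 4])
fig_standalone.savefig("standalone.png")
```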
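For point 4b, ax.fill_between() adds a new PolyCollection on every call rather than updating the previous one, so calling it per frame just stacks fills. A generic sketch of keeping a single fill alive (the remove() variant shown here is one alternative; in the PR I simply create the fill in the first iteration only):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()

fill = None
for i in range(1, 6):                            # stand-in for animation frames
    y = np.sin(x) * i / 5
    if fill is not None:
        fill.remove()                            # drop the previous PolyCollection...
    fill = ax.fill_between(x, 0, y, alpha=0.3)   # ...so only one fill ever exists
```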
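And for point 6, a generic matplotlib sketch of a scatter coloured by a DataFrame column, with a colorbar and pinned vmin/vmax (column names are illustrative, not the pandas_alive API):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"x": rng.random(50), "y": rng.random(50), "temp": rng.uniform(10, 40, 50)}
)

fig, ax = plt.subplots()
# vmin/vmax pin the colour scale; if omitted, matplotlib derives the limits
# from the data itself
sc = ax.scatter(df["x"], df["y"], c=df["temp"], cmap="viridis", vmin=0, vmax=50)
fig.colorbar(sc, ax=ax, label="temp")
```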
I followed your previous comments and cleaned up the README files, plus added new info where required and a couple of new animations. I also reduced the size of all the GIFs I re-generated.
I learnt a lot going through your code, some real nice tricks in there! I hope you like what I've done, shout if you see any baddies. Have fun!
Wow! This is amazing! Thank you!
Everything is looking good to merge. My only (optional) suggestion is to include an example of the pendulum code (even if it's only the output) in the README. Let me know whether you want to do this or not, and after the decision I'll merge.
This will also mean we can close all the open issues!!
Great, thanks!
I'm happy to add anything to the readme. However, the pendulum code is quite large. Do you mean just the pandas_alive entries for the 2 animations I added to the readme? Can you expand a little on this? I wouldn't want to bloat the readme, it's quite large already.
Also, if you want, I can update the help docs in another PR. Let me know if you have notes on how to go about this the right way.
On the issue about performance with Figure() vs figure(), I reported it to the matplotlib crowd. After an early dismissal of my claims, they've now actually admitted there is a problem. We'll see if they fix it.
Also, you may have seen I mentioned you in the notes about conda. It'd be neat if you could tidy up your pip requirements.txt; some packages like pillow and geopandas aren't in there with the versions used. Note that I was able to run my notebook and the readme OK with all packages updated to their latest versions as of last week. Maybe in your requirements file you could do the same as with the conda .yml file: only the main packages with versions, without dependencies. It'd be clearer, I think. But not very important, I know.
I have a bit of time this week before I start a new job on 2nd Nov, so hit me up with any additions and we can wrap this bit up for now. Thanks for all your help from the start, really great stuff you did here.
Seems like you're on an absolute roll in the open source world lately, that's incredible. Congratulations on the new job! I'm sure you'll do amazing, your communication skills are awesome.
In terms of the README, probably leave it for now; I've got some ideas on how we could make it a bit cleaner (collapsible code blocks for ones longer than 5 lines), and I'll be sure to add in any animations produced in this PR.
The notes about Conda are awesome! They probably have their own place in the documentation, but I feel they'd be done a disservice by living only on this project and should be more prevalent on the internet. Nevertheless, after this PR is merged, I'll be sure to add them to the documentation when I release the next version. In terms of pandas_alive documentation, this PR fixes many of the docstrings on functions and the like, which the documentation is built out of.
In terms of updating requirements.txt, the one in the root of this project is actually only for the CI/CD workflow on GitHub Actions (eg building the docs); the rest of the dependencies are handled by Poetry in pyproject.toml. Packages like geopandas, tqdm, etc. are all optional in the context of pandas_alive (eg GDAL on Windows is a pain outside of Conda), hence why they are not hard requirements; the user should know when they need them.
In summary, when I find the time this week, let's:
Once again, thank you for all the hard work you've put into this, I hope you learnt as much as I learnt from you!
Excellent plan, agreed. Thank you for your 1st comment. Indeed I've embraced this open source ecosystem, mainly coz it makes a lot of sense. I come from many years tied to very expensive engineering commercial software, which didn't always make a lot of sense. I got very encouraged with this because of your help at the start and your positive attitude, you've made an old dog very happy! Even more so, if you think you've learnt something from me. Keep me posted if you need help during this week. Have a good one!
One minor thing on the help docs. Any relative path in the original README to a file or directory won't work, since the help docs are served from a different domain than the GitHub repo. You may want to have full paths to those in the Jupyter README, starting with https://github.com/JackMcKew/pandas_alive/. I think that'll make them work in both the readme and the help docs next time around.
I'm not sure I understand what you mean? The only things relatively linked in the README are the GIFs from my understanding and they work fine across the documentation and the README.
Sorry, I was a bit brief earlier. Try clicking the hyperlinks in the docs, e.g. to requirements.txt or the conda .yml file.
I meant the docs here: https://pypi.org/project/pandas-alive/ (e.g. the CHANGELOG link at the bottom has the same problem).
Ah I understand! Yes, this is a problem, but at least it's an easy fix; I'll open an issue and amend it whenever I get the chance.
Thank you so much again!
Happy to do that for you later on.
If you'd like to, I'll even try and figure out this hacktober thing and you might even get a t-shirt!
Wooot! A free tee, I'm in!
I only heard about it today, but if you've made 4 valid pull requests I believe you get one:
I've added the hacktoberfest topic, and your PR was merged in the month of October, so that should be 1 for you.
Cheers mate, will have a look later and see what that is about.
Hi Jack,
At long last, here is my very 1st PR ever, so brace yourself... hehe. Do shout if you find any horrors.
1.- Temp workaround for Issue #6: I had a good look at this issue where line plots connect the 1st and last points when used in a combined plot. I couldn't (as yet) find a clean fix, but a workaround is to set filename=None in the individual 1st definition of the line plot. See in ./test_notebooks the combined MP4 from my test notebook pendulum_sample.ipynb. What I did find on this issue is that if you print(i) inside update_all_graphs() you get an odd zero frame inserted from somewhere in the 2nd loop. E.g. you get i=0, 0 (inserted), 1, 2, 3...; or if you make the i in the code start from 1, then you get i=1, 0 (inserted), 2, 3, ... but dunno why yet.
2.- Fixed the problem mentioned in Issue 11 where the last frame was missing with combined_plots.
3.- Added a progress bar option to combined_plots. This is quite handy here since these can take a while.
4.- I expanded the error message about tqdm, as now with tqdm.auto in notebooks we also need ipywidgets installed.
5.- Some spellings here and there as I went through the code, nothing major. I also looked into the size of the movies we discussed, and what I see is that it's down to a combination of figsize in inches and dpi in dots (pixels) per inch. The current default in the code is dpi=144. Maybe 72 is enough, as is typical with most screens(?). I used dpi=100 in my test notebook coz it is easy to then know the size in pixels [figsize x dpi=100] (small illustration below).
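A quick illustration of the figsize x dpi point, in plain matplotlib (the numbers are just examples):

```python
import matplotlib.pyplot as plt

# figsize is in inches and dpi is dots (pixels) per inch,
# so the rendered size in pixels is simply figsize * dpi
fig = plt.figure(figsize=(8, 5), dpi=100)   # -> 800 x 500 pixels
ax = fig.add_subplot(111)
ax.plot([0, 1, 2], [0, 1, 4])
fig.savefig("example.png")                  # saved at 800 x 500 px

# Halving the dpi halves the pixel dimensions (and shrinks the file)
fig.savefig("example_small.png", dpi=50)    # -> 400 x 250 pixels
```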
I tested all this 1st with the notebook ./test_notebooks/pendulum_sample.ipynb, which runs fairly fast to try stuff. Then I ran your README.ipynb and it didn't give me any errors, which is good.
Have fun!