JackMcKew / pandas_alive

Create stunning, animated visualisations with Pandas & Matplotlib as easy as calling `df.plot_animated()`
MIT License
581 stars 100 forks source link

multiple_plots: fixes + others #17

Closed SoundSpinning closed 4 years ago

SoundSpinning commented 4 years ago

Hi Jack,

At long last, here is my very 1st PR ever, so brace yourself... hehe. Do shout if you find any horrors.

1.- Temp workaround for Issue #6: I had a good look at this issue where Line plots connect 1st and last points when used in a combined plot. I couldn't (as yet) find a clean fix. But a workaround is to set filename=None in the individual 1st definition of the line plot. See in ./test_notebooks combined MP4 from my test notebook pendulum_sample.ipynb.

What I did find on this issue is that if you print(i) inside update_all_graphs() you get an odd zero frame inserted from somewhere in the 2nd loop. E.g. you get i=0, 0 (inserted), 1, 2, 3... ; or if you make the i in the code to start from 1, then you get i=1, 0 (inserted), 2, 3, ... but dunno why yet.

2.- Fixed problem mentioned in Issue 11 where the last frame was missing with combined_plots.

3.- Added a progress bar option to combined_plots. This is quite handy here since these can take a while.

4.- I expanded the error message about tqdm, as now with tqdm.auto in notebooks we also need ipywidgets installed.

5.- Some spellings here and there as I went through the code, nothing major. I also looked into size of movies we discussed, and then I see is to do with a combination of figsize in inches and dpi in dots (pixels) per inch. The current default in the code is dpi=144. Maybe 72 is enough, as it is typical with most screens(?). I used dpi=100 in my test notebook coz it is easy to then know the size in pixels [ figsize x dpi=100 ].

I tested all this 1st with notebook ./test_notebooks/pendulum_sample.ipynb, which runs fairly fast to try stuff. Then I ran your README.ipynb and didn't give me any errors, which is good.

Have fun!

JackMcKew commented 4 years ago

Thank you for the PR! ๐Ÿš€

Can we add something in the README with regard to the test_notebooks directory for potential future developers. Another useful thing might be a link for future developers to the write up in the issue thread for conda.

SoundSpinning commented 4 years ago

All good. I am away this week, will get onto it when I get back. If you have other ideas, send them on and will have a look.

JackMcKew commented 4 years ago

No problem mate, thank you for your work!

With regard to the README.ipynb as well, I like to clear all the outputs from the cells after it's been run as to not bloat the README with unnecessary tables ๐Ÿ˜„

JackMcKew commented 4 years ago

Also, for any GIF files, I like to run GIFmicro across them as to make them as small as possible

http://www.romeolight.com/products/gifmicro/

JackMcKew commented 4 years ago

For the point around ipywidgets and tqdm, this is working as intended in my opinion. If the user wishes to use the progress bar, it's upon them to install tqdm in the current environment, and same applies for ipywidgets.

Other than that from the testing I've done locally, I'll be happy to merge once we clean up the README.ipynb, minify the GIFs and add those notes in ๐Ÿ˜„

SoundSpinning commented 4 years ago

I'll be looking at your notes and tidying up my PR this week. In the meantime, I have a couple of general comments that I didn't have time to mention last week:

1.- Do you really need the limitation when using interpolate() (interpolate_period=True) in Pandas to stick only to DatetimeIndex type indexes? Shouldn't the interpolation be available if users want it for any numerical index as long it is in ascending order? This typically will be any time domain array. I think this will make the package a bit more general, rather than focused on date based datasets. 2.- Isn't the default steps_per_period=10 a bit steep, making the examples to take a while to generate? Would a value of maybe 3 be enough, and then make the package more attractive to new users? 3.- I think what you've done with bubble charts having x_data_label & y_data_label to set axes x vs y could be expanded to line and scatter plots. This again would make the package more general as it is quite a common plotting setting in pd.plot(x= , y=) or pd.scatter(x= , y=), which is used frequently with engineering datasets?

SoundSpinning commented 4 years ago

Hello mate, I hope you're good. Sorry for the long gap, life and work got in the way... Over the last few days I was able to get back into this properly. Following are proposed fixes + additions with this PR:

1.- At long last I cracked the puzzle with the ongoing problem with animate_multiple_plots when re-using a line or scatter chart previously saved in the same notebook. Corresponding subplots would show a static figure from a previous single animation. This was a bit tricky because I think it was a combination of two issues: a) Main reason was the use of .append() for x & y data in the loops for line and scatter plots. The last frame would be stored and then appened again next time round. Hence the full chart appeared joined to the start point. b) FuncAnimation without an init_func= uses two zero frames by default, and may recover old frames at the start when re-running same cells in a notebook. You will see a clearing() function for this and after each movie save, so contents are cleared but axes titles and settings get passed properly.

2.- I noticed you were careful with individual plots to create Figure() instances, however on multiple plots and geoplots I think it was still creating figure() ones. So I made animate_multiple_plots and geoplots to create a Figure() instance instead of a figure() one. The difference in performance between the two is striking, and I didn't know this. When I changed all figures in my test notebook to figure() all animations made matplotlib take twice as long to generate them. No idea why. I added some notes on this in README and in my notebook.

3.- Fixed fig= for single animations not being passed properly as a custom figure, at least from my testing.

4.- Several performance improvements to speed up loops in animations: a) In line/scatter charts I made the data to be updated, rather than removing/regenerating lines/points. b) fill_under_line_color= was slowing down significantly the loops. Matplotlib does not update the fills, but adds a new one per iteration. I set this up only in the 1st iteration of the loop, and the animations speed up by about 4 times. c) When using fixed_max=True I made it to be calculating all values only once, rather than again for each iteration. d) I made the default for steps_per_period= 5 instead of 10. The README runs much faster and movies look OK to me. Hoping that it attracts new users if the examples don't take too long.

5.- Fixed a problem with line and scatter plots being chopped off near the vertical ylim values. I added a little tolerance on these when required.

6.- Added a colorbar with bubble plots when a Pandas df column is passed as a colour. Currently only for individual animations. Also added options to control the colorbar scale limits with vmin & vmax. If None, then they are automatically calculated.

7.- Added option add_legend= for line and scatter animations for single & multiple plots. Default is True.

8.- Added the option enable_progress_bar= to animate_multiple_plots. Default is False.

9.- Added a new folder ./examples/test_notebooks/ with my notebook and for future collaborators to add new ones.

10.- Removed the limitation for the interpolation (interpolate_period=) only to work with a DateTime index. It should now work with any numeric df index, at least from my tests.

I followed your previous comments and cleaned up the README files, plus added new info where required and a couple of new animations. I also reduced the size of all GIFs I re-generated.

I learnt a lot going through your code, some real nice tricks in there! I hope you like what I've done, shout if you see any baddies. Have fun!

JackMcKew commented 4 years ago

Wow! This is amazing! Thank you!

Everything is looking good to merge. My only thing that is (optional), is to include an example of the pendulum code (even if it's only the output) in the README. Let me know whether you want to do this or not and after the decision, I'll merge ๐Ÿ˜„

This will also mean we can close all the open issues!! ๐Ÿ˜„

SoundSpinning commented 4 years ago

Great, thanks ! ๐Ÿš€ I'm happy to add anything to the readme. However, the pendulum code is quite large. Do you mean just the pandas_alive entries to the 2 animations I added to readme? Can you expand a little on this? I wouldn't want to bloat the readme, it's quite large already.

Also, if you want, I can update the help docs in another PR. Let me know if you have notes on how you go about this the right way.

On the issue about performance with Figure() vs figure() I reported it to the matplotlib crowd. After an early dismissal of my claims, they've now actually admitted there is a problem. We'll see if they fix it.

Also, you may have seen I mentioned you on the notes about conda. It'd be neat if you could tidy up your pip requirements.txt. Some packages like pillow, geopandas are not there with versions used? Note that I was able to run my notebook and readme OK with all packages updated to their latest versions as of last week. Maybe in your requirements file you are able to do like with conda .yml file, only the main packages installed with versions without dependencies. It'd be more clear I think. But not very important, I know.

I have a bit of time this week before I start a new job on 2nd Nov, so hit me up with any additions and we wrap this bit up for now. Thanks for all your help from the start, really great stuff you did here.

JackMcKew commented 4 years ago

Seems like you're on an absolute roll in the open source world lately, that's incredible ๐Ÿ˜ Congratulations on the new job! I'm sure you'll do amazing, you're communication skills are awesome

In terms of the README, probably leave it for now, I've got some ideas on how we could make it a bit cleaner (collapsible code blocks for ones longer than 5 lines), and I'll be sure to add in any animations produced in this PR

The notes about Conda are awesome! They probably have their own place in the documentation, but I feel like they'd be at a disservice being only on this project and should be more prevalent on the internet, nevertheless after this PR is merged, I'll be sure to add them to the documentation while I release the next version. In terms of pandas_alive documentation, this PR fixes many of the doc strings on functions and the like, which the documentation is built out of ๐Ÿ˜

In terms of updating requirements.txt, the one in the root of this project is actually only for the CI/CD workflow on GitHub Actions (eg build docs), the rest of the dependancies are handled by Poetry in the pyproject.toml. Packages like geopandas, tqdm, etc in the context of pandas_alive are all optional (eg, GDAL on windows is a pain outside of Conda) hence why they are not hard requirements, the user should know when they need them

In summary, when I find the time this week, let's:

Once again, thank you for all the hard work you've put into this, I hope you learnt as much as I learnt from you! ๐Ÿ˜

SoundSpinning commented 4 years ago

Excellent plan, agreed. Thank you for your 1st comment. Indeed I've embraced this open source ecosystem, mainly coz it makes a lot of sense. I come from many years tied to very expensive engineering commercial software, which didn't always make a lot of sense. I got very encouraged with this because of your help at the start and your positive attitude, you've made an old dog very happy! Even more so, if you think you've learnt something from me. Keep me posted if you need help during this week. Have a good one!

SoundSpinning commented 4 years ago

One minor thing on the help docs. Any relative path in the original README to a file or directory, won't work, since help is served in GitHub with a different domain than that one with the repo. You may want to have in the Jupyter README a full path to those, starting with https://github.com/JackMcKew/pandas_alive/. I think that'll make them work on both readme and help docs next time around.

JackMcKew commented 4 years ago

I'm not sure I understand what you mean? The only things relatively linked in the README are the GIFs from my understanding and they work fine across the documentation and the README.

SoundSpinning commented 4 years ago

Sorry, I was a bit brief earlier. Try clicking in docs hyperlinks e.g. to: requirements.txt or the conda .yml file?

SoundSpinning commented 4 years ago

I meant the docs here: https://pypi.org/project/pandas-alive/ e.g. CHANGELOG bottom link has the same prob.

JackMcKew commented 4 years ago

Ah I understand! Yes this is a problem โ˜น๏ธ but at least it's an easy fix, I'll open an issue which whenever I get the chance I can amend โ˜บ๏ธ

Thank you so much again!

SoundSpinning commented 4 years ago

Happy to do that for you later on. ๐Ÿช‚

JackMcKew commented 4 years ago

If you'd like to ๐Ÿ™‚ I'll even try and figure out this hacktober thing and you might even get a t shirt!

https://github.com/JackMcKew/pandas_alive/issues/18

SoundSpinning commented 4 years ago

Wooot! a free Tee, I'm in! ๐Ÿ˜ฎ

JackMcKew commented 4 years ago

I only heard about it today, but if you've made 4 valid pull requests I believe you get one:

https://hacktoberfest.digitalocean.com/

JackMcKew commented 4 years ago

I've added the hacktoberfest topic, and your PR was merged in the month of October so that should be 1 for you ๐Ÿ˜„

SoundSpinning commented 4 years ago

Cheers mate, will have a look later see what that is about.