I-GUIDE / CI_Platform

iGUIDE CI Platform Deployment
Apache License 2.0

Platform Getting Out-of-Memory (Kernel Restarting) #27

Closed fbaig closed 1 month ago

fbaig commented 1 month ago

Problem The platform assigns 4GB of memory and 2 CPUs to every user. The following notebook keeps showing "Kernel Restarting", which I assume is caused by running out of memory. CyberGISX assigns the same resources to every user; however, the notebook runs without any issues on CyberGISX.

Code https://github.com/I-GUIDE/data-with-notebooks/blob/notebooks/Twitter%20Notebooks/Data%20Collection.ipynb

rkalyanapurdue commented 1 month ago

@fbaig I just ran this notebook (after downloading the shapefiles and uploading them to Jupyter) without getting any kernel errors. Maybe it was a long-running container that had some memory issues? Could you please try again?

YunfanKang commented 1 month ago

@rkalyanapurdue Sorry, the link was incorrect. Here is the link to the notebook that will reproduce the memory issue on I-GUIDE: https://github.com/I-GUIDE/data-with-notebooks/blob/data-access/Twitter%20Notebooks/Choropleth%20Maps_Get%20Data.ipynb

rkalyanapurdue commented 1 month ago

@YunfanKang I just ran the notebook end to end and did not run into any memory issues... could you try restarting the kernel and trying?

YunfanKang commented 1 month ago

@rkalyanapurdue Could you please try again with this notebook instead: https://drive.google.com/file/d/1bwuK0VvOXUZ_d0SDJnO1tN5FvS3rpPNi/view?usp=sharing or just delete the "del data" in the second cell under "1. Processing Twitter/X Data"? That line was the "fix" I tried in order to bypass the kernel restarting. I changed this back in the notebook above and submitted the changes, but the one on GitHub remains unchanged.
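
For readers following along, here is a minimal sketch of why a line like "del data" can make a difference. It is not the notebook's code: `load_data` and the row count are hypothetical stand-ins, and `tracemalloc` only tracks Python-level allocations, so the numbers will not match what the container reports.

```python
import gc
import tracemalloc


def load_data(n_rows):
    # Hypothetical stand-in for the notebook's large Twitter/X dataset.
    return [{"id": i, "text": f"tweet number {i}"} for i in range(n_rows)]


tracemalloc.start()

data = load_data(50_000)
summary = len(data)  # keep only the small result that later cells need
with_data, _ = tracemalloc.get_traced_memory()

del data      # drop the only reference to the large object
gc.collect()  # also clean up any reference cycles
after_del, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"traced memory while holding data: {with_data / 1e6:.1f} MB")
print(f"traced memory after del:          {after_del / 1e6:.1f} MB")
```

Without the `del`, the large object stays alive for as long as the kernel does, which is roughly the situation the modified notebook puts the kernel in.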

rkalyanapurdue commented 1 month ago

@YunfanKang I can reproduce the kernel restarting issue now with this version of the notebook. I will test it on CyberGISX as well to confirm. I need to look into the Jupyter container's metrics to figure out why it is running out of resources.

rkalyanapurdue commented 1 month ago

checked on CyberGISX; seems like at peak, resident memory used is actually beyond the limit of 6GB that I-GUIDE imposes.

Screenshot 2024-06-10 at 2 49 19 PM

rkalyanapurdue commented 1 month ago

suggested fix: change limit to 8GB and see if that resolves it

rkalyanapurdue commented 1 month ago

@YunfanKang please try again; I am able to run the notebook now without the kernel restarting.

YunfanKang commented 1 month ago

@rkalyanapurdue I tried iguide and the kernel restarting issue remains the same. I also tried Python 3 and iguide-ewd and encountered the same problem.

rkalyanapurdue commented 1 month ago

@YunfanKang can you start your notebook session and keep it running? that way I can check to see how much memory is being allocated for your container; also, can you run the top command in a terminal to track how much memory is being used? it should not be using more than 8GB since the max i've seen is 6.7GB.
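
As a complement to watching top in a terminal, a notebook cell can report its own kernel's peak resident memory. This is only a sketch using the standard-library `resource` module (Unix only); `peak_rss_gib` and `LIMIT_GIB` are names made up for illustration, and treating the 8GB limit as 8 GiB is an assumption.

```python
import resource
import sys


def peak_rss_gib():
    """Return this process's peak resident set size in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1024**3


LIMIT_GIB = 8  # assumption: the per-user limit discussed in this thread

peak = peak_rss_gib()
print(f"peak resident memory: {peak:.2f} GiB of {LIMIT_GIB} GiB")
```

This only covers the kernel that runs the cell. Memory held by other kernels in the same container still counts against the limit and will not show up here.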

YunfanKang commented 1 month ago

@rkalyanapurdue Thanks for the tips. It turns out that there were other Python kernels still running from another, already closed notebook that I had tested with the same set of large files. They were taking 3.7 GB of RAM. After stopping all kernels and starting the notebook fresh, it now runs without hitting the memory issue.
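
A quick way to spot leftover kernels like these from inside the container is to scan `/proc` for Python processes and their resident memory. The sketch below assumes a Linux container; `python_process_memory` is a hypothetical helper, and matching on "python" or "ipykernel" in the command line is a heuristic rather than an exact test for Jupyter kernels.

```python
import os


def python_process_memory(proc_root="/proc"):
    """List (pid, rss_gib, cmdline) for Python processes, largest first."""
    results = []
    if not os.path.isdir(proc_root):
        return results  # not Linux, or /proc is unavailable
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
            with open(os.path.join(proc_root, entry, "status")) as f:
                status = f.read()
        except OSError:
            continue  # process exited or is not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if "python" not in cmdline and "ipykernel" not in cmdline:
            continue
        rss_kib = 0
        for line in status.splitlines():
            if line.startswith("VmRSS:"):
                rss_kib = int(line.split()[1])  # value is in KiB
                break
        results.append((int(entry), rss_kib / 1024**2, cmdline))
    return sorted(results, key=lambda r: r[1], reverse=True)


for pid, rss_gib, cmdline in python_process_memory():
    print(f"{pid:>8}  {rss_gib:6.2f} GiB  {cmdline[:80]}")
```

Any kernel that shows up here but belongs to a notebook that is no longer open is a candidate for shutting down before rerunning a memory-heavy notebook.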

rkalyanapurdue commented 1 month ago

thanks for confirming @YunfanKang ; closing this issue for now as fixed with the updated 8GB limit. @nosolls we may need to add more VMs to the cluster for the Summer School to account for the increased RAM limit.