SlideRuleEarth / sliderule-python

SlideRule Earth Example Notebooks: On-demand, cloud-based processing of satellite mission data (NASA ICESat-2, GEDI, ArcticDEM/REMA, HLS)
https://slideruleearth.io/rtd/
BSD 3-Clause "New" or "Revised" License

ResourceWarning: unclosed socket error #106

Closed slhowardESR closed 2 years ago

slhowardESR commented 2 years ago

Hi JP,

I am doing some testing to prepare for the large run, and sometimes I hit this problem, which kills the program.

`sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x19391b4c280>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x19391b4c400>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x195403fc640>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x19604735dc0>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
File D:\Jupyter\sliderulework\Spyder_SR\SR_by_RGT_5files_print_status.py:108 in main()
File D:\Jupyter\sliderulework\Spyder_SR\SR_by_RGT_5files_print_status.py:81 in main
    gdf = icesat2.atl06p(parmsyp, version=args.release,
File d:\jupyter\sliderule-python\sliderule\icesat2.py:881 in atl06p
    return parallelize(callback, atl06, parm, resources, asset)
File d:\jupyter\sliderule-python\sliderule\icesat2.py:597 in __parallelize
    result, resource = future.result()
File ~\anaconda3\envs\sliderule\lib\concurrent\futures\_base.py:437 in result
    return self.__get_result()
File ~\anaconda3\envs\sliderule\lib\concurrent\futures\_base.py:389 in __get_result
    raise self._exception
File ~\anaconda3\envs\sliderule\lib\concurrent\futures\thread.py:57 in run
    result = self.fn(*self.args, **self.kwargs)
File d:\jupyter\sliderule-python\sliderule\icesat2.py:459 in __atl06
    rsps = sliderule.source("atl06", rqst, stream=True)
File d:\jupyter\sliderule-python\sliderule\sliderule.py:425 in source
    __clrserv(serv, stream)
File d:\jupyter\sliderule-python\sliderule\sliderule.py:184 in __clrserv
    server_table[serv]["pending"] -= 1
KeyError: 'http://34.212.131.26'`

I am not sure what is causing this. I can send you my code. I am basically trying to do SR-YAPC processing for individual RGTs in regions 10 and 12. I am running the two regions at once, as separate processes. Sometimes it works, and sometimes it crashes.

Let me know if you need more info.

jpswinski commented 2 years ago

@slhowardESR I've not seen this before. I wonder if the client is spawning too many threads and making too many concurrent connections. Can you send me the code you are using? If you want, you can send it to me via Slack so the code is kept private. I will run things on my side and see if I can recreate and diagnose the problem.

jpswinski commented 2 years ago

@slhowardESR after some investigation, there appear to be a number of different things happening in your processing runs:

jpswinski commented 2 years ago

[Image: susan_run_3_v_1]

jpswinski commented 2 years ago

The above snapshot shows a run with max pending set to 3, and then again with max pending set to 1. When set to 1, there are no servers that bounce, though some of the memory dips are pretty low.
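To illustrate what a per-server "max pending" cap means, and how the KeyError in the traceback can arise, here is a minimal sketch. It is not the client's actual code: the `ServerTable` class, its method names, and the server names in the usage note are hypothetical. The only thing taken from the traceback is that pending requests are counted per server in a dictionary. If a server's entry is dropped from that dictionary (for example after the server bounces) while a request to it is still in flight, a bare `-= 1` on the entry raises KeyError; the sketch instead ignores servers that are no longer present.

```python
import threading

class ServerTable:
    """Hypothetical per-server pending-request counter with a cap."""

    def __init__(self, servers, max_pending=1):
        self._lock = threading.Lock()
        self._max_pending = max_pending
        self._table = {s: {"pending": 0} for s in servers}

    def acquire(self):
        """Reserve a slot on the least-loaded server under the cap.

        Returns the server name, or None if every server is at the cap.
        """
        with self._lock:
            candidates = [
                (entry["pending"], serv)
                for serv, entry in self._table.items()
                if entry["pending"] < self._max_pending
            ]
            if not candidates:
                return None
            _, serv = min(candidates)
            self._table[serv]["pending"] += 1
            return serv

    def release(self, serv):
        """Give back a slot; a server that has been removed is ignored."""
        with self._lock:
            entry = self._table.get(serv)
            if entry is not None and entry["pending"] > 0:
                entry["pending"] -= 1

    def remove(self, serv):
        """Drop a server from the table, e.g. after it stops responding."""
        with self._lock:
            self._table.pop(serv, None)
```

With `ServerTable(["server-a", "server-b"], max_pending=1)`, two calls to `acquire()` hand out one slot per server and a third returns `None` until a slot is released. Calling `remove("server-a")` followed by `release("server-a")` is then a no-op rather than an error.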

jpswinski commented 2 years ago

The temporary fix of setting the max pending to 1 seems to have worked well. All future development on this issue will be tracked under ICESat2-SlideRule/sliderule#117