ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/
21 stars 11 forks source link

Close old file descriptors when starting a new file transfer process #151

Closed painter1 closed 3 years ago

painter1 commented 4 years ago

Add another argument to the Popen call in sdutils.get_status_output(): close_fds=True This ensures that the new subprocess will not inherit open file descriptors from the daemon process. That can make it impossible to close the stdout, stderr file descriptors when a previous subprocess ends.

Why did I make this change? For a long time my "synda watch" output has shown a large number of files which are entirely downloaded, yet still in "running" status. Part of the problem lay in a slow event loop in sdtaskscheduler.py. But after I fixed that, the problem remained, to a lesser extent. Often "synda watch" output would show a file or two which had been stuck in "running" status for many hours after it had been downloaded. The file's sdget.sh process had completed successfully long ago. The best possibility was that p.communicate() in sdutils.get_status_output() was unable to return. Looking lots of places on the web, I saw how this can happen:

In order for p.communicate() to work, the main process (belonging to the daemon; it doesn't matter that this is in its own thread) has to have open file descriptors for stdout,stderr of the subprocess P1 that it creates with Popen. If it starts up another subprocess P2 (as it does frequently in Synda), the default is for P2 to inherit all its open file descriptors. So P2 has open file descriptors for P1's stdout,stderr. They can't be closed until both P1 and P2 have finished! P2 started later than P1, and sometimes P2 takes hours (especially if it's a transfer from LASG). So, even if P1 just downloads a little file from a fast server, it will sometimes take many hours for its transfer to complete. The solution is simple: set close_fds so that P2 doesn't get those open file descriptors.

pjournou-ipsl commented 3 years ago

From July 2020, there were probably some important changes because, according to the doc (source: https://docs.python.org/fr/3/library/subprocess.html), the close_fds argument of the Popen class is now set to True by default if missing.