Open s373nZ opened 6 days ago
Refactored the file closing operations into a function on TailableProc
and call it from the bitcoind
fixture. Plus, another successful CI run would be nice...
Rebased and removed modifications to CHANGELOG for Changelog-Fixed:
hint in commit message.
@s373nZ Sorry if I overlooked it while jumping between 7669, 7108, 7130, and 7310 :). But could you please remind why the cleanup process is not added to the finalizer or the stop function?
I am testing it with:
make pytest
ps aux | grep pytest
lsof -p ${PID} | wc -l
And it seems to be working (tests are still running at 32% :-|).
@ShahanaFarooqui That's a lot of jumping (and a lot of 7s :). I should have done a better job at consolidating this information into the PR description.
For some context, I ran into this file descriptor issue a few months ago when working on a PR and attempting to run the make pytest
locally. The leak produced seemingly random test failures that weren't alleviated until I lifted the ulimit
. New to CLN development, I was troubleshooting the issue then under the assumption that the LightningD
subprocess was responsible, and tried a wide variety of fixes. Returning to the issue recently in hopes that a fix might stabilize the CI tests somewhat, fresh eyes revealed it is actually the BitcoinD
process.
To address your questions directly:
could you please remind why the cleanup process is not added to the https://github.com/ElementsProject/lightning/issues/7130#issuecomment-1983465331
My reading of the Pytest documentation suggested by @cdecker in that comment, specifically the preceding section titled Teardown/Cleanup (AKA Fixture finalization), suggests that yield
fixtures are recommended and Any teardown code for that fixture is placed after the yield. The bitcoind
fixture is already implemented as a yield
fixture, so closing the files as a cleanup operation performed after the yield
statement in the current implementation aligns with best practice recommendations as I understand it.
The following section in the docs, Adding finalizers directly reads to apply typically to non-yield fixtures. Seeing as this is actually an issue with every test case which leverages the bitcoind
fixture (each leaves four fds open in a (deleted)
state throughout the entire pytest
suite run), I understand the file closing operations to be necessarily bundled with the yield fixture. I think at the time, @cdecker and I thought that the leak might have affected a subset of test cases and were exploring addfinalizer()
as a targeted solution.
In short, the implementation in this PR is indeed using finalizers, but in accordance with yield fixtures rather than addfinalizer
.
...or the stop function?
This is a good question, and one I tried to implement initially. TailableProc
is inherited by both BitcoinD
and LightningD
. There are a number of existing tests which stop()
and start()
the Lightning nodes, and continue to make use of the files throughout the tests to validate log messages. If we close these files when the daemon is stopped, these tests break. Additionally, there are a number of global "teardown checks" in the node_factory
fixture itself which depend on the log files to remain open. After trying a wide variety of implementations using stop()
and even moving the file open()
calls out of the initializer into start()
, it became apparent that this approach was too invasive with the current test suite.
Now that it's discovered BitcoinD
is the leak culprit, there is still a small mystery as to why LightningD
isn't leaking file descriptors in the same manner. My best understanding at the moment is that its daemon is started in a subprocess using Popen()
here, and according to the docs:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed. Otherwise when close_fds is false, file descriptors obey their inheritable flag as described in Inheritance of File Descriptors.
Changed in version 3.2: The default for close_fds was changed from False to what is described above.
So, I believe the subprocess is taking care of closing LightningD
's file descriptors by default, but BitcoinD
doesn't get the same treatment.
While closing the files in stop()
doesn't seem to be the right solution, there may be other options. Another idea was to implement it in the TailableProc
destructor __del__()
, but I'm pretty sure I tried that last March and didn't get it working.
Also, I considered the name cleanup()
for the callback function in TailableProc
, which is a more general name. However, that function would be implicitly overridden by LightningD
here when called by LightningNode
here. This could probably work, too, but it was preferable to avoid unexpected behavior by injecting assumptions into the existing inheritance hierarchy.
Hope this helps! I haven't mastered Pytest, Pyln, or the CLN test suite by any means, so please let me know if you have recommendations for a better fit solution.
Also, I have one tangential question, if you would. When referencing issues in the git commit messages, I've been using the Github issue number that describes the problem both in the first line of the commit message and in the Changelog-*:
value. Should the number reference the PR instead? If so, let me know and I'll amend my commit messages.
Also, I have one tangential question, if you would. When referencing issues in the git commit messages, I've been using the Github issue number that describes the problem both in the first line of the commit message and in the
Changelog-*:
value. Should the number reference the PR instead? If so, let me know and I'll amend my commit messages.
Typically, including the issue number in commit messages is useful for tracking purposes. It is recommended to reference issues using related
, fixes
, or closes
in the PR description as well.
Before each release, the changelog.py script generates entries based on commits merged after the last tag.
For instance, the script will produce below entry for your commit message, "Changelog-Fixed: pyln-testing: Fix file descriptor leak in bitcoind fixture.":
...
### Fixed
- pyln-testing: Fix file descriptor leak in bitcoind fixture. ([#7669])
...
[#7669]: https://github.com/ElementsProject/lightning/pull/7669
This way, your changes are automatically included in the appropriate section of the changelog.
ACK e35e3c540b60fab1a626ca9df1b6d07225c154c0.
Fix BitcoinD FD leak in pyln-testing
Description
This PR adds a post-yield resolver sanity-check to the
bitcoind
test fixture to be sure file descriptors are not leaking.Related Issues
Changes Made
Checklist
Ensure the following tasks are completed before submitting the PR:
TODOs
have been addressed or removed.Additional Notes
Please let me know if there is a more elegant way to do this, or more appropriate place to put it.