Closed tijyojwad closed 4 years ago
nvm @ahehn-nv @mimaric , bug not fixed yet. false alarm :)
Another solution is to make python somehow aware of CudaAlignerBatch's reference to the CudaStream, such that the garbage collection cannot delete the CudaStream before the C++ object behind CudaAlignerBatch.
Thinking about it again, the approach with the CudaStream C++ object will probably not help (but would be good for consistency nevertheless), since python manages all the lifetime and calls to special functions. It would be surprising if it behaved differently just because it is a wrapped C++ object. So we should find a way to keep a reference to CudaStream until all destructors of CudaAlignerBatch have been called.
Yes, this is a temporary fix to the problem where we're explicitly keeping the CudaStream object longer than the CudaAlignerBatch. The solution is to keep the reference for CudaStream in the batch object around till destruction, and we do have some code for that, but it's not done right. The reference counting isn't happening correctly for the CudaStream object, and it its dealloc gets called in arbitrary order in the test failure case. I have some leads for it, will create separate PR to address it. Merging this now.
And agreed #475 will help with consistency, but not his bug.
issue appears to be with how we were handling the cuda aligner tests in python. the cleanup wasn't correct, which was leading to out of order destruction in pytest. using pytest fixture ensures the objects are deleted properly after each test.
temporarily resolves intermittent bug in tests