NVIDIA-Genomics-Research / GenomeWorks

SDK for GPU accelerated genome assembly and analysis
https://clara-parabricks.github.io/GenomeWorks/
Apache License 2.0
286 stars 76 forks source link

[pygw] temporary fix for intermittent test bug #474

Closed tijyojwad closed 4 years ago

tijyojwad commented 4 years ago

issue appears to be with how we were handling the cuda aligner tests in python. the cleanup wasn't correct, which was leading to out of order destruction in pytest. using pytest fixture ensures the objects are deleted properly after each test.

temporarily resolves intermittent bug in tests

tijyojwad commented 4 years ago

nvm @ahehn-nv @mimaric , bug not fixed yet. false alarm :)

ahehn-nv commented 4 years ago

Another solution is to make python somehow aware of CudaAlignerBatch's reference to the CudaStream, such that the garbage collection cannot delete the CudaStream before the C++ object behind CudaAlignerBatch.

ahehn-nv commented 4 years ago

Thinking about it again, the approach with the CudaStream C++ object will probably not help (but would be good for consistency nevertheless), since python manages all the lifetime and calls to special functions. It would be surprising if it behaved differently just because it is a wrapped C++ object. So we should find a way to keep a reference to CudaStream until all destructors of CudaAlignerBatch have been called.

tijyojwad commented 4 years ago

Yes, this is a temporary fix to the problem where we're explicitly keeping the CudaStream object longer than the CudaAlignerBatch. The solution is to keep the reference for CudaStream in the batch object around till destruction, and we do have some code for that, but it's not done right. The reference counting isn't happening correctly for the CudaStream object, and it its dealloc gets called in arbitrary order in the test failure case. I have some leads for it, will create separate PR to address it. Merging this now.

And agreed #475 will help with consistency, but not his bug.