baidu-research / persistent-rnn

Fast Recurrent Networks Library
Apache License 2.0
574 stars 87 forks

Potential bug in Synchronizer.get_failed_() #15

Open eamartin opened 7 years ago

eamartin commented 7 years ago

I'm considering adding TensorFlow bindings to persistent-rnn, and I've been reading the codebase for my own understanding.

In https://github.com/baidu-research/persistent-rnn/blob/master/include/prnn/detail/rnn/synchronizer.h#L86

    bool get_failed_() const {
        index_t failed = 0;

        prnn::parallel::CudaRuntimeLibrary::cudaMemcpyAsync(&failed,
            get_barrier_failed_flag(), sizeof(index_t),
            prnn::parallel::CudaRuntimeLibrary::cudaMemcpyDefault, stream_);

        return failed != 0;
    }

It seems to me that this has a race condition between the function returning (`failed` going out of scope) and the `cudaMemcpyAsync` completing. Is this the case? If not, what synchronizes the memcpy before the function returns? There are several other similar examples in Synchronizer.

Thanks for any comment on this "maybe not an issue, just a question about the code" issue :)