Closed hahnjo closed 4 years ago
Note that I'm not sure if ips_proto_mq_eager_complete
is ever called from ips_ptl_mq_rndv
/ ips_proto_mq_push_rts_data
: I've tried to understand the code structure and trigger that code path, but haven't been successful. If that can never happen, the better solution would be to remove the PSM_CUDA
from ips_proto_mq_eager_complete
which just avoids the problem.
Sorry for ignoring this for so long; I was on sabbatical and didn't see it till today. We are reviewing your change and will look at pulling it ASAP.
Will be included in the next PSM2 release.
As far as I can see, this fix is still not in master. Do you actually for people debugging and fixing your stuff?!?
We haven't actually done another release since I said we would include it in the next release...
We haven't actually done another release since I said we would include it in the next release...
So what's a "release"? As a contributor I only see two commits in master that claim to sync with a release.
I see 48 releases on https://github.com/intel/opa-psm2/releases
I see 48 releases on https://github.com/intel/opa-psm2/releases
And https://github.com/intel/opa-psm2/releases/tag/IFS_RELEASE_10_10_1_0_36 is from 11 days ago and doesn't have the fix.
Hahnjo, this isn't the best place for this conversation. Please email me @ michael.william.heinz@intel.com so we can talk this through.
The value is read in ips_proto_mq_eager_complete to determine if prefetched buffers from sendreq_prefetch must be reclaimed. If cuda_hostbuf_used is uninitialized, the value might be different from 0 which leads to crashes if sendreq_prefetch hasn't been set up correctly.
From what I can see there are 4 possible call stacks that lead to that function: 1&2) Via ips_ptl_mq_eager which is called from both ips_protomq{send,isend} if len <= mq->hfi_thresh_rv. Note that this doesn't use prefetch buffers, so I think it's correct to set cuda_hostbuf_used = 0. 3) In ips_proto_mq_isend if len <= flow->frag_size. Again this doesn't use prefetch buffers, so the default for the request in ips_proto_mq_isend can be cuda_hostbuf_used = 0. 4) ips_proto_mq_push_rts_data which is involved if len > mq->hfi_thresh_rv. In that case ips_ptl_mq_rndv correctly sets cuda_hostbuf_used = {0,1}.