mhoemmen opened this issue 6 years ago
We might expect this to be fast, but because valarray containers are initialized automatically, their storage is first touched, and therefore allocated, in the master thread's memory. As a result, we find that it is actually quite slow even when we have more than one thread.
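To make that first-touch issue concrete, here is a minimal sketch, assuming a NUMA machine with a first-touch page placement policy and OpenMP for the parallel loops; the function names are illustrative only:

```cpp
#include <cstddef>
#include <valarray>

// The valarray constructor value-initializes every element on the calling
// (master) thread, so under first-touch placement all pages end up in the
// master thread's NUMA domain.
void valarray_version(std::size_t n) {
  std::valarray<double> x(n);            // zero-initializes on this thread

  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
    x[i] = 2.0 * x[i];                   // mostly remote-memory traffic
}

// Allocating raw storage and letting each thread initialize its own chunk
// places pages in the domain of the thread that will later use them.
void first_touch_version(std::size_t n) {
  double* x = new double[n];             // elements left uninitialized

  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
    x[i] = 0.0;                          // first touch on the owning thread

  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
    x[i] = 2.0 * x[i];                   // mostly local-memory traffic

  delete[] x;
}
```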
This is only true if threads touch pages from more than one NUMA domain. Such a code will perform just fine on IBM Blue Gene/Q or Intel Core i7 processors.
@jeffhammond I forget who wrote that section -- it would help to have some more details on how they ran the experiment. We can always rerun it at some point. My main concern for now is deciding on the lifetime of these resource thingies :D
I like this. This could even solve the problem of some resources only being available inside a parallel region (like a GPU). Topology is a snapshot of everything reachable from the root.
Yes, I think that could be a nice way to represent that. We could say that when you perform topology discovery, it returns a topology structure describing everything that can be executed on from the current thread of execution. So something like GPU device-side enqueue could be represented by an entirely different execution resource / execution context which can be discovered and constructed within a GPU kernel.
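As a rough illustration of that "snapshot from the current thread of execution" idea, here is a hypothetical mock; `discover_topology`, `execution_resource`, and the resource names are placeholders of mine, not the proposal's API:

```cpp
#include <iostream>
#include <string>
#include <vector>

struct execution_resource {
  std::string name;
  std::vector<execution_resource> children;
};

// A snapshot of everything that can be executed on from the calling thread
// of execution at the time of the call.  A call made from inside a GPU
// kernel would return a different snapshot, containing only the resources
// reachable from that kernel (e.g. device-side enqueue), discovered and
// constructed within the kernel itself.
execution_resource discover_topology() {
  return {"system", {
      {"numa-node-0", {{"core-0", {}}, {"core-1", {}}}},
      {"gpu-0", {}}}};
}

void print(const execution_resource& r, int depth = 0) {
  std::cout << std::string(2 * depth, ' ') << r.name << '\n';
  for (const auto& c : r.children) print(c, depth + 1);
}

int main() { print(discover_topology()); }
```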
We continued discussing this during the last call, and we decided that we could potentially support the idea of execution resources only being available within a particular parallel region, by permitting this_thread::resource() to also perform some topology discovery and return resources that were not previously available.
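A hedged sketch of how that might look from user code; `this_thread::resource()` is the name from the discussion above, but the surrounding type and the mock body are purely illustrative:

```cpp
#include <string>

namespace this_thread {
  struct execution_resource { std::string name; };  // placeholder type

  // Placeholder: returns the resource the calling thread of execution is
  // running on.  The idea above is that this call is also permitted to
  // perform some topology discovery, so when invoked inside a parallel
  // region it may return a resource (e.g. a GPU sub-device or device-side
  // queue) that the original host-side snapshot never reported.
  execution_resource resource() { return {"mock-resource"}; }
}

void parallel_region_body() {
  auto r = this_thread::resource();  // may surface a newly discovered resource
  (void)r;
}
```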
Was that in the notes? I'd like to learn more about that. (Sorry I missed that discussion at Rapperswil.) I'm not sure, without knowing more at least, whether that would solve the problems we've discussed here.
This was mostly just a brain dump; I've not put a great deal of thought into that approach yet, so I'm not sure whether it would really help us here. Thinking about it further, I'm tempted to agree that it likely won't. No problem, I thought notes were taken at the Rapperswil meeting; perhaps not, I'll have a look for them.
This is what I had in the notes that I took:
- It was suggested that we could observe the topology through some kind of visitor pattern (in order of execution or dispatch).
Essentially the suggestion was that rather than having users traverse the topology manually, we could provide some kind of visitor interface which lets users specify what they want to find and how they want to represent it, and have the implementation do the traversal for them. We have a similar concept in SYCL, though it covers a smaller domain than we are aiming to support here, and I can see some potential difficulties in making this generic enough to be useful. Still, it could be a useful higher-level interface for those who don't want to do things manually.
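For what it's worth, a minimal sketch of what such a visitor interface could look like; `visit_topology`, `execution_resource`, and the kind strings are placeholders, not anything from SYCL or the proposal:

```cpp
#include <functional>
#include <string>
#include <vector>

struct execution_resource {
  std::string kind;   // e.g. "numa", "core", "gpu"
  std::vector<execution_resource> children;
};

// The implementation walks the topology (here: simple depth-first, in
// dispatch order) and hands each resource to the user-supplied visitor;
// the user only says what they are looking for.
void visit_topology(const execution_resource& root,
                    const std::function<void(const execution_resource&)>& visitor) {
  visitor(root);
  for (const auto& child : root.children)
    visit_topology(child, visitor);
}

// Usage: collect all NUMA-node resources without traversing manually.
std::vector<execution_resource> find_numa_nodes(const execution_resource& root) {
  std::vector<execution_resource> found;
  visit_topology(root, [&](const execution_resource& r) {
    if (r.kind == "numa") found.push_back(r);
  });
  return found;
}
```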
Why not just attempt to create the context? If the resource is no longer available, the context creation fails. Otherwise, the context assumes the responsibility for keeping the resource alive or otherwise handling the case where the resource ceases to be alive at some point.
Yeah, that makes sense to me.
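A small sketch of that "just try to create the context" approach, with `execution_resource` and `execution_context` as placeholder types of mine; the point is only that construction either fails up front or leaves the context responsible for the resource's lifetime:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

struct execution_resource { std::string name; bool online = true; };

class execution_context {
 public:
  // Fails if the resource has gone away between discovery and use; on
  // success the context keeps the resource alive for its own lifetime.
  explicit execution_context(const execution_resource& r) : resource_(r) {
    if (!r.online)
      throw std::runtime_error("resource no longer available: " + r.name);
  }

 private:
  execution_resource resource_;  // held for the context's lifetime
};

// Non-throwing variant, if failure is expected to be common.
std::optional<execution_context> try_make_context(const execution_resource& r) {
  if (!r.online) return std::nullopt;
  return execution_context(r);
}
```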
TODO: topology discovery needs to be made race-free with respect to different (CPU) threads.
Update the resource lifetime discussion in the Affinity proposal, based on the discussion in https://github.com/codeplaysoftware/standards-proposals/issues/67.
NOTE: I have not changed the rest of the proposal to make it consistent with these changes. If people like this pull request, I will then update the rest of the proposal accordingly.