Open bradcray opened 1 month ago
@dlongnecke-cray : I filed this based on our conversation earlier this week
IMO the main challenge here is that the runtime uses the `ftable` (and probably a few other things that are code-generated tables / arrays) just by linking to it.
Thanks for filing this Brad!
I've started thinking about whether it would be possible to remove all symbolic references to `chpl_ftable` from the runtime code (the runtime code can't reference `chpl_ftable` at all during compile-time), and I'm a bit stumped.
Suppose we perform a `chpl_executeOn`: if the runtime does not know about `chpl_ftable`, that means we have to pass the table for the locale we want to execute on in the `chpl_executeOn` call. Since a `chpl_executeOn` could be executing on any other locale, that means every locale has to know the local address of the `chpl_ftable` for the locale it wants to jump to.
OK, not a problem...we can write some code that builds up a privatized table of ftables. [It's a dynamically allocated array of size `numLocales`, with a copy living on each locale, where each slot `0..<numLocales` is the local pointer to the ftable for the locale indexing that slot.]
Except that code itself requires `on` statements - which call `chpl_executeOn`, which now needs an ftable pointer passed in, which we don't have...?!
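A minimal single-process C sketch of the "table of ftables" idea described above may help make the shape concrete. Every name here (`sketch_fn`, `sketch_ftable_dir`, `sketch_execute_on`, and the sample tables) is hypothetical and not part of the Chapel runtime, and the two "locales" are simulated as plain arrays in one address space. It deliberately leaves out the hard part, namely how the directory gets filled in without already being able to run an `on` statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the function type stored in an ftable. */
typedef int (*sketch_fn)(int arg);

/* Hypothetical per-locale directory: slot i holds the address of the
 * ftable belonging to locale i. In a real multi-locale run each address
 * would only be meaningful on its own locale; here everything lives in
 * one process, so the pointers can be followed directly. */
typedef struct {
  int numLocales;
  sketch_fn **ftables; /* array of length numLocales */
} sketch_ftable_dir;

/* Sample "generated" functions and one ftable per simulated locale. */
static int sketch_double(int x) { return 2 * x; }
static int sketch_negate(int x) { return -x; }
static sketch_fn sketch_ftable_locale0[] = { sketch_double, sketch_negate };
static sketch_fn sketch_ftable_locale1[] = { sketch_double, sketch_negate };

/* Simulated execute-on: the caller supplies the directory instead of the
 * callee linking against a global ftable symbol. */
static int sketch_execute_on(const sketch_ftable_dir *dir,
                             int loc, int fid, int arg) {
  assert(dir != NULL && loc >= 0 && loc < dir->numLocales);
  sketch_fn *ftable = dir->ftables[loc]; /* ftable for the target locale */
  return ftable[fid](arg);
}
```

The point of the sketch is only that the lookup goes through a value passed by the caller rather than through a linked symbol. Populating `ftables` on every locale is exactly the bootstrapping problem raised in this comment.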
So, after some thinking, maybe we don't need to aim for 0 statically linked references to `chpl_ftable` in the runtime.
If we assume that dynamic linking/loading is well behaved, then maybe what can happen is:
- `chpl_ftable` is resolved to be the `chpl_ftable` from the "origin", on each locale
- [...] `chpl_ftable` symbol to bootstrap and insert entries from "origin" into its dynamic pointer cache, or let "origin" build up its privatized table of ftables, or whatever it is we want to do
- [...] `chpl_ftable` symbol (metaphorically speaking - we never use it again after this bootstrapping)
- [...] `chpl_executeOn` blocks in "origin" that they need to help "sibling" also build up its privatized ftable vector on each locale

Does this approach seem reasonable? It's predicated on my understanding of dynamic linking being correct - but maybe even if my understanding of how symbols are patched in is incorrect it won't matter as long as we ensure the bootstrapping occurs before any other Chapel library ("sibling") is dynamically loaded.
Admittedly, I am very late to this conversation, but I have done some work on static and dynamic linking in the past so maybe I can be of use. Is the problem we are trying to solve that I write an application in another language, say Python, and I want to call a couple of libraries written in Chapel, and each of those libraries will have a copy of the Chapel runtime statically linked into them? So we want to make the Chapel runtime a stand-alone library, correct?
Right, I think the Chapel runtime is (rightly) programmed to be as greedy as possible and it probably doesn't really have an awareness of "other Chapel runtimes", nor would it know to play nicely with them. So in a situation where we'd have multiple copies of the runtime loaded up, the overall performance of the system is pretty much guaranteed to be horrible the moment either codebase starts to do some heavily parallel/distributed computation.
If we have only a single dynamically loaded copy of the runtime shared between different Chapel binaries, then the runtime can work to make sure that they all share resources (threads/tasks, memory, files, etc) to achieve the best performance possible.
EDIT: I think in this hypothetical we can imagine that any other binary could be loading a Chapel binary (e.g., a Python library). So it could be a Python library loading up a Chapel binary that itself needs the Chapel runtime. Or it could be a Chapel program that needs to dynamically load another Chapel program (the case I've primarily been thinking about lately).
As an alternative to my idea, @jabraham17 was telling me that it might be possible for the different locales to share their local ftable pointers with each other as "out of band" communication when initial setup is happening, e.g., when the launcher is working to prepare execution of the `Program_reals`. Does that seem feasible?
Do we want to support dynamic linking, or dynamic loading, or both?
Would you elaborate on the `chpl_ftable` issue? From my perspective, there are quite a few global variables that are shared between the generated code and the runtime. If the runtime is interacting with multiple Chapel libraries then a level of indirection must be added to all of the global variables to make them library-specific. I imagine that each library would register itself with the runtime and be returned a handle that it would use on subsequent interactions with the runtime. This would allow the runtime to distinguish between the libraries, and in this case, which `chpl_ftable` to use. But I have a feeling that's not what's being discussed here. To answer your specific question, we could share the `chpl_ftable` pointers out-of-band during initial setup. We already share information that way, e.g., the locales need to know each others' addresses in order to communicate, but they need to communicate to exchange addresses. So we exchange the addresses using an out-of-band allgather. We could do the same for `chpl_ftable`, although I don't understand why this is necessary.
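To illustrate what an out-of-band allgather of ftable addresses would produce, here is a small single-process simulation in C. The function name `oob_allgather` and the flat result layout are assumptions made for this sketch; it does not use or imitate any actual launcher or runtime API, and a real exchange would move the values between processes rather than copy them within one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simulated allgather: contrib[j] is the value (for example, a local
 * ftable address) contributed by locale j. The result is an n-by-n array
 * in row-major order where row i is the full vector as seen by locale i
 * after the exchange. Returns NULL on allocation failure; the caller
 * frees the result. */
static uintptr_t *oob_allgather(const uintptr_t *contrib, int n) {
  assert(contrib != NULL && n > 0);
  uintptr_t *out = malloc((size_t)n * (size_t)n * sizeof *out);
  if (out == NULL) return NULL;
  for (int i = 0; i < n; i++) {     /* receiving locale */
    for (int j = 0; j < n; j++) {   /* contributing locale */
      out[(size_t)i * (size_t)n + (size_t)j] = contrib[j];
    }
  }
  return out;
}
```

After the exchange every locale holds the same vector, so slot `j` gives any locale the address that locale `j` reported for itself, which is the same property the privatized table of ftables is meant to provide.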
I think we want to support dynamic linking and dynamic loading. [Just clarifying to make sure I'm not completely off base] By dynamic linking, I mean making the runtime into `libchpl.so` and then linking the generated Chapel program against it. By dynamic loading, I mean performing Chapel's version of a `dlopen()` on a Chapel library, then doing whatever steps we need to prepare the Chapel library we just opened to run with the `libchpl.so`.
I was afraid that there would be more variables than just `chpl_ftable`. It seems like `chpl_ftable` is just a good first one to focus on. Do you think you could help us get a more comprehensive list? Maybe we could have all those variables be stored in a single struct definition or something to have them be easier to keep track of. Then we could more easily migrate to a registration-based approach.
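As a rough sketch of the "single struct plus registration" idea, the C fragment below bundles per-program data into one struct and has the runtime hand back an integer handle on registration. All names (`reg_program_info`, `reg_register_program`, `reg_lookup`) and the choice of fields are hypothetical; the real set of code-generated symbols shared with the runtime is larger and is not enumerated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the function type stored in an ftable. */
typedef void (*reg_fn)(void);

/* Hypothetical bundle of the code-generated data one program or library
 * would otherwise expose as separate global symbols. */
typedef struct {
  reg_fn *ftable;
  int ftable_len;
  const char *program_name;
} reg_program_info;

#define REG_MAX_PROGRAMS 8
static const reg_program_info *reg_table[REG_MAX_PROGRAMS];
static int reg_count = 0;

/* Sample "generated" function that counts how often it is called. */
static int reg_calls = 0;
static void reg_sample_fn(void) { reg_calls++; }

/* Register a program's bundle; returns a handle >= 0, or -1 on failure. */
static int reg_register_program(const reg_program_info *info) {
  if (info == NULL || reg_count >= REG_MAX_PROGRAMS) return -1;
  reg_table[reg_count] = info;
  return reg_count++;
}

/* Resolve function `fid` in the ftable of the program behind `handle`.
 * Returns NULL if either index is out of range. */
static reg_fn reg_lookup(int handle, int fid) {
  if (handle < 0 || handle >= reg_count) return NULL;
  const reg_program_info *info = reg_table[handle];
  assert(info != NULL);
  if (fid < 0 || fid >= info->ftable_len) return NULL;
  return info->ftable[fid];
}
```

With this shape, adding another shared variable means adding a field to the struct rather than another symbol the runtime must link against, and the handle tells the runtime which program's data to consult.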
RE: My `chpl_ftable` idea and the out-of-band comm, I was imagining that one way to handle the indirection problem for `chpl_ftable` specifically was just to have the generated Chapel code pass in the ftable pointer to use into the `execute_on` call. But then you end up with this chicken-and-egg problem where for a given Chapel program, the code on each locale needs to know the ftable addresses for the code on every other locale in order to call `execute_on`. If I write a little Chapel gadget to do that, the gadget itself uses an `on` statement...
If there are many `extern` generated symbols that the runtime refers to, then we can try a different sort of approach with registration and handles.
I think a good next step would be to brainstorm what the registration process could look like. In my head I was imagining that if Chapel library (A) loaded up Chapel library (B), that (A) would be responsible for working with the runtime to register (B). But maybe that doesn't have to be the case at all.
EDIT: I see `runtime/include/chplcgfns.h`. Is that everything? 🤞 EDIT2: No, most certainly not...
In a few conversations recently, such as https://github.com/chapel-lang/chapel/issues/26024#issuecomment-2386092070, we've been discussing how Chapel libraries tend to have their own private copy of the runtime bundled in with them, such that if multiple libraries are used, or a Chapel library is used with a Chapel program, we end up with multiple copies of the runtime, each of which would like to own hardware resources (like pinning threads to cores).
This issue asks a question that's come up from time to time over the years: What would it take to structure our runtime as a dynamic / shared library that was loaded independently of any specific Chapel executable or library, such that if multiple things within a process needed the runtime, they'd all share that single library instance rather than each getting and initializing their own?