Support the runtime as a dynamic / shared library (only)?

bradcray commented 1 month ago

In a few recent conversations, such as https://github.com/chapel-lang/chapel/issues/26024#issuecomment-2386092070, we've been discussing how Chapel libraries tend to bundle their own private copy of the runtime. As a result, if multiple libraries are used, or a Chapel library is used from a Chapel program, we end up with multiple copies of the runtime, each of which wants to own the hardware resources (e.g., by pinning threads to cores).

This issue asks a question that has come up from time to time over the years: What would it take to structure our runtime as a dynamic / shared library that is loaded independently of any specific Chapel executable or library, such that if multiple things within a process needed the runtime, they'd all share that single library instance rather than each getting and initializing their own?

bradcray commented 1 month ago

@dlongnecke-cray: I filed this based on our conversation earlier this week.

mppf commented 1 month ago

IMO the main challenge here is that the runtime accesses the ftable (and probably a few other code-generated tables / arrays) simply by linking against it, i.e., by referring to the generated symbol directly.
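To make that link-time dependency concrete, here is a minimal single-file C sketch. It assumes a function-pointer typedef named chpl_fn_p and a hypothetical companion symbol chpl_ftable_size; the runtime's real declarations may differ. The "runtime" half only names chpl_ftable, the "generated code" half defines it, and the linker binds the two together.

```c
#include <assert.h>

typedef void (*chpl_fn_p)(void *arg);  /* assumed shape of an ftable entry */

/* ---- Runtime side (compiled separately in reality) ----
   The runtime refers to the generated table by name only. */
extern chpl_fn_p chpl_ftable[];
extern const int chpl_ftable_size;    /* hypothetical companion symbol */

void runtime_run_local(int fid, void *arg) {
  assert(fid >= 0 && fid < chpl_ftable_size);
  chpl_ftable[fid](arg);              /* index into whatever table got linked */
}

/* ---- Generated-code side: the compiler emits the definition ---- */
static int last_value = 0;
static void on_body_0(void *arg) { last_value = *(int *)arg; }
static void on_body_1(void *arg) { last_value = -*(int *)arg; }

chpl_fn_p chpl_ftable[] = { on_body_0, on_body_1 };
const int chpl_ftable_size = 2;
```

Because the runtime half refers to the table by name, it can only ever reach whichever chpl_ftable definition the linker binds that name to, which is what makes sharing one runtime among several Chapel binaries hard.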

dlongnecke-cray commented 5 days ago

Thanks for filing this Brad!

I've started thinking about whether it would be possible to remove all symbolic references to chpl_ftable from the runtime code (i.e., the runtime code would not reference chpl_ftable at all at compile time), and I'm a bit stumped.

Suppose we perform a chpl_executeOn: if the runtime does not know about chpl_ftable, then we have to pass the table for the locale we want to execute on into the chpl_executeOn call. Since a chpl_executeOn could target any other locale, every locale has to know the local address of the chpl_ftable for each locale it wants to jump to.

OK, not a problem: we can write some code that builds up a privatized table of ftables. [It's a dynamically allocated array of size numLocales, with a copy living on each locale, where slot i (for i in 0..<numLocales) holds the pointer to locale i's ftable, as that address is valid on locale i.]

Except that code itself requires on statements, which call chpl_executeOn, which now needs an ftable pointer passed in, which we don't have...?!
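The privatized "table of ftables" idea from the two paragraphs above can be sketched in C by simulating a few locales inside one process. Everything here is hypothetical (execute_on_with_ftable, ftable_table, the simulated locales); the point is that an executeOn variant taking the target's ftable explicitly needs no global chpl_ftable symbol, and that filling in the table is exactly where the chicken-and-egg problem shows up.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*chpl_fn_p)(void *arg);

enum { NUM_LOCALES = 3 };  /* simulated locales in a single process */

/* Records which simulated locale ran the last on-body. */
static int ran_on = -1;
static void body_on_0(void *arg) { (void)arg; ran_on = 0; }
static void body_on_1(void *arg) { (void)arg; ran_on = 1; }
static void body_on_2(void *arg) { (void)arg; ran_on = 2; }

/* Each locale's own copy of the generated ftable. In a real multi-locale
   run these would live at different addresses in different processes. */
static chpl_fn_p ftable_l0[] = { body_on_0 };
static chpl_fn_p ftable_l1[] = { body_on_1 };
static chpl_fn_p ftable_l2[] = { body_on_2 };
static chpl_fn_p *const local_ftable[NUM_LOCALES] = { ftable_l0, ftable_l1, ftable_l2 };

/* Privatized table of ftables: row i is locale i's copy, and slot j holds
   the ftable address that is valid on locale j. */
static chpl_fn_p *ftable_table[NUM_LOCALES][NUM_LOCALES];

/* Filled in naively here. In a real program, locale i learning locale j's
   address would itself need communication (i.e., an on statement), which is
   the chicken-and-egg problem. */
void build_ftable_tables(void) {
  for (int i = 0; i < NUM_LOCALES; i++)
    for (int j = 0; j < NUM_LOCALES; j++)
      ftable_table[i][j] = local_ftable[j];
}

/* Hypothetical executeOn variant: the caller supplies the target's ftable. */
static void execute_on_with_ftable(int dst, chpl_fn_p *ftable, int fid, void *arg) {
  assert(dst >= 0 && dst < NUM_LOCALES);
  ftable[fid](arg);  /* stands in for running the body on locale dst */
}

/* Locale src looks up dst's ftable in its own privatized row. */
void execute_on(int src, int dst, int fid, void *arg) {
  assert(src >= 0 && src < NUM_LOCALES);
  execute_on_with_ftable(dst, ftable_table[src][dst], fid, arg);
}
```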

So, after some thinking, maybe we don't need to aim for 0 statically linked references to chpl_ftable in the runtime.

If we assume that dynamic linking/loading is well behaved, then maybe what can happen is:

A) The "origin" Chapel binary loads up
B) The "origin" triggers the Chapel runtime (RT) library to be dynamically loaded (either at startup or the first time it refers to an RT symbol; I'm not exactly sure which)
C) We ensure that the RT's symbol chpl_ftable is resolved to be the chpl_ftable from the "origin", on each locale
D) The RT can use its chpl_ftable symbol to bootstrap and insert entries from "origin" into its dynamic pointer cache, or let "origin" build up its privatized table of ftables, or whatever it is we want to do
E) The RT "discards" the chpl_ftable symbol (metaphorically speaking - we never use it again after this bootstrapping)
F) When another Chapel binary "sibling" is loaded, the "origin" and the RT have the ability to run the chpl_executeOn blocks in "origin" that they need to help "sibling" also build up its privatized ftable vector on each locale
G) Now "sibling" also has privatized "ftable tables" and can translate static indices to dynamic using the RT's dynamic pointer cache
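Steps D through G could be sketched as a runtime-side "dynamic pointer cache". This is a minimal single-process sketch under stated assumptions: the names rt_register_ftable, rt_dynamic_index, and rt_call are hypothetical, and the cache simply stores each registered binary's ftable entries back to back, so translating a static index means adding that binary's base offset.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*chpl_fn_p)(void *arg);

enum { MAX_LIBS = 8, CACHE_CAP = 256 };

/* Hypothetical runtime-side cache holding the ftable entries of every
   registered Chapel binary, one after another. */
static chpl_fn_p dyn_cache[CACHE_CAP];
static size_t dyn_used = 0;
static size_t lib_base[MAX_LIBS];  /* first cache slot of each binary */
static size_t lib_len[MAX_LIBS];   /* number of entries it contributed */
static int num_libs = 0;

/* Bootstrapping step: copy a binary's ftable into the cache once and return
   its id, so later calls go through dyn_cache rather than the binary's
   chpl_ftable symbol. Returns -1 if the cache is full. */
int rt_register_ftable(const chpl_fn_p *ftable, size_t n) {
  if (num_libs == MAX_LIBS || dyn_used + n > CACHE_CAP) return -1;
  int id = num_libs++;
  lib_base[id] = dyn_used;
  lib_len[id] = n;
  for (size_t i = 0; i < n; i++) dyn_cache[dyn_used + i] = ftable[i];
  dyn_used += n;
  return id;
}

/* Translate a binary's static ftable index into a dynamic cache index. */
size_t rt_dynamic_index(int lib, size_t static_idx) {
  assert(lib >= 0 && lib < num_libs && static_idx < lib_len[lib]);
  return lib_base[lib] + static_idx;
}

void rt_call(size_t dyn_idx, void *arg) {
  assert(dyn_idx < dyn_used);
  dyn_cache[dyn_idx](arg);
}

/* Stand-ins for generated on-bodies from "origin" and "sibling". */
static int last = 0;
static void origin_fn0(void *a) { (void)a; last = 10; }
static void origin_fn1(void *a) { (void)a; last = 11; }
static void sibling_fn0(void *a) { (void)a; last = 20; }
```

Both binaries keep using their own static indices; only the runtime needs to know where each binary's entries landed in the cache.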

Does this approach seem reasonable? It's predicated on my understanding of dynamic linking being correct, but even if my understanding of how symbols are patched in is wrong, it may not matter as long as we ensure the bootstrapping occurs before any other Chapel library ("sibling") is dynamically loaded.

jhh67 commented 5 days ago

Admittedly, I am very late to this conversation, but I have done some work on static and dynamic linking in the past so maybe I can be of use. Is the problem we are trying to solve that I write an application in another language, say Python, and I want to call a couple of libraries written in Chapel, and each of those libraries will have a copy of the Chapel runtime statically linked into them? So we want to make the Chapel runtime a stand-alone library, correct?

dlongnecke-cray commented 5 days ago

Right, I think the Chapel runtime is (rightly) programmed to be as greedy as possible and it probably doesn't really have an awareness of "other Chapel runtimes", nor would it know to play nicely with them. So in a situation where we'd have multiple copies of the runtime loaded up, the overall performance of the system is pretty much guaranteed to be horrible the moment either codebase starts to do some heavily parallel/distributed computation.

If we have only a single dynamically loaded copy of the runtime shared between different Chapel binaries, then the runtime can work to make sure that they all share resources (threads/tasks, memory, files, etc) to achieve the best performance possible.

EDIT: I think in this hypothetical we can imagine that any other binary could be loading a Chapel binary (e.g., a Python library). So it could be a Python library loading up a Chapel binary that itself needs the Chapel runtime. Or it could be a Chapel program that needs to dynamically load another Chapel program (the case I've primarily been thinking about lately).

dlongnecke-cray commented 5 days ago

As an alternative to my idea, @jabraham17 was telling me that it might be possible for the different locales to share their local ftable pointers with each other via "out of band" communication during initial setup, e.g., while the launcher is preparing to execute the program's <program>_real binaries. Does that seem feasible?
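What the out-of-band exchange would produce can be sketched by simulating the locales in one process. The function oob_allgather_ftables and the locale_state struct are hypothetical; a real exchange would go through whatever bootstrap channel the launcher already uses. The key property is that it needs no on statements, so it sidesteps the chicken-and-egg problem.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_LOCALES = 4 };

/* Simulated per-locale state: before the exchange each locale only knows
   its own ftable address (stored as an integer). */
typedef struct {
  uintptr_t my_ftable;
  uintptr_t all_ftables[NUM_LOCALES];  /* filled in by the allgather */
} locale_state;

/* Stand-in for an out-of-band allgather: every locale contributes one value
   and every locale receives the full vector, indexed by locale id. */
void oob_allgather_ftables(locale_state locs[NUM_LOCALES]) {
  uintptr_t gathered[NUM_LOCALES];
  for (int i = 0; i < NUM_LOCALES; i++) gathered[i] = locs[i].my_ftable;
  for (int i = 0; i < NUM_LOCALES; i++)
    memcpy(locs[i].all_ftables, gathered, sizeof gathered);
}
```

After the exchange, each locale's all_ftables array is exactly the privatized row of the "table of ftables" described earlier.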

jhh67 commented 4 days ago

Do we want to support dynamic linking, or dynamic loading, or both?

Would you elaborate on the chpl_ftable issue? From my perspective, there are quite a few global variables that are shared between the generated code and the runtime. If the runtime is interacting with multiple Chapel libraries, then a level of indirection must be added to all of those global variables to make them library-specific. I imagine that each library would register itself with the runtime and be returned a handle that it would use in subsequent interactions with the runtime. This would allow the runtime to distinguish between the libraries and, in this case, to know which chpl_ftable to use. But I have a feeling that's not what's being discussed here.

To answer your specific question, we could share the chpl_ftable pointers out-of-band during initial setup. We already share information that way: the locales need to know each other's addresses in order to communicate, but they would need to communicate to exchange those addresses. So we exchange the addresses using an out-of-band allgather. We could do the same for chpl_ftable, although I don't understand why this is necessary.
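The registration-and-handle idea can be sketched in C as follows. The struct chpl_lib_info, the handle type, and the functions rt_register_library and rt_execute_local are all hypothetical; only the ftable is shown, and other generated globals would presumably become additional fields.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*chpl_fn_p)(void *arg);

/* Hypothetical bundle of per-library generated globals that the runtime
   currently reaches by symbol name. */
typedef struct {
  const chpl_fn_p *ftable;
  size_t ftable_size;
} chpl_lib_info;

typedef int chpl_lib_handle;  /* hypothetical opaque handle */

enum { MAX_LIBS = 16 };
static chpl_lib_info registry[MAX_LIBS];
static int num_registered = 0;

/* Each library registers once and keeps the returned handle (-1 if full). */
chpl_lib_handle rt_register_library(const chpl_lib_info *info) {
  if (num_registered == MAX_LIBS) return -1;
  registry[num_registered] = *info;
  return num_registered++;
}

/* Later runtime entry points take the handle, so the runtime knows which
   library's ftable to index. */
void rt_execute_local(chpl_lib_handle h, size_t fid, void *arg) {
  assert(h >= 0 && h < num_registered);
  assert(fid < registry[h].ftable_size);
  registry[h].ftable[fid](arg);
}

/* Demo bodies: two libraries that both use static index 0. */
static int last = 0;
static void lib1_body0(void *a) { (void)a; last = 1; }
static void lib2_body0(void *a) { (void)a; last = 2; }
```

With handles, two libraries can reuse the same static ftable indices without colliding, because the handle selects the table.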

dlongnecke-cray commented 3 days ago

I think we want to support dynamic linking and dynamic loading. [Just clarifying to make sure I'm not completely off base] By dynamic linking, I mean making the runtime into libchpl.so and then linking the generated Chapel program against it. By dynamic loading, I mean performing Chapel's version of a dlopen() on a Chapel library, then doing whatever steps we need to prepare the Chapel library we just opened to run with the libchpl.so.

I was afraid that there would be more variables than just chpl_ftable. It seems like chpl_ftable is just a good first one to focus on. Do you think you could help us get a more comprehensive list? Maybe we could store all of those variables in a single struct definition or something similar to make them easier to keep track of. Then we could more easily migrate to a registration-based approach.

RE: My chpl_ftable idea and the out-of-band comm, I was imagining that one way to handle the indirection problem for chpl_ftable specifically was just to have the generated Chapel code pass in the ftable pointer to use into the execute_on call. But then you end up with this chicken-and-egg problem where for a given Chapel program, the code on each locale needs to know the ftable addresses for the code on every other locale in order to call execute_on. If I write a little Chapel gadget to do that, the gadget itself uses an on statement...

If there are many generated extern symbols that the runtime refers to, then we can try a different approach based on registration and handles.

I think a good next step would be to brainstorm what the registration process could look like. In my head I was imagining that if Chapel library (A) loaded up Chapel library (B), that (A) would be responsible for working with the runtime to register (B). But maybe that doesn't have to be the case at all.
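Two possible answers to "who registers (B)?" can be sketched side by side. Everything here is hypothetical (rt_register, the parent field, the lib_* names). Option 1 has (B) register itself from a load-time constructor, which is a GCC/Clang extension; in a shared library it runs when the dynamic loader initializes the library, at startup or during dlopen(), generally after its dependencies (such as a libchpl.so) have been initialized. Option 2 has (A) register (B) after loading it.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*chpl_fn_p)(void *arg);

/* ---- Hypothetical runtime registry (would live in libchpl.so) ---- */
enum { MAX_LIBS = 16, NO_PARENT = -1 };
static const chpl_fn_p *reg_ftable[MAX_LIBS];
static int reg_parent[MAX_LIBS];  /* handle of whoever requested it */
static int reg_count = 0;

int rt_register(const chpl_fn_p *ftable, int parent) {
  if (reg_count == MAX_LIBS) return -1;
  reg_ftable[reg_count] = ftable;
  reg_parent[reg_count] = parent;
  return reg_count++;
}

/* ---- Option 1: (B) registers itself when it is loaded ---- */
static int b_ran = 0;
static void lib_b_body0(void *a) { (void)a; b_ran = 1; }
static const chpl_fn_p lib_b_ftable[] = { lib_b_body0 };
int lib_b_handle = -1;

__attribute__((constructor))
static void lib_b_self_register(void) {
  lib_b_handle = rt_register(lib_b_ftable, NO_PARENT);
}

/* ---- Option 2: (A) registers a library it loaded ----
   (A) would find the child's table (e.g. via dlsym) and pass its own
   handle as the parent. */
static int c_ran = 0;
static void lib_c_body0(void *a) { (void)a; c_ran = 1; }
static const chpl_fn_p lib_c_ftable[] = { lib_c_body0 };
static void lib_a_body0(void *a) { (void)a; }
static const chpl_fn_p lib_a_ftable[] = { lib_a_body0 };

int lib_a_register_child(int a_handle, const chpl_fn_p *child_ftable) {
  return rt_register(child_ftable, a_handle);
}
```

Self-registration keeps the loader out of the loop, while loader-driven registration lets the runtime record which library asked for which, which might matter for resource sharing.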

EDIT: I see runtime/include/chplcgfns.h. Is that everything? 🤞

EDIT2: No, most certainly not... 😭

chapel-lang / chapel #26130