MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0

Compiling Eidos script. #431

Open iago-lito opened 5 months ago

iago-lito commented 5 months ago

Hello. I am sharing a crazy idea we had this week.

Based on the advice given in the "performance" section of the SLiM manual, I assume that the Eidos script lines written in every block/callback/function of a model (or at least some intermediate, pre-parsed representation of them) are interpreted at runtime to figure out which internal functions to call, check whether values have correct types, handle the bookkeeping of variables, etc.

Although this provides incredible model flexibility, it must come with some overhead which I assume is twofold:

  1. The cost of interpretation itself (typechecking, bookkeeping of variables, dynamic call resolution etc.), which I have some assumptions about:

    • Easy to measure.
    • Small compared to the time spent within the actual C++ functions "doing the job" (so it's okay).
    • Irreducible: already optimal, or at least very near it.
    • Predictable and linear: the more Eidos script lines, the more overhead.
  2. The cost of missed optimisations. Lexical variables and dynamic call resolution make it impossible for a compiler to statically analyze the model and inline function calls, prune dead branches, unroll loops, optimize variables+typechecking away etc. I assume that this fraction of the overhead is:

    • Variable and unpredictable, highly nonlinear: the overhead depends on how easy it would be for a compiler to optimize a given user model.
    • Difficult to measure, because the performance of a model written in Eidos script should be compared to that of an alternate implementation of the same model written in a pure low-level language, in a way that depends on the quality of this alternate implementation.
    • Possibly large or very large, especially for models involving numerous Eidos variables/lines and/or tight Eidos loops.

Addressing 2. requires a huge amount of work, which I would break down as follows:

The (experimental) compiler would not need to interfere with the software at all. It would simply take as input a simulation model file written in Eidos script, and output an alternate, dedicated, rigid implementation of the same model in some lower-level language (C/C++/Rust..). That output could then be compiled down to optimized machine code by existing low-level compilers (gcc/clang/rustc..) depending on the user's architecture etc.

The work is huge because an Eidos -> Rust compiler, for example, would need to embed a complete, modular representation of the SLiM simulation model, so as to be able to generate minimal code corresponding to any given user-defined model. Fortunately, the SLiM simulation model is very well defined, to the point that it already has a very precise and rigorous specification: the SLiM manual and the grammar of Eidos script itself <3 These two documents constitute a very strong basis to build such a tool upon IMO, because they mean that the SLiM model is already polished, ergonomic, meaningful, and free of the kind of uncertain quirks that would make writing a compiler a nightmare. Therefore, my opinion is that, although huge, this work (if it pays off) would also be pleasant, instructive and fun.

The first reason I am opening this issue is to share these thoughts with the community here, asking whether there has been a precedent and whether more is known about the possible magnitude of 2.

The second reason is that I would personally be very excited to take a shot at this, as it sits at the intersection of my interests, the interests of people in my lab, my experience, and my job. Although I don't have any bandwidth to get into this right now, I'd be happy to free some up by 2025 if this turned out to be interesting to you :)

bhaller commented 5 months ago

Hi @iago-lito! Indeed, writing an Eidos compiler has been on my to-do list for years; I just never have the time. :-> But the idea must be in the air, because recently I've been in discussions with a PhD student who might be interested in taking the project on. Unclear, as yet, whether that will happen. If not, it'd be great to have you working on it! (And maybe it's not either/or; maybe there's a fruitful collaboration to be had between this PhD student and you?)

I think it is somewhat more straightforward than you envision, because most of the internals of Eidos, and almost all of the internals of SLiM, should be reasonably easy to turn into a run-time library that compiled Eidos code could link against to produce an executable. So you need an Eidos-to-C++ precompiler that emits C++ code that is designed to link against that runtime library; and then you need to do some refactoring work inside Eidos/SLiM to make it possible for the current code base to produce such a runtime library. (Probably not even as a standalone DLL; just as a way to compile code that can be linked immediately against an Eidos-to-C++ file, in the same compile/link, with no actual externally linked library – just as SLiMgui links against Eidos and SLiM in a single build now.)

The hard part of the work is, I think, generating C++ code that would actually be efficient, because some features of Eidos – dynamic typing and lack of explicit variable declarations, in particular – make it difficult. More thought is needed regarding such problems.

It would also benefit only a fairly small minority of models – those that spend a lot of their time in Eidos script, not in the SLiM core, manipulating values that are singletons or small vectors. Models that live in the SLiM core, and models that spend their time working with large vectors (handled in pure C++ with essentially no overhead already) would see little to no benefit. So given the limited applicability of this idea, it's not really clear that it's worth investing a great deal of work into. Which is why it has stayed on my to-do list for years. :-> But it's certainly a fun idea, and a good project for someone interested in software tool development, and with some computer science chops.

iago-lito commented 5 months ago

Very cool :) Thank you for the feedback. I am glad to read that the idea is in the air and not completely crazy, and I can't wait to ask my supervisors how much time I could dedicate to giving this a try next year.

in the same compile/link, with no actual externally linked library

I agree that dynamically linking to some SLiM runtime library may defeat the purpose, because there would still be no way for compilers to optimize across library calls. And indeed, emitting code that would build against the SLiM sources could work around this, provided the compiler has good LTO support. IIUC the cost is that users should have an up-to-date copy of these sources to compile it, right?

more straightforward than you envision

Indeed, my vision is maybe crazier, because I dreamed about emitting standalone code, a bit like tree-sitter does. In any case, it would be better to estimate how much performance we could squeeze out of such a compiler before diving in. I would address this with a couple of simple WF + nonWF recipes first. Maybe the result will be that there is no need to even try.

The hard part of the work is, I think, generating C++ code that would actually be efficient, because some features of Eidos – dynamic typing and lack of explicit variable declarations, in particular – make it difficult. More thought is needed regarding such problems.

Yupe. The Eidos AST would need to undergo type inference and some sort of ad-hoc type checking, which is a tough nut to crack. Rust and Julia do this, for instance. Fortunately, Eidos is already well-defined and well-used, so I assume it's (mostly) free of quirks. That is encouraging, and a strong basis to build on IMO :)

bhaller commented 5 months ago

Very cool :) Thank you for the feedback. I am glad to read that the idea is in the air and not completely crazy, and I can't wait to ask my supervisors how much time I could dedicate to giving this a try next year.

:->

in the same compile/link, with no actual externally linked library

I agree that dynamically linking to some SLiM runtime library may defeat the purpose, because there would still be no way for compilers to optimize across library calls. And indeed, emitting code that would build against the SLiM sources could work around this, provided the compiler has good LTO support. IIUC the cost is that users should have an up-to-date copy of these sources to compile it, right?

Sure, but that's no obstacle really. I'd imagine that users would generate a C++ file with an option to regular command-line slim like:

slim -precompile my_model.slim

and then compile and link the generated C++ file against the same sources that generated the slim build that the user is working with.

more straightforward than you envision

Indeed, my vision is maybe crazier, because I dreamed about emitting standalone code, a bit like tree-sitter does. In any case, it would be better to estimate how much performance we could squeeze out of such a compiler before diving in. I would address this with a couple of simple WF + nonWF recipes first. Maybe the result will be that there is no need to even try.

The amount of standalone code you'd have to emit would be quite large. Keep in mind that Eidos + SLiM is more than 100,000 lines of code now, IIRC.

A check to see how large the benefit is, before going whole hog into the project, might be a good idea. You'd want to use a test model that is expected to actually see a large speedup, though – as I said, a model that spends most of its time in Eidos, working with singletons and small vectors. That is not true of simple, standard WF/nonWF models; they spend virtually none of their time in the Eidos interpreter, so you'd see no speedup. They're already effectively running in optimized C++ code. And a model that spends most of its time in the Eidos interpreter will clearly get a big speedup – we know that compiled languages are faster than interpreted languages. There's not really much point in testing that. So if there is a utility to doing a test model first, it needs to be thought through carefully – what are you really trying to find out that you don't already know?

The hard part of the work is, I think, generating C++ code that would actually be efficient, because some features of Eidos – dynamic typing and lack of explicit variable declarations, in particular – make it difficult. More thought is needed regarding such problems.

Yupe. The Eidos AST would need to undergo type inference and some sort of ad-hoc type checking, which is a tough nut to crack. Rust and Julia do this, for instance. Fortunately, Eidos is already well-defined and well-used, so I assume it's (mostly) free of quirks. That is encouraging, and a strong basis to build on IMO :)

There are already big steps in this direction; there's a class called EidosTypeInterpreter that does type inference pretty well. It's used by SLiMgui to power the code completion engine. But there are numerous cases where the type cannot be fully inferred, because Eidos really is a dynamically typed language. It will require thought. The simplest solution might be to add optional type-declaration to Eidos, and require it to be used when precompiling a model.

Just to complicate things, there's an alternative idea that I have also been contemplating for several years, which is implementing a bytecode interpreter for Eidos (https://en.wikipedia.org/wiki/Bytecode). It might be simpler, both for the end user and for the developer; and it might provide quite a bit of the available speedup. I think it is an alternative worth careful consideration here. I don't know much about this possibility; I've never looked into it in detail. But other languages that are interpreted but pretty fast, such as Python, seem to employ it pretty effectively.

iago-lito commented 5 months ago

the cost is that users should have an up-to-date copy of these sources to compile it, right?

Sure, but that's no obstacle really.

Indeed :)

The amount of standalone code you'd have to emit would be quite large. Keep in mind that Eidos + SLiM is more than 100,000 lines of code now, IIRC.

Well, the whole point of emitting dedicated code is that not all of these lines are required to run the model specified by, say, recipe 4.1. Had I been willing to run 4.1 without SLiM, and without any flexibility requirement, I'd have written a much smaller program than SLiM+Eidos. That small program is the baseline I would like to compare performance against.

standard WF/nonWF models; they spend virtually none of their time in the Eidos interpreter, so you'd see no speedup.

Yupe, if this turns out to be true, and if it covers the majority of use cases, then better not to go "whole hog" ^ ^"

There are already big steps in this direction; there's a class called EidosTypeInterpreter that does type inference pretty well. It's used by SLiMgui to power the code completion engine.

Great :D Do you happen to also have good documentation about the SLiM/Eidos implementation itself?

But there are numerous cases where the type cannot be fully inferred, because Eidos really is a dynamically typed language. It will require thought.

Yupe. Where inference fails to decide whether a variable is, say, an int or a vector of floats, code-generated tagged unions like Rust's enum { Int(i64), Floats(Vec<f64>) } can still perform pretty efficiently, because only branch elision is denied to the optimizers.

The simplest solution might be to add optional type-declaration to Eidos, and require it to be used when precompiling a model.

This is also an option, although I bet it'd increase the language surface by a large factor. BTW, has the Eidos grammar been formalized in writing already? Or is it just specified by the current behaviour of the parser?

implementing a bytecode interpreter for Eidos

From what I understand, the bytecode interpreter would improve performances of executing one Eidos script line, but it would not help optimizing the script as a whole, right?

For example, the following snippet (with a and b large integer vectors):

c = a + b;
d = c - a;

would be executed faster line-by-line with a good bytecode interpreter, but that interpreter would fail to actually replace it with the no-op:

d = b;

is that correct?

bhaller commented 5 months ago

The amount of standalone code you'd have to emit would be quite large. Keep in mind that Eidos + SLiM is more than 100,000 lines of code now, IIRC.

Well, the whole point of emitting dedicated code is that not all of these lines are required to run the model specified by, say, recipe 4.1. Had I been willing to run 4.1 without SLiM, and without any flexibility requirement, I'd have written a much smaller program than SLiM+Eidos. That small program is the baseline I would like to compare performance against.

I don't think that's important, actually. If your program never calls, say, the rdunif() function in Eidos, it doesn't matter at all that the code for rdunif() is in the executable. It won't slow anything down. Stripping away all of the dead code would make your compile time faster, to be sure; but who cares. It wouldn't make Eidos/SLiM run any faster at all, and (to me at least) that's what matters. That is not why compiled code is faster than interpreted code.

standard WF/nonWF models; they spend virtually none of their time in the Eidos interpreter, so you'd see no speedup.

Yupe, if this turns out to be true, and if it covers the majority of use cases, then better not to go "whole hog" ^ ^"

It's undoubtedly true. I guarantee it. The question is whether it's worth the work to speed up the minority of models that do spend a lot of their time in the Eidos interpreter – models that do a lot of in-script calculations that don't/can't vectorize down into C++, and models that have callbacks that get called intensively. And perhaps the answer is no, it's not worth the work. So far that has been my own evaluation of it, which is why it hasn't happened. But other people's priorities may well differ from mine. :->

There are already big steps in this direction; there's a class called EidosTypeInterpreter that does type inference pretty well. It's used by SLiMgui to power the code completion engine.

Great :D Do you happen to also have good documentation about the SLiM/Eidos implementation itself?

Just comments in the code. I think the code is pretty readable overall – I'm a big believer in well-commented code – but that's me, and the comments in the code are targeted primarily at myself. :->

But there are numerous cases where the type cannot be fully inferred, because Eidos really is a dynamically typed language. It will require thought.

Yupe. Where inference fails to decide whether a variable is, say, an int or a vector of floats, code-generated tagged unions like Rust's enum { Int(i64), Floats(Vec<f64>) } can still perform pretty efficiently, because only branch elision is denied to the optimizers.

There are other options there, actually, and a union may well not be the best solution. As I said, it will require thought.

The simplest solution might be to add optional type-declaration to Eidos, and require it to be used when precompiling a model.

This is also an option, although I bet it'd increase the language surface by a large factor. BTW, has the Eidos grammar been formalized in writing already? Or is it just specified by the current behaviour of the parser?

It is a formal grammar. Railroad diagrams are toward the end of the Eidos manual. The same is true for SLiM input files.

implementing a bytecode interpreter for Eidos

From what I understand, the bytecode interpreter would improve performances of executing one Eidos script line, but it would not help optimizing the script as a whole, right?

For example, the following snippet (with a and b large integer vectors):

c = a + b;
d = c - a;

would be executed faster line-by-line with a good bytecode interpreter, but that interpreter would fail to actually replace it with the no-op:

d = b;

is that correct?

Sure. You can build an optimizer on top of bytecode, and maybe it's easier to do that than to build it on top of an AST representation (or maybe not); but if you emit C++ you get the C++ compiler's optimizer for free, yes. I'm not sure that's where the big wins are anyway, though, for SLiM's purposes; when a script is taking most of its time in the Eidos interpreter, I doubt that it is typically because of things like this that a high-level optimizer could significantly improve upon. And actually, even a C++ compiler's optimizer would probably fail to optimize that example down to d = b;, because if a and b are large integer vectors, those computations are going to involve loops, not single variables, and the compiler isn't going to understand what the loops are doing well enough to optimize them away. A dedicated optimizer in Eidos would actually have a better chance of understanding such code and optimizing it (but again, I doubt this is much of an issue for real-world SLiM scripts anyway). Going to C++ is not a panacea. But again, remember: the "large vector" case is not the case to aim for anyway. Operations on large vectors spend their time in C++ already, not in the Eidos interpreter.

Anyhow, there's a great deal to think about here, and I have other things I need to get done, so I'm going to pause on this conversation for a while. If you want to delve into this more, that's great; I'd suggest starting by going and looking a lot at how Eidos and SLiM work internally, particularly EidosInterpreter, EidosScript, EidosASTNode, EidosToken, etc. – the internals of the interpreter itself, which is the target here. Take a simple Eidos script that, say, calculates primes or something, think about what C++ code you'd like to emit for that Eidos script, and think about how you would get there – how do you get from A to B. I'd stick to pure Eidos for now, SLiM is really just an Eidos client and the Eidos interpreter is the target here; adding SLiM in to the mix just complicates things unnecessarily, at this point.

iago-lito commented 5 months ago

Agreed on everything :) Thank you for the feedback again. Let's pause this until I can pick it up, or someone else does.

iago-lito commented 5 months ago

(for future reference about this point specifically:

actually, even a C++ compiler's optimizer would probably fail to optimize that example down to d = b;, because if a and b are large integer vectors, those computations are going to involve loops, not single variables, and the compiler isn't going to understand what the loops are doing well enough to optimize them away.

here is some fascinating reading about how Julia actually succeeds in doing so: it's called broadcast fusion, and I suspect it's within reach for Eidos provided type inference goes well ;)

bhaller commented 5 months ago

(for future reference about this point specifically:

actually, even a C++ compiler's optimizer would probably fail to optimize that example down to d = b;, because if a and b are large integer vectors, those computations are going to involve loops, not single variables, and the compiler isn't going to understand what the loops are doing well enough to optimize them away.

here is some fascinating reading about how Julia actually succeeds in doing so: it's called broadcast fusion, and I suspect it's within reach for Eidos provided type inference goes well ;)

I look forward to reading that – but I'll just note that Julia has a LOT more people and resources behind it than Eidos does. :->

bhaller commented 5 months ago

here is some fascinating reading about how Julia actually succeeds in doing so: it's called broadcast fusion, and I suspect it's within reach for Eidos provided type inference goes well ;)

Read that article. Interesting reading, but very far from anything that Eidos would be capable of any time soon. I think you are vastly underestimating the amount of work that has gone into Julia's internals to enable these things. See the article's section "Should other languages implement syntactic loop fusion?" That lays out the preconditions for being able to do this; those preconditions are huge, and Eidos has none of them.

I'd suggest keeping our eyes on the prize. Optimizations like this would certainly be nice, but right now, script-intensive SLiM models are probably at least an order of magnitude slower than they could be, for reasons that have nothing to do with this stuff. Interpreting code is slow. That's what we're trying to fix here – not turning Eidos into Julia, which would require an army of coders to achieve.

But interesting article, thanks! :->

iago-lito commented 5 months ago

those preconditions are huge, and Eidos has none of them

Of course. Sorry, I wasn't very clear, because I was hesitating between my willingness to "pause discussion" and the excitement of pondering opportunities in the back of my mind. The result is that my point was unclear ^ ^" I also don't want to waste your time with far-fetched, half-baked, naive ideas, so don't hesitate to just drop me for a while if you need to (or even close this issue to limit noise) <3

These are preconditions to integrating broadcasting deeply within the language, so that it would cover any user-defined function and would effectively turn Eidos into some kind of Julia. The problem with this is not that we don't have the resources to do so (and hell, we don't!). The problem is that Eidos is not meant to become Julia, and it never should be IMO. Even with an army of developers like theirs, I would still consider it wrong to attempt the kind of generalized broadcasting described in the article. So maybe we agree more than you think on this point ;)

The strength of Eidos IMO is that it is not general-purpose like Julia, and there is much benefit to its dedication to simulating genetic models only. For instance, one option I was actually seduced by is the one the article dismisses in its "Other partway solutions" section. With a finite, carefully tuned function library like the one Eidos has, manually annotating functions as "pure", special-casing array containers, and identifying in advance which operations are likely to be applied to vectors is not as pointless as it is for Julia. It possibly requires less work (although still substantial), and it only targets Eidos interpretation speed.

And in any case indeed:

keeping our eyes on the prize

I would be very reluctant to get into so much work for less than an order of magnitude possible gain. This is really all just thoughts until I can measure that.

Also, I appreciate that you set patience against my excitement and focus against my broad thoughts. Thank you again :)

bhaller commented 5 months ago

OK, good for now. Let's pause and think. I need to get SLiM 4.2 out and get ready for SLiM workshops. :->

eirianop commented 5 months ago

Hiya @iago-lito, pleasure to meet you. Ben invited me to "out" myself as the PhD student potentially interested in this project. I'm definitely interested in collaborating, and I love how much you've analysed the challenge already.

iago-lito commented 5 months ago

Hi @eirianop, nice to read you :) As I wrote, I won't have time to tinker with this any time soon, but I would be glad to find some by the end of the year, at least just to run basic tests which, if juicy, would help me convince my supervisors to dedicate more time next year 0:) I'll be happy to discuss in any case, so don't hesitate to reach out!