Open lantiga opened 1 month ago
I like the proposal in general, a couple of details:
Overall, I like it. I do think that we need to be able to bundle together all the nonsense we do to a model. Although I think that ideally recipes should be combinable and composable? Or at least that was the original goal. Quantizing and then distributing vs. distributing and then quantizing should be equivalent, and likewise for the grad transform, etc. If that makes semantic sense. You tell me, I suppose. On the surface it seems like we're trading the issue of non-composable transforms for the issue of non-composable recipes.
I like exposing `Lookaside` as a class. I think that de-mystifies things quite a bit. It makes it discoverable in the docs, and passing them along to `thunder.jit()` makes a lot of sense. I think it's a core functionality of Thunder, and I don't think it makes sense that it's so buried. Also, like we talked about, I would prefer to move stuff out of global context variables. I think this is a good change regardless.
I really like `setup_operators()`.
What does `setup_config()` do? I understand all the other methods.
I take it that `setup()` is not supposed to be overridden? I forget if there's an annotation for that. If not, it might be good to make that explicit. I suppose if somebody wants to do arbitrary things when creating it they can subclass and put it in the `__init__` function? And then when they apply it, they can put arbitrary stuff in `setup_operators()`?
As long as it actually improves (or at least doesn't worsen) the composability problem, looks dope. I think we've had the problem of the thunder API being scattered across a lot of different methods and packages for a long time now, and I like that this centralizes it. It will make it a lot easier to understand for newcomers.
Thanks for the comments!
Regarding composable recipes: I have thoughts about adding some sort of traits to either recipes (or transforms themselves) that would inform how to compose things together (as in: I need to come after A and B kind of things, and before E and F kind of things).
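To make the "I need to come after A and before E" idea concrete, here is a minimal sketch of how such traits could be resolved into an order. It is only an illustration: the trait format and the transform names are hypothetical placeholders, not anything that exists in Thunder, and it leans on the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical trait declarations: each entry says what it must run after/before.
# The names are placeholders, not real Thunder transforms.
CONSTRAINTS = {
    "quantize": {"after": [], "before": ["distribute"]},
    "distribute": {"after": [], "before": ["grad"]},
    "grad": {"after": ["quantize"], "before": []},
}


def resolve_order(constraints):
    """Return one ordering that satisfies every after/before constraint."""
    sorter = TopologicalSorter()
    for name, trait in constraints.items():
        # Everything listed under "after" is a predecessor of `name`.
        sorter.add(name, *trait["after"])
        # Everything listed under "before" has `name` as a predecessor.
        for successor in trait["before"]:
            sorter.add(successor, name)
    # static_order() raises graphlib.CycleError if the traits contradict each other.
    return list(sorter.static_order())


order = resolve_order(CONSTRAINTS)
print(order)
```

A contradiction between traits surfaces as a `CycleError`, which might be a reasonable place to tell the user that two recipes cannot be combined.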
My ideal sequence would be:
As far as I'm concerned we can stay with 1 for as long as we need to really figure out how recipes will compose, but we know we are not cornered there.
Maybe to make developer experience nicer we could also add a way to quickly add an extra transform or executor to a recipe inline, without subclassing.
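One possible shape for that, sketched with a stand-in class rather than the real recipe API: the `add` helper, the attribute names, and the choice to put extra executors ahead of the defaults are all assumptions made for illustration.

```python
import copy


class Recipe:
    """Stand-in recipe; transform and executor entries are placeholder strings."""

    def __init__(self):
        self._extra_transforms = []
        self._extra_executors = []

    def setup_transforms(self):
        return ["base_transform"] + self._extra_transforms

    def setup_executors(self):
        # Assumption: extras are listed first, ahead of the recipe's defaults.
        return self._extra_executors + ["base_executor"]

    def add(self, *, transforms=(), executors=()):
        """Hypothetical helper: return a copy with extras added, leaving self untouched."""
        new = copy.copy(self)
        new._extra_transforms = self._extra_transforms + list(transforms)
        new._extra_executors = self._extra_executors + list(executors)
        return new


base = Recipe()
tweaked = base.add(transforms=["my_transform"], executors=["my_executor"])
```

Returning a copy keeps the original recipe reusable, so a shipped recipe can be tweaked per call site without subclassing.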
`setup_config` is for those flags one wants to pass to `thunder.jit` (or `ThunderFX`, depending on the recipe). Not sure it's the best way to do that, we'll see.
Correct, `setup()` is not to be overridden, unless you really want to. Maybe we can just rename it to `_setup()` or something that makes it clearer that you shouldn't mess with it.
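On the annotation question: Python's `typing.final` decorator (3.8+) marks a method as not meant to be overridden. A minimal sketch with a stand-in base class (not the real recipe class) could look like this; note that `@final` is only checked by static type checkers such as mypy and does nothing at runtime, which fits the "unless you really want to" stance.

```python
from typing import final


class RecipeBase:
    """Stand-in for the proposed base recipe; not the real Thunder class."""

    def setup_config(self):
        # Hook that subclasses are expected to override.
        return {}

    @final
    def setup(self):
        # Fixed orchestration; type checkers flag subclasses that override this.
        return {"config": self.setup_config()}


class MyRecipe(RecipeBase):
    def setup_config(self):
        return {"flag": True}


result = MyRecipe().setup()
```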
Overall sounds great to me. I have some questions and comments to help myself understand this proposal better.
Q1 -- `setup_operators`: Would it even let us register a custom executor, like the apex executor we already have?
Q2 -- `setup_<foo>`: Would they really need to be member methods, instead of `staticmethod`s?
Thank you @crcrpar
Q1: yes, in the case where we want to offer a straightforward way to add a single-operator executor. For anything more complex we should require defining a proper executor.
Q2: the setup methods could rely on properties that we set on the recipe. For example, I could have a recipe that takes a `use_fsdp=True|False` argument in the constructor, and you would want to access `self.use_fsdp` in `setup_executors`. So I think they need to be members.
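A small sketch of that point, using a stand-in class and placeholder executor names (none of these strings are real Thunder executors):

```python
class ExampleRecipe:
    """Hypothetical recipe showing why the setup hooks read instance state."""

    def __init__(self, use_fsdp=False):
        self.use_fsdp = use_fsdp

    def setup_executors(self):
        # A staticmethod could not see self.use_fsdp, hence a member method.
        executors = ["placeholder_default_executor"]
        if self.use_fsdp:
            executors.insert(0, "placeholder_fsdp_aware_executor")
        return executors


plain = ExampleRecipe().setup_executors()
sharded = ExampleRecipe(use_fsdp=True).setup_executors()
```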
🚀 Feature
Thunder recipes and new high-level entrypoint.
This is important
Motivation
Providing a model to `thunder.jit` requires understanding of:
The above is specific to models and cluster configurations, so it would be good to have a way to package everything up in a reusable class that applies to certain models or model families and can be shipped alongside the original code.
For instance, one could have a `HFLlama3` recipe, or a more general `HFLlama` recipe that can be applied to all variants of Llama. A recipe could expose options for different configurations as well, like the use of distributed. Last, one could have a `HFLlama3Hopper` recipe that optimizes the combination of executors for a certain architecture.
In a nutshell, the recipe would orchestrate what is needed to make Thunder run on a model and what gets applied to that model. Code that uses recipes would not:
thunder.jit
One of the uses for a recipe is when dealing with details of implementations and the way we handle them, e.g.
The possible introduction of a `ThunderFX` entrypoint makes this even more attractive. The recipe could decide to call into `thunder.jit` or `ThunderFX` according to what the recipe implementor decides to go with.
Pitch
This is what the new entrypoint and a recipe could look like. NOTE: the naming is up for grabs, this is just for demonstration purposes.
Here's a skeleton of a base `ThunderRecipe` class and the `thunder.compile` entrypoint
Example recipe for HFBert
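As a rough illustration of how a base class, an entrypoint, and a Bert-flavoured recipe might fit together, here is a minimal sketch. It deliberately does not import Thunder: `compile_with_recipe` stands in for the proposed `thunder.compile`, `fake_jit` stands in for `thunder.jit`, the hook names `setup_lookasides` and `setup_transforms` are assumptions (the thread only names `setup_config`, `setup_operators`, `setup_executors` and `setup`), and every returned value is a placeholder string.

```python
from typing import Any, Callable, Dict, List, final


class ThunderRecipe:
    """Sketch of a base recipe; hook names beyond those in the thread are assumptions."""

    def setup_lookasides(self) -> List[Any]:
        return []

    def setup_transforms(self) -> List[Any]:
        return []

    def setup_executors(self) -> List[Any]:
        return []

    def setup_operators(self) -> List[Any]:
        return []

    def setup_config(self) -> Dict[str, Any]:
        # Flags to forward to the underlying entrypoint.
        return {}

    @final
    def setup(self) -> Dict[str, Any]:
        # Gathers every hook's result; subclasses override the hooks, not this.
        return {
            "lookasides": self.setup_lookasides(),
            "transforms": self.setup_transforms(),
            "executors": self.setup_executors(),
            "operators": self.setup_operators(),
            **self.setup_config(),
        }


def compile_with_recipe(model: Callable, recipe: ThunderRecipe, jit: Callable) -> Callable:
    """Stand-in for the proposed thunder.compile; `jit` is injected so this runs without Thunder."""
    return jit(model, **recipe.setup())


class HFBertRecipe(ThunderRecipe):
    """Hypothetical Bert recipe; all returned values are placeholder strings."""

    def __init__(self, distributed: bool = False):
        self.distributed = distributed

    def setup_transforms(self) -> List[Any]:
        return ["placeholder_distributed_transform"] if self.distributed else []

    def setup_executors(self) -> List[Any]:
        return ["placeholder_fused_executor", "placeholder_fallback_executor"]


def fake_jit(model: Callable, **options: Any) -> Callable:
    # Records the options instead of compiling anything.
    def wrapped(*args, **kwargs):
        return model(*args, **kwargs)

    wrapped.options = options
    return wrapped


compiled = compile_with_recipe(lambda x: x + 1, HFBertRecipe(distributed=True), fake_jit)
```

Injecting the backend as an argument is one way a recipe could choose between `thunder.jit` and `ThunderFX` without the caller knowing which one is used.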
One could think about composing recipes. A basic quantization recipe:
and a composed recipe with configurable quantization
One could also think about specifying lists of recipes but I'm on the fence about it at least initially. We could have rich rules on how to compose, but doing so manually like above is probably better while we're getting a sense for the system.
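A minimal sketch of what manual composition could look like, assuming stand-in classes and placeholder transform names (nothing here is the real Thunder API, and the `bits` option is invented for illustration):

```python
class Recipe:
    """Stand-in base class with only the hook needed for this example."""

    def setup_transforms(self):
        return []


class QuantizationRecipe(Recipe):
    """Hypothetical basic quantization recipe."""

    def __init__(self, bits=4):
        self.bits = bits

    def setup_transforms(self):
        return [f"placeholder_quantize_{self.bits}bit"]


class QuantizedBertRecipe(Recipe):
    """Hypothetical composed recipe: quantization is opt-in and ordered by hand."""

    def __init__(self, quantize=False, bits=4):
        self.quantization = QuantizationRecipe(bits) if quantize else None

    def setup_transforms(self):
        transforms = ["placeholder_bert_transform"]
        if self.quantization is not None:
            # The composing recipe decides the order explicitly: quantize first.
            transforms = self.quantization.setup_transforms() + transforms
        return transforms


plain = QuantizedBertRecipe().setup_transforms()
quantized = QuantizedBertRecipe(quantize=True, bits=8).setup_transforms()
```

Because the outer recipe spells out where the inner recipe's transforms go, ordering stays a deliberate decision by the recipe author rather than something inferred from composition rules.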