Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0

remove jit(fsdp(model)) codepath #1129

Open t-vi opened 1 week ago

t-vi commented 1 week ago

The old codepath is not composable with other transforms, does not offer easy gathering of state dicts, etc.

Removing it, of course, depends on NVIDIA benchmarking not needing it. I think we (@crcrpar, actually) switched a couple of months ago or so.
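For illustration, a minimal sketch of the two composition orders being discussed. The wrappers below are hypothetical stand-ins that only record application order, not the real `thunder.jit`/`fsdp` APIs:

```python
# Hypothetical stand-in wrappers: each one just appends its name, so the
# resulting list shows the order in which the transforms were applied.
def jit(model):
    return model + ["jit"]

def fsdp(model):
    return model + ["fsdp"]

model = []
old_path = jit(fsdp(model))  # old codepath: jit wraps an already-sharded model
new_path = fsdp(jit(model))  # current order: fsdp applied to the jitted model
print(old_path)  # ['fsdp', 'jit']
print(new_path)  # ['jit', 'fsdp']
```

The point of the removal is that only the second order composes cleanly with other transforms.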

@mpatel31415 @tfogal @crcrpar @IvanYashchuk wdyt?

nvMelissa commented 1 week ago

Brought up in triage review. Assigned to Tom V.

crcrpar commented 1 week ago

I'd say the removal should be after https://github.com/Lightning-AI/lightning-thunder/issues/1051 is fixed.

t-vi commented 6 days ago

@crcrpar Thank you for pointing that out!

So what kind of delay should we have to be sure the benchmarking works without it?

mpatel31415 commented 6 days ago

We already use the new code, which contains `distributed_first := (self.compile in ("eager", "inductor") or "dynamo" in self.compile)`,

so from the Mixology perspective it should be fine.
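Restated as a small function for readability, the quoted predicate looks roughly like this. This is a hedged sketch using a plain `compile` string in place of the actual benchmark object's attribute:

```python
# Hedged restating of the quoted predicate; `compile` stands in for
# the benchmark's `self.compile` string.
def distributed_first(compile: str) -> bool:
    # True for eager/inductor, or for any compile string mentioning dynamo.
    return compile in ("eager", "inductor") or "dynamo" in compile

print(distributed_first("eager"))          # True
print(distributed_first("thunder"))        # False
print(distributed_first("thunder+dynamo")) # True
```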

crcrpar commented 6 days ago

Just to clarify my take: I was being conservative because I wasn't confident that the constraint described in the issue linked in my previous comment would actually be fixed.