Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Apache License 2.0

import thunder tries to initialize all native executors #345

Open kshitij12345 opened 5 months ago

kshitij12345 commented 5 months ago

import thunder initializes all native executors. I don't think this is desirable, as initializing them may not be cheap. Except for the default executors, we should load the others only when the user calls get_executor or get_all_executors.

https://github.com/Lightning-AI/lightning-thunder/blob/843b2c265ec335aa1406f3d5650324567b88bf78/thunder/__init__.py#L78

https://github.com/Lightning-AI/lightning-thunder/blob/843b2c265ec335aa1406f3d5650324567b88bf78/thunder/extend/__init__.py#L382-L388

cc @apaz-cli @carmocca @borda

carmocca commented 5 months ago

get_all_executors makes sure that all the first-party executors are imported: https://github.com/Lightning-AI/lightning-thunder/blob/b1f447022b0732e83c11661c30746568280834f7/thunder/extend/__init__.py#L354-L366. This is necessary to avoid silently returning None for executors that are importable.

One simple fix could be to avoid calling get_executor at import time.

Still, in the long term, Thunder would benefit from a cheap way for executors to register themselves that signals "I could be imported", plus a separate, more expensive check for "I can actually be used" (dependencies are available, required hardware is available, ...).
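One possible shape for that two-tier check, as a rough sketch only: the `ExecutorEntry` class and its method names are hypothetical and not part of Thunder. The cheap check uses `importlib.util.find_spec`, which locates a top-level module without executing it; the expensive check actually imports the module and runs a caller-supplied availability test.

```python
import importlib
import importlib.util
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExecutorEntry:
    """Hypothetical registry entry for one executor."""

    name: str
    module_name: str  # assumed to be a top-level module name
    is_available: Callable[[object], bool]  # expensive check, run after import

    def could_be_imported(self) -> bool:
        # Cheap: find_spec only searches for the module, it does not run it.
        return importlib.util.find_spec(self.module_name) is not None

    def can_be_used(self) -> bool:
        # Expensive: import the module, then check dependencies, hardware, etc.
        if not self.could_be_imported():
            return False
        module = importlib.import_module(self.module_name)
        return self.is_available(module)


# Usage with a stdlib module standing in for a real executor module.
demo = ExecutorEntry("demo", "json", lambda module: hasattr(module, "dumps"))
print(demo.could_be_imported(), demo.can_be_used())
```

One caveat: for dotted names, `find_spec` imports the parent packages, so the cheap check stays cheap only if those parents are light.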

mruberry commented 5 months ago

triage review — adding a system that lets executors register themselves as "importable" would be a great extensibility point and should avoid actually importing everything at startup

mruberry commented 5 months ago

@ptrblck points out that importing thunder takes significantly longer than importing torch alone; this could be part of that issue

%time import torch
CPU times: user 745 ms, sys: 0 ns, total: 745 ms
Wall time: 774 ms

%time import thunder
CPU times: user 1.57 s, sys: 680 ms, total: 2.25 s
Wall time: 2.28 s

%time import numpy
CPU times: user 1 µs, sys: 1 µs, total: 2 µs
Wall time: 4.53 µs
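A note on reading these numbers: when imports are timed in the same session, a module that was already loaded (numpy is most likely pulled in by torch here) comes back from the `sys.modules` cache, which would explain the microsecond figure. A rough way to compare cold imports is to time each one in a fresh interpreter. The helper below is a sketch, `time_import` is a made-up name, and the result includes interpreter startup time.

```python
import subprocess
import sys
import time


def time_import(module_name):
    """Wall-clock seconds to start a fresh interpreter and import `module_name`."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


# Stdlib example; substitute "torch" or "thunder" where they are installed.
print(f"json: {time_import('json'):.3f} s")
```

For a per-module breakdown, `python -X importtime -c "import thunder"` prints the time spent in each imported module.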