Open parthmannan opened 5 months ago
@kshitij12345 do you think https://github.com/Lightning-AI/lightning-thunder/blob/21a222b180009616a4cc48176958b4506894a330/thunder/core/transforms.py#L468 would let us write a simple callback that just saves the given traces?
I see three issues using `add_post_optimization_transform`:
1. It is not applied to `prologue_trace`, so we won't be able to save it.
2. It is applied to the `forward` and `backward` traces independently (but we don't explicitly say whether a given trace is forward or backward). We could probably derive that from the trace signature, but I don't think it is a good idea.
3. With multiple `post_optimization_transform`s, the user will have to make sure that this saving transform is the last one; otherwise it would miss information from other transforms applied after it.

Also, using `add_post_optimization_transform` would still require some changes to the training code.
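To illustrate the ordering concern, here is a minimal, self-contained sketch. It does not use Thunder at all: `Trace`, `run_transforms`, `make_saver`, and `fuse_mul_add` are hypothetical stand-ins invented for this example, and a "trace" is just a list of op names. It only shows that a saving transform sees whatever the pipeline has produced up to its own position.

```python
from typing import Callable, List

# Hypothetical stand-ins (not Thunder types): a trace is a list of op names,
# and a transform maps one trace to another.
Trace = List[str]
Transform = Callable[[Trace], Trace]


def run_transforms(trace: Trace, transforms: List[Transform]) -> Trace:
    """Apply each transform in registration order."""
    for transform in transforms:
        trace = transform(trace)
    return trace


def make_saver(store: List[Trace]) -> Transform:
    """Build a pass-through transform that records a copy of the trace it sees."""
    def save(trace: Trace) -> Trace:
        store.append(list(trace))
        return trace
    return save


def fuse_mul_add(trace: Trace) -> Trace:
    """Toy optimization: rewrite every adjacent ("mul", "add") pair as "fma"."""
    out: Trace = []
    i = 0
    while i < len(trace):
        if trace[i:i + 2] == ["mul", "add"]:
            out.append("fma")
            i += 2
        else:
            out.append(trace[i])
            i += 1
    return out


original = ["load", "mul", "add", "store"]

# Saver registered first: it records the trace before the fusion runs.
saved_early: List[Trace] = []
final_early = run_transforms(original, [make_saver(saved_early), fuse_mul_add])

# Saver registered last: it records the fully transformed trace.
saved_late: List[Trace] = []
final_late = run_transforms(original, [fuse_mul_add, make_saver(saved_late)])
```

Both pipelines produce the same final trace, but only the saver registered last records it; the early saver still holds the unfused ops.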
🚀 Feature
An environment variable that dumps the various Thunder-provided debug traces to a log file. It could support several levels, for example:
export THUNDER_DEBUG=<option>
This is a narrow example of the possible debug log levels. Each of these logs can be in a different log file.
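As a rough sketch of how such a variable could be consumed, assuming nothing about Thunder's internals: the helper names `debug_log_path` and `dump_trace` and the `thunder_debug_<option>.log` file naming are hypothetical, and the set of valid options is deliberately left open. The idea is only that an unset variable disables dumping and each option value maps to its own log file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "THUNDER_DEBUG"  # variable name taken from the proposal above


def debug_log_path(log_dir: str = ".",
                   environ: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the log file for the current option, or None if dumping is off.

    The file naming scheme is an assumption made for this sketch.
    """
    env = os.environ if environ is None else environ
    option = env.get(ENV_VAR, "").strip()
    if not option:
        return None
    return Path(log_dir) / f"thunder_debug_{option}.log"


def dump_trace(trace: object, log_dir: str = ".",
               environ: Optional[Mapping[str, str]] = None) -> bool:
    """Append str(trace) to the option's log file; return True if written."""
    path = debug_log_path(log_dir, environ)
    if path is None:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as log_file:
        log_file.write(str(trace) + "\n")
    return True
```

Because the check happens inside `dump_trace`, a hook like this could be called unconditionally from library code, and the training script itself would not need to change.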
Motivation
To get the trace and other debugging information today, we need to add code that captures the trace and prints it after running a model iteration with the inputs.
model.train()
but editing the iteration loop can be difficult.
cc - @mruberry
cc @carmocca @apaz-cli