Open kiya00 opened 3 months ago
Second ref cycle (one of the objects here holds onto the user module):
Repro
```python
def foo():
    import torch
    import thunder
    import weakref
    import gc

    mod = torch.nn.ReLU()
    ref_mod = weakref.ref(mod, lambda _: print("mod deleted!"))
    opt_mod = thunder.jit(mod)
    ref_opt_mod = weakref.ref(opt_mod, lambda _: print("opt_mod deleted!"))
    x = torch.randn(10, 10)
    refx = weakref.ref(x, lambda _: print("x deleted!"))
    opt_mod(x)
    del x
    del mod
    del opt_mod
    # gc.collect()
    print("done!")  # done!
    if ref_mod() is not None:
        import refcycle

        graph = refcycle.snapshot()
        try:
            cycle = graph.shortest_cycle(ref_mod())
            print("CYCLE FOUND FROM MOD")
        except ValueError:
            print("NO CYCLE FROM MOD")
            pass
        # More cycles are found here
        for anc in graph.ancestors(ref_mod()):
            try:
                cycle = graph.shortest_cycle(anc)
                print("CYCLE FOUND FROM ANCESTOR")
                print(anc)
                # Check the cycle from above
                # print(anc["prologue"].__wrapped__.__wrapped__.__wrapped__.__globals__["prologue"] is anc["prologue"])  # True
                # print(anc["prologue"].__wrapped__.__wrapped__.__wrapped__.__globals__["__function_obj"])
                break
            except ValueError:
                pass
        # for obj in cycle:
        #     print(obj)
        # Save the latest cycle.
        cycle.export_json("cycle.json")
        cycle.export_image("cycle.png")


foo()
```
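For reference, here is a minimal, thunder-independent sketch of why the weakref callbacks above do not fire on `del` alone: once the module sits in a reference cycle, its refcount never reaches zero, so only the cyclic garbage collector can reclaim it (which is why the commented-out `gc.collect()` matters).

```python
# Minimal illustration, independent of thunder: two objects that reference
# each other are not freed by `del` alone; the cyclic GC has to run.
import gc
import weakref


class Node:
    pass


a, b = Node(), Node()
a.other, b.other = b, a  # a <-> b form a reference cycle

ref_a = weakref.ref(a, lambda _: print("a deleted!"))

del a, b
print("alive after del:", ref_a() is not None)   # True: the cycle keeps it alive

gc.collect()                                     # prints "a deleted!"
print("alive after gc :", ref_a() is not None)   # False: cycle collected
```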
So regarding the priority, as discussed in Slack: from what I can see, this cycle keeps modules that go out of scope from being collected. Not nice, but for the most part I don't think we will be compiling short-lived modules, so it might not be a game-breaker right now.
I looked at this a bit. In general, deleting references such as

```python
del fn_.__wrapped__
```

helps, but apparently I did not find all of the references of interest: the ref cycle changed but did not go away. I have not been able to single out the `compile_data._thunder_module_map` reference, though. WDYT?
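For what it's worth, here is a hypothetical, self-contained sketch of the shape of cycle the `__globals__` check in the repro points at. The names `prologue` and `__function_obj` are taken from the observed cycle; the `exec`-based construction and the suggested fix at the end are assumptions, not thunder's actual code.

```python
# Hypothetical sketch: a function created via exec() whose globals dict holds
# both the function itself and the user's module, mirroring
# __globals__["prologue"] and __globals__["__function_obj"] seen above.
import gc
import weakref

import torch

mod = torch.nn.ReLU()
ref_mod = weakref.ref(mod)

ctx = {"__function_obj": mod}  # strong reference to the user's module
exec("def prologue():\n    return __function_obj", ctx)
prologue = ctx["prologue"]     # prologue.__globals__ is ctx, and ctx holds prologue -> cycle

del mod, ctx, prologue

# The module is now reachable only through the cycle, so it is not freed
# promptly; the cyclic GC has to run first.
print("alive after del:", ref_mod() is not None)  # True: held via the cycle
gc.collect()
print("alive after gc :", ref_mod() is not None)  # False

# One way to keep the module out of the cycle would be to store a weakref
# under "__function_obj" and dereference it in the generated code -- an
# assumption about a possible fix, not what thunder currently does.
```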
> Not nice, but for the most part I don't think we will be compiling short-lived modules, so it might not be a game-breaker right now.
It's a game-breaker because it blocks using the Thunder-optimized dropout layer inside a larger module, as in

```python
self.dropout = thunder.jit(nn.Dropout(p=0.5))
```
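For context, a sketch of the usage pattern being described; the surrounding `Block` module, its layer sizes, and whether this composition currently works as-is are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import thunder


class Block(nn.Module):
    """Hypothetical parent module; only the dropout is wrapped with thunder."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)
        self.dropout = thunder.jit(nn.Dropout(p=0.5))

    def forward(self, x):
        return self.dropout(self.linear(x))


block = Block()
_ = block(torch.randn(2, 10))
# With the reference cycle, dropping the parent does not promptly free the
# jitted dropout (or the other objects the cycle holds onto).
del block
```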
> It's a game-breaker because it blocks using the Thunder-optimized dropout layer inside a larger module.
I would like to understand this better: is it a game-breaker because you disagree that it is less relevant for long-lived modules, or because you expect the modules to be short-lived?
I'm sorry I confused this issue with https://github.com/Lightning-AI/lightning-thunder/issues/1074. I don't have an important use case for fixing this bug.
🐛 Bug
To Reproduce
With the torch.compile() line, it outputs:
With thunder, it outputs:
@kshitij12345 detected there's a reference cycle:
cc @apaz-cli