bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Static allocations #49

Closed bclarkson-code closed 3 months ago

bclarkson-code commented 4 months ago

Currently, intermediate gradients and values are created and destroyed dynamically every time an operation runs. To avoid leaking memory, we rely on an ugly hack (the cleanup method) that manually deletes everything we no longer need.
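
Roughly what that pattern looks like today (a minimal sketch: the `Tensor` stub and `exp` op here are illustrative placeholders, not Tricycle's actual API):

```python
import numpy as np

class Tensor:
    """Minimal stand-in for a Tricycle tensor: an array plus the backward
    functions and parents attached by each operation."""
    def __init__(self, array):
        self.array = array
        self.grad = None
        self.back_fns = ()
        self.parents = ()

def exp(tensor):
    # A brand new output tensor and a new backward closure are built on
    # every forward pass, so intermediates pile up until cleanup runs.
    result = np.exp(tensor.array)

    def back_fn(grad):
        # Allocates a fresh gradient buffer on every backward pass
        return grad * result

    out = Tensor(result)
    out.back_fns = (back_fn,)
    out.parents = (tensor,)
    return out
```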

Instead, a neater solution would be to reuse the same objects across every forward and backward pass. We can do this by replacing each operation function with a class that stores its intermediate values and gradients as attributes, so the buffers are allocated once and then written in place. Hopefully this will reduce memory usage and make the code a bit more consistent at the same time.
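
A rough sketch of the class-based version, reusing the `Tensor` stub from above (method names, shapes checks, etc. are just placeholders for whatever we settle on):

```python
class Exp:
    """Operation as a class: output and gradient buffers are allocated
    once and reused on every forward/backward pass instead of being
    recreated each step."""
    def __init__(self):
        self._out = None
        self._grad = None

    def forward(self, tensor):
        # Allocate the output buffer only on the first call (or if the
        # input shape changes), then write into it in place afterwards.
        if self._out is None or self._out.array.shape != tensor.array.shape:
            self._out = Tensor(np.empty_like(tensor.array))
        np.exp(tensor.array, out=self._out.array)
        self._out.back_fns = (self.backward,)
        self._out.parents = (tensor,)
        return self._out

    def backward(self, grad):
        # Same idea for the gradient: one buffer, reused every pass.
        if self._grad is None or self._grad.shape != grad.shape:
            self._grad = np.empty_like(grad)
        np.multiply(grad, self._out.array, out=self._grad)
        return self._grad
```

Each layer would construct its ops once at build time, so after the first step everything is written into existing buffers and memory usage stays flat across steps, with no cleanup needed.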