The RAM load in contexts.py likes to kill the process. I managed to move this work off the GPUs in the past, but there should be a way to cut the tensors down to manageable chunk sizes so they don't accumulate in host memory and bottleneck token scaling.
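A minimal sketch of one way to bound the accumulation, assuming the tensors come from PyTorch: stream fixed-size chunks to a disk-backed memmap so host RAM only ever holds one chunk at a time. The function name, file path, chunk size, and float16 downcast here are all hypothetical, not anything from contexts.py.

```python
import numpy as np
import torch

def stream_contexts_to_disk(hidden_states: torch.Tensor,
                            path: str = "contexts.f16.npy",
                            chunk_size: int = 2048) -> np.memmap:
    """Copy a large activation tensor to disk chunk by chunk so that
    host RAM holds at most one chunk at any point."""
    n, d = hidden_states.shape
    out = np.lib.format.open_memmap(path, mode="w+",
                                    dtype=np.float16, shape=(n, d))
    for start in range(0, n, chunk_size):
        chunk = hidden_states[start:start + chunk_size]
        # Downcast and pull off the device in one bounded step;
        # the memmap write lands on disk, not in resident memory.
        out[start:start + chunk_size] = chunk.to(torch.float16).cpu().numpy()
        del chunk
    out.flush()
    return out
```

Downstream code can then slice the memmap lazily instead of keeping the full tensor resident, which is what should stop the accumulation from killing the process.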