Unofficial PyTorch/🤗 Transformers (Gemma/Llama 3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"
I used accelerate launch with ZeRO-3 to run train.llama.infini.noclm.1Mseq.sh, but training failed with this error:
RuntimeError: Function 'LinearFunctionForZeroStage3Backward' returned nan values in its 0th output
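This "Function '...' returned nan values in its 0th output" message is the format PyTorch's autograd anomaly detection uses when a backward op produces NaNs. As a minimal, repo-independent sketch (the helper name and toy function are hypothetical, not from this codebase), one way to isolate which op first emits NaN gradients is to rerun the failing step under torch.autograd.detect_anomaly:

```python
import torch

def backward_produces_nan(fn, x):
    """Run fn(x).sum().backward() under anomaly detection and report
    whether autograd flags an op that returned NaN gradients."""
    x = x.clone().requires_grad_(True)
    try:
        with torch.autograd.detect_anomaly():
            fn(x).sum().backward()
        return False  # backward completed without NaNs
    except RuntimeError as e:
        # Anomaly mode raises e.g.:
        # "Function 'SqrtBackward0' returned nan values in its 0th output."
        return "returned nan" in str(e).lower()

# Toy example: sqrt of a negative value yields NaN, which propagates
# into the backward pass and is caught by anomaly detection.
print(backward_produces_nan(torch.sqrt, torch.tensor([-1.0])))
print(backward_produces_nan(torch.sqrt, torch.tensor([4.0])))
```

With ZeRO-3 specifically, NaNs in LinearFunctionForZeroStage3Backward often trace back to an upstream overflow (e.g. fp16/bf16 mixed precision or a too-high learning rate on very long sequences), so checking the loss and gradient scale on the steps just before the crash is a reasonable next step.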