Beomi / InfiniTransformer

Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
https://arxiv.org/abs/2404.07143
MIT License

Support Zero-3? #27

Open WF0511 opened 3 months ago

WF0511 commented 3 months ago

I used `accelerate launch` with DeepSpeed ZeRO-3 to run `train.llama.infini.noclm.1Mseq.sh`, but training failed with: `RuntimeError: Function 'LinearFunctionForZeroStage3Backward' returned nan values in its 0th output`
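For context, the error is raised from DeepSpeed's ZeRO-3 partitioned-linear backward pass, and NaNs at that point are frequently a symptom of fp16 overflow under mixed-precision training rather than a ZeRO-3 bug per se. A minimal DeepSpeed config sketch that switches to bf16 instead of fp16 is shown below; the specific values are assumptions for illustration, not this repo's shipped configuration, and bf16 requires hardware support (e.g. Ampere or newer):

```json
{
  "bf16": { "enabled": true },
  "fp16": { "enabled": false },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_clipping": 1.0,
  "train_micro_batch_size_per_gpu": "auto"
}
```

If the NaN persists with bf16, that would suggest the instability comes from the model/attention math itself rather than fp16 range limits.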

Liuzirui666 commented 3 months ago

I'm hitting the same issue.