facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Why does NLLB 3.3B still occupy 5.7 GB of host RAM when the model has been loaded to the GPU and occupies 13.17 GB of GPU memory? #5437

Open micronetboy opened 8 months ago

micronetboy commented 8 months ago

Why does the NLLB 3.3B model still occupy 5.7 GB of host RAM when it has already been loaded onto the GPU, where it occupies 13.17 GB of GPU memory? In my opinion, once the model is loaded onto the GPU, host memory usage should be very low.

GPU: NVIDIA A100 80 GB PCIe
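As a sanity check, the GPU figure is roughly what the fp32 weights alone require: 3.3B parameters × 4 bytes ≈ 13.2 GB, which matches the reported 13.17 GB. A minimal sketch for measuring both sides of the question (`psutil` is an extra dependency, not part of the original code):

```python
import os

import psutil
import torch

# Expected size of the fp32 weights: ~3.3e9 params * 4 bytes each.
print(f"expected fp32 weights: {3.3e9 * 4 / 1e9:.2f} GB")

# Run after loading the model to see where memory actually sits.
rss = psutil.Process(os.getpid()).memory_info().rss
print(f"host RSS: {rss / 1e9:.2f} GB")
print(f"GPU tensors: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```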

My code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_name = "facebook/nllb-200-3.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True).cuda()

# source_lang, target_lang, max_length and device are defined earlier in the script.
translator = pipeline(
    'translation',
    model=model,
    tokenizer=tokenizer,
    src_lang=source_lang,
    tgt_lang=target_lang,
    max_length=max_length,
    device=device,
)
```
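A likely explanation, hedged: after `.cuda()`, several GB of host RAM are held by things that never move to the GPU, chiefly the CUDA context and the driver/cuBLAS/cuDNN libraries mapped into the process, and `from_pretrained` may also keep a transient full CPU copy of the weights alive until Python garbage-collects it. A minimal sketch of two standard mitigations (`low_cpu_mem_usage=True` and `torch_dtype` are real `transformers` options; the exact savings depend on your versions):

```python
import gc

import torch
from transformers import AutoModelForSeq2SeqLM

model_name = "facebook/nllb-200-3.3B"

# low_cpu_mem_usage=True streams weights into the model instead of first
# materialising a second full copy in host RAM, lowering peak CPU usage.
# torch_dtype=torch.float16 is optional and halves the weight footprint.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
).cuda()

# Drop any lingering CPU-side references and return cached GPU blocks.
gc.collect()
torch.cuda.empty_cache()
```

Note that the CUDA context itself (often 0.5-2 GB of host RAM) cannot be freed while the process uses the GPU, so some resident host memory is expected even after these steps.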