facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Why does NLLB 3.3B still occupy 5.7 GB of host RAM when the model has been loaded to the GPU and occupies 13.17 GB of GPU memory? #5437

Open micronetboy opened 8 months ago

micronetboy commented 8 months ago

Why does the NLLB 3.3B model still occupy 5.7 GB of host RAM when it has already been loaded onto the GPU, where it occupies 13.17 GB of GPU memory? In my opinion, once the model is loaded onto the GPU, host memory usage should be very low.

GPU: NVIDIA A100 80 GB PCIe
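As a sanity check, the GPU figure is roughly what the fp32 weights alone require: 3.3B parameters × 4 bytes ≈ 13.2 GB, which matches the reported 13.17 GB. A minimal sketch for measuring both sides of the question (`psutil` is an extra dependency, not part of the original code):

```python
import os

import psutil
import torch

# Expected size of the fp32 weights: ~3.3e9 params * 4 bytes each.
print(f"expected fp32 weights: {3.3e9 * 4 / 1e9:.2f} GB")

# Run after loading the model to see where memory actually sits.
rss = psutil.Process(os.getpid()).memory_info().rss
print(f"host RSS: {rss / 1e9:.2f} GB")
print(f"GPU tensors: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```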

My code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_name = "facebook/nllb-200-3.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True).cuda()

# source_lang, target_lang, max_length and device are defined earlier in the script.
translator = pipeline(
    'translation',
    model=model,
    tokenizer=tokenizer,
    src_lang=source_lang,
    tgt_lang=target_lang,
    max_length=max_length,
    device=device,
)
```
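A likely explanation, hedged: after `.cuda()`, several GB of host RAM are held by things that never move to the GPU, chiefly the CUDA context and the driver/cuBLAS/cuDNN libraries mapped into the process, and `from_pretrained` may also keep a transient full CPU copy of the weights alive until Python garbage-collects it. A minimal sketch of two standard mitigations (`low_cpu_mem_usage=True` and `torch_dtype` are real `transformers` options; the exact savings depend on your versions):

```python
import gc

import torch
from transformers import AutoModelForSeq2SeqLM

model_name = "facebook/nllb-200-3.3B"

# low_cpu_mem_usage=True streams weights into the model instead of first
# materialising a second full copy in host RAM, lowering peak CPU usage.
# torch_dtype=torch.float16 is optional and halves the weight footprint.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
).cuda()

# Drop any lingering CPU-side references and return cached GPU blocks.
gc.collect()
torch.cuda.empty_cache()
```

Note that the CUDA context itself (often 0.5-2 GB of host RAM) cannot be freed while the process uses the GPU, so some resident host memory is expected even after these steps.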