Open micronetboy opened 8 months ago
Why does nllb-200-3.3B still occupy 5.7 GB of host (CPU) memory after the model has been loaded onto the GPU and takes 13.17 GB of GPU memory? I expected host memory usage to drop to a low level once the model is on the GPU.

GPU: NVIDIA A100 80GB PCIe
My code (`source_lang`, `target_lang`, `max_length`, and `device` are set elsewhere):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_name = "facebook/nllb-200-3.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True).cuda()
translator = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang=source_lang,
    tgt_lang=target_lang,
    max_length=max_length,
    device=device,
)
```
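For anyone hitting the same thing: part of that host memory is the CUDA context plus allocator caches, and part is the temporary CPU copy of the weights that `from_pretrained` materializes before the `.cuda()` move. A hedged sketch of loading with `low_cpu_mem_usage=True` (a documented `from_pretrained` option) and half precision, which should lower the host-side peak; the helper name `load_nllb` is just for illustration, not part of transformers:

```python
# Sketch: load NLLB with a lower host-RAM peak.
# - low_cpu_mem_usage=True streams weights in instead of building a full
#   in-memory CPU copy of the model first.
# - torch.float16 halves the size of the weights both on host and on GPU.
# `load_nllb` is an illustrative helper name, not a transformers API.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline


def load_nllb(model_name="facebook/nllb-200-3.3B", device=0):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,   # half-precision weights
        low_cpu_mem_usage=True,      # avoid a full fp32 copy in host RAM
    )
    # Let the pipeline place the model; no separate .cuda() call needed,
    # which also avoids passing `device=` for an already-moved model.
    return pipeline(
        "translation",
        model=model,
        tokenizer=tokenizer,
        device=device,
    )


if __name__ == "__main__":
    translator = load_nllb()
    print(translator("Hello world", src_lang="eng_Latn", tgt_lang="fra_Latn"))
```

Even with this, several GB of host RAM will remain in use while the process lives: the CUDA context, pinned staging buffers, and Python's own heap are not returned to the OS just because the weights moved to the GPU.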