georgeamccarthy opened 3 years ago
Not sure if we're going to merge this, but I need it on GCP without the Dockerization merged. Could probably use a simpler model files structure: https://huggingface.co/Rostlab/prot_bert/tree/main
There may be a simpler way to get around the issue. If I try to download the model with a simple script:
```python
from transformers import BertModel, BertTokenizer

model_path = "Rostlab/prot_bert"

print("Loading tokenizer.")
tokenizer = BertTokenizer.from_pretrained(model_path, do_lower_case=False)

print("Loading model.")
model = BertModel.from_pretrained(model_path)

# These assignments only make sense inside a class; `self` is undefined
# in a standalone script.
# self.tokenizer = tokenizer
# self.model = model

print("Done.")
```
then the system runs out of RAM (~1 GB) and the process is killed with `Killed`.
To monitor RAM usage: `ps -m -o %cpu,%mem,command`
Instead of downloading the repo, I might just be able to configure the download to use a disk cache.
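A minimal sketch of that idea, assuming the standard `cache_dir` argument of `from_pretrained` (the cache path here is a placeholder; by default transformers caches under `~/.cache/huggingface`):

```python
from transformers import BertModel, BertTokenizer

model_path = "Rostlab/prot_bert"
cache_dir = "/mnt/disk/hf_cache"  # placeholder path on an attached disk

tokenizer = BertTokenizer.from_pretrained(
    model_path, do_lower_case=False, cache_dir=cache_dir
)
model = BertModel.from_pretrained(model_path, cache_dir=cache_dir)
```

Note this only controls where the downloaded files land on disk; the weights still have to fit in RAM when `from_pretrained` loads them.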
PR type
Purpose
Why?
Extra info
New `protein_search/models` directory to store models in. Models were downloaded from Hugging Face and then I moved them into these dirs.
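For reference, a minimal sketch of loading from such a local directory; the `prot_bert` subdirectory name is an assumption about the layout under `protein_search/models`:

```python
from transformers import BertModel, BertTokenizer

# Hypothetical local path; from_pretrained accepts a directory containing
# the config, vocab, and weight files instead of a Hub model id.
local_path = "protein_search/models/prot_bert"

tokenizer = BertTokenizer.from_pretrained(local_path, do_lower_case=False)
model = BertModel.from_pretrained(local_path)
```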
Feedback required over
Mentions
References
Legal