abelavit opened this issue 8 months ago
The model outputs have a field called hidden_states which contains the embeddings. Something along these lines: embeddings = model(input_ids, attention_mask=attention_mask).hidden_states (note that for Hugging Face models, hidden_states is only populated when output_hidden_states=True is passed to the forward call or set in the model config).
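For concreteness, a minimal sketch of that idea (untested; it assumes tokenizer and model_reload come from the notebook's load_model, and that the model's forward accepts output_hidden_states like standard Hugging Face models):

```python
import torch

sequence = "PRTEINO"
# ProtT5 tokenizers expect space-separated residues; map rare amino acids to X.
sequence = " ".join(list(sequence)).replace("U", "X").replace("Z", "X").replace("O", "X").replace("B", "X")

ids = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model_reload(
        input_ids=ids["input_ids"].to(device),
        attention_mask=ids["attention_mask"].to(device),
        output_hidden_states=True,
    )

# hidden_states is a tuple with one tensor per layer; the last entry holds
# the final per-residue embeddings, shape (batch, seq_len, 1024) for ProtT5-XL.
embeddings = outputs.hidden_states[-1]
```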
Loading the original pre-trained model, such as ProtT5, can be done like this:
```python
import torch
from transformers import T5Tokenizer, T5EncoderModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer
tokenizer = T5Tokenizer.from_pretrained('Rostlab/prot_t5_xl_half_uniref50-enc', do_lower_case=False)

# Load the model
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc").to(device)
```
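Once loaded, per-residue embeddings can be generated roughly following the pattern from the ProtT5 model card:

```python
import re
import torch

sequence_examples = ["PRTEINO", "SEQWENCE"]
# ProtT5 expects space-separated residues; map rare amino acids (U, Z, O, B) to X.
sequence_examples = [" ".join(list(re.sub(r"[UZOB]", "X", s))) for s in sequence_examples]

ids = tokenizer(sequence_examples, add_special_tokens=True, padding="longest", return_tensors="pt")
with torch.no_grad():
    outputs = model(input_ids=ids["input_ids"].to(device),
                    attention_mask=ids["attention_mask"].to(device))

# Per-residue embeddings, shape (batch, seq_len, 1024); slice off padding
# and the trailing </s> token for each sequence.
emb_per_residue = outputs.last_hidden_state
emb_0 = emb_per_residue[0, :7]           # "PRTEINO" has 7 residues
emb_0_per_protein = emb_0.mean(dim=0)    # mean-pool for a per-protein embedding
```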
To load the fine-tuned model from the PT5_LoRA_Finetuning_per_residue_class.ipynb notebook, the command seems to be:
```python
tokenizer, model_reload = load_model("./PT5_secstr_finetuned.pth", num_labels=3, mixed=False)
```
The load_model above calls other functions (e.g., PT5_classification_model), which makes for a chunky script. I am wondering whether there is a simpler way to load the fine-tuned model and obtain embeddings for protein sequences, as can be done with the original pre-trained model (ProtT5).
I am not sure if I am doing it right.
Thanks.
I see your point; however, we currently do not have the bandwidth to work on a nicer interface, sorry. If you find a nicer way, e.g., by using https://github.com/huggingface/peft, feel free to share it or create a pull request :)
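To make that suggestion concrete, here is a hypothetical sketch of what a PEFT-based workflow could look like. It assumes the LoRA adapters were saved in PEFT's own format (adapter weights plus config via save_pretrained), which the current notebook does not do (it saves a raw .pth state dict), and the adapter directory name here is made up:

```python
from transformers import T5EncoderModel
from peft import PeftModel

# Hypothetical: only works if the LoRA weights were saved in PEFT's format,
# e.g. peft_model.save_pretrained("./pt5_lora_adapters").
base = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc")
peft_model = PeftModel.from_pretrained(base, "./pt5_lora_adapters")

# Optionally fold the adapters into the base weights, so the result behaves
# like a plain T5EncoderModel for embedding extraction.
model = peft_model.merge_and_unload()
```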
Hello,
I need help with generating embeddings after ProtT5 has been fine-tuned. I have fine-tuned the model using the sample notebook 'PT5_LoRA_Finetuning_per_residue_class.ipynb' on my own dataset, and I have the saved model called PT5_secstr_finetuned.pth. How do we now extract embeddings for new protein sequences, such as sequence_examples = ["PRTEINO", "SEQWENCE"], using the fine-tuned model?
Thank you for your time.