Mivg / SLED

The official repository for the paper Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022)
MIT License

How to use SLED with custom base models that are not on huggingface? #1

Open mdrpanwar opened 2 years ago

mdrpanwar commented 2 years ago

Hi,

Thanks for releasing the code for SLED.

The README suggests editing the config appropriately to use SLED with other base models from Hugging Face. However, this only works with Hugging Face models. Is there a way to interface SLED with base models that are not on Hugging Face? A description of how to go about that and what code changes (in SLED and in the base model) might be needed would be really helpful.

Thanks!

Mivg commented 2 years ago

Hi @mdrpanwar, thanks for your question. Any model checkpoint that can be loaded with Hugging Face can be used, even if it is not pushed as a model card. However, if you have a custom model with no AutoClass functionality, it will indeed not work in its current form. Can you please add some details on what you have and what you are trying to achieve, and I'll try to add support for that?

mdrpanwar commented 2 years ago

Hi @Mivg,

Thanks for replying.

My question and request were the following: the official code of new transformer models is not always released in the form of Hugging Face models with AutoClass functionality. So the current implementation of SLED restricts the direct usage of such base models. I was hoping for a more general implementation that can take any base model implemented in PyTorch, regardless of AutoClass support. Perhaps it will require more work. Is this something you are targeting in the near future?

Please feel free to close this issue. I shall get back when I have a more concrete requirement for a specific model.

Thanks.

Mivg commented 2 years ago

Hi @mdrpanwar

Thanks for the details. Sure, that makes sense, and there is no reason SLED could not support it. Before I think up a possible solution, I want to be precise about the goal. Is it correct to assume your model is implemented in PyTorch and inherits from PreTrainedModel (part of transformers) but is just not registered to be used as an AutoClass? I.e., you are able to do model = MyCustomModel(...) and pass it to the trainer as if it were, e.g., BART? If so, do you also have a custom config class that inherits from PretrainedConfig? Finally, if the two above are true, does your model support MyCustomModel.from_pretrained('some local checkpoint')?

In any case, supporting the above should be rather straightforward. The other possible solution, assuming only the answer to the first question is yes, is to support something like SledForConditionalGeneration.wrap_model(backbone_model) and use it instead of the from_pretrained initialization.
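The `wrap_model` classmethod above is only a proposal in this thread, not part of the SLED codebase. A minimal sketch of the delegation pattern it implies might look like the following; the class name, method names, and the simplified overlapping-window chunking are all illustrative assumptions (SLED's actual fusion-in-decoder step is omitted):

```python
class SledWrapper:
    """Hypothetical sketch of the proposed wrap_model pattern: wrap an
    already-instantiated backbone instead of loading it via from_pretrained."""

    def __init__(self, backbone, context_size=256, window_fraction=0.5):
        self.backbone = backbone          # any callable model, e.g. a PyTorch module
        self.context_size = context_size  # max tokens per chunk
        self.window_fraction = window_fraction  # overlap between consecutive chunks

    @classmethod
    def wrap_model(cls, backbone, **kwargs):
        # The proposed entry point: accept a live model object directly.
        return cls(backbone, **kwargs)

    def encode_chunks(self, input_ids):
        # Split the long input into overlapping chunks and run the backbone
        # on each chunk independently (short-text model, long-text input).
        step = max(1, int(self.context_size * (1 - self.window_fraction)))
        chunks = [input_ids[i:i + self.context_size]
                  for i in range(0, len(input_ids), step)]
        return [self.backbone(chunk) for chunk in chunks]
```

For example, wrapping a toy backbone with `context_size=4` and `window_fraction=0.5` slides a 4-token window in steps of 2 over the input.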

mdrpanwar commented 2 years ago

Hi @Mivg,

Thank you for your detailed response. It is fine to assume that base models are written in PyTorch. Beyond that, there are two classes of base models:

1. The base model is written using Hugging Face's transformers library. In this case it is fair to assume that it inherits from PreTrainedModel and that the custom config class inherits from PretrainedConfig. However, for wider applicability, we can only assume the former to be true.
2. The base model is not written using the transformers library (it is written in plain PyTorch or with some other library, e.g. fairseq). In this case, we need to come up with some minimal interface that is expected of the base model so that it can be used within the SLED framework.

Ideally, we would like to support both 1 and 2 to be exhaustive, but 1 already covers a large number of possible base models. So we can start with 1 and gradually support 2 over time, if you think it is a valid use case.
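No such interface exists in the SLED codebase; as a thought experiment for case 2, the "minimal interface" could be expressed as a Python Protocol. Every name here (`SledBackbone`, `encode`, `decode`, `max_source_length`) is a hypothetical placeholder, not SLED API:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SledBackbone(Protocol):
    """Hypothetical minimal contract a non-transformers backbone
    (e.g. a fairseq model) could satisfy for a SLED-style wrapper."""

    max_source_length: int  # largest chunk the backbone can encode

    def encode(self, input_ids: list) -> Any:
        """Return encoder states for one chunk of tokens."""
        ...

    def decode(self, encoder_states: Any, decoder_input_ids: list) -> list:
        """Decode against (possibly concatenated) encoder states."""
        ...


class EchoBackbone:
    """Toy implementation, only to show the contract is satisfiable
    without any transformers or fairseq dependency."""

    max_source_length = 16

    def encode(self, input_ids):
        return list(input_ids)

    def decode(self, encoder_states, decoder_input_ids):
        # Echo back as many encoder tokens as there are decoder positions.
        return encoder_states[: len(decoder_input_ids)]
```

Any backbone exposing these three members, regardless of the library it is built with, could then be driven by the chunk-and-fuse loop without inheriting from PreTrainedModel.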

leoribeiro commented 1 year ago

Hello @Mivg, is there any update on this issue? Can I use SLED in other HF models?

@Mivg, can I do something like this:

import sled  # registers the SLED classes with transformers
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

config = AutoConfig.from_pretrained("google/flan-t5-small")
config.model_type = "tau/sled"
config.underlying_config = "google/flan-t5-small"
config.context_size = 256
config.window_fraction = 0.5
config.prepend_prefix = True
config.encode_prefix = True
config.sliding_method = "dynamic"

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small", config=config)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

Would this code enable SLED on Flan-T5?

leoribeiro commented 1 year ago

@mdrpanwar, could you please help? Were you able to use SLED with other LMs on HF?