> Thanks @parmeet. This is indeed the key aspect I would like to clarify and explore with you, because it's mainly Text that needs this functionality.
>
> All options you mentioned are valid and need to be discussed. To summarize:
>
> - We can assume that the weights are associated with 2 URLs: the first one for the weights and the second one for the tokenizer. In this case both of them would be hardcoded on the enum, though the exact style of this is unclear at the moment.
Absolutely, this is indeed the case. They are tightly coupled. It's not optional, it's the primary need :).
> - Pass this as an argument to the constructor of the transforms mechanism, something like `preprocess = weights.transforms(url="...")`, with even having a default URL available.
This is not an option, as we would need to go with 1. But since between different versions of the model (BASE, LARGE, etc.) the functional form of the transforms is the same except for the URLs, we could potentially create partial functions as Callable transforms with hard-coded URLs. This is equivalent to what you have in resnet, `partial(ImageNetEval, crop_size=224, interpolation=InterpolationMode.BICUBIC)`, except that instead of crop_size and interpolation we would have hard-coded URLs for the tokenizer and vocab. Hope that clarifies the use case?
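For illustration, here is a minimal sketch of that partial-function idea, assuming a hypothetical generic preset callable; the names and URLs below are placeholders, not the real torchtext API:

```python
from functools import partial

# Hypothetical generic preset: a real implementation would download the spm model
# and vocab from these URLs and use them to tokenize/numericalize the input text.
def text_eval(text, *, spm_model_url, vocab_url):
    return {"text": text, "spm_model_url": spm_model_url, "vocab_url": vocab_url}

# Analogue of partial(ImageNetEval, crop_size=224, interpolation=InterpolationMode.BICUBIC),
# but with hard-coded tokenizer/vocab URLs instead of image parameters (placeholder URLs).
XLMR_BASE_TRANSFORM = partial(
    text_eval,
    spm_model_url="https://example.com/xlmr.base.sentencepiece.model",
    vocab_url="https://example.com/xlmr.base.vocab.pt",
)
XLMR_LARGE_TRANSFORM = partial(
    text_eval,
    spm_model_url="https://example.com/xlmr.large.sentencepiece.model",
    vocab_url="https://example.com/xlmr.large.vocab.pt",
)

preprocess = XLMR_BASE_TRANSFORM  # same functional form, different hard-coded URLs
```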
> My very first thoughts around this are that I would favour option 1, because to my understanding the weights always need to be used along with the tokenizer they were produced with. As far as I understand (please correct me if I'm wrong), changing the tokenizer on a pre-trained model without further training can cause severe accuracy deterioration.
>
> I would love to hear your thoughts on this. Perhaps we can have an offline chat and then summarize our discussion here. It would be awesome also to have an example of such a model in this RFC so that we can properly explore it. How would you feel about taking ownership of the Text examples and sending PRs that explore different options? To me, what we currently discuss is the main big question I have for this proposal.
Yes agreed.
So it seems that this use-case is covered AS-IS by the proposal, and the implementation details are as you described above.
I will close the issue to keep the issue tracking clean, but if there is more to it please reopen and let's chat more.
Taking text as an example, the transforms may need to download meta-data for their construction. For example, the XLMR transform requires an spm tokenizer model and a corresponding vocab to create the XLMR preset transform. What's the recommended way to store those URLs? Should we create a callable that is generic, i.e. accepts URLs to the spm model and vocab, and create a partial function inside each Enum object with dedicated URLs for the corresponding model's preset transform? This way the initialization could be lazy.
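For concreteness, a minimal sketch of this lazy, partial-inside-the-enum idea, assuming a hypothetical preset class and placeholder URLs (not the actual torchtext API):

```python
from enum import Enum
from functools import partial

class XLMRPresetTransform:
    # Hypothetical preset: downloads of the spm model and vocab would happen only
    # when this object is constructed, so storing a partial on the enum keeps things lazy.
    def __init__(self, spm_model_url, vocab_url):
        self.spm_model_url = spm_model_url  # would feed an spm tokenizer loader
        self.vocab_url = vocab_url          # would feed a vocab loader

    def __call__(self, text):
        return text  # placeholder for tokenize + numericalize

class XLMR_BASE_Weights(Enum):
    # Placeholder URLs; the weights URL and the tokenizer/vocab URLs are hard-coded together.
    DEFAULT = {
        "url": "https://example.com/xlmr.base.encoder.pt",
        "transforms": partial(
            XLMRPresetTransform,
            spm_model_url="https://example.com/xlmr.base.sentencepiece.model",
            vocab_url="https://example.com/xlmr.base.vocab.pt",
        ),
    }

# Nothing is downloaded until the user actually asks for the preset:
preprocess = XLMR_BASE_Weights.DEFAULT.value["transforms"]()
```

Keeping only the partial on the enum means importing the enum stays cheap; the meta-data download cost is paid only when the preset transform is actually requested.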