PiotrNawrot / nanoT5

Fast & Simple repository for pre-training and fine-tuning T5-style models
Apache License 2.0

Beginner Question : Would it be wise to use this as a backbone for custom seq2seq modeling fMRI data and custom encoder? #33

Closed dyhan316 closed 5 months ago

dyhan316 commented 5 months ago

Hello,

This is my first time using transformers, so I wanted to ask a few questions about how I can implement a custom transformer quickly and with minimal effort.

Do you think I should use your code as a starting point to create a custom model that:

  1. Uses a completely different dataset with a custom tokenization scheme (e.g., tokenizing brain signals with a custom brain-signal embedding model)?
  2. Can optionally swap out either the encoder or the decoder? (For example, I am thinking of using the ContiFormer in lieu of the regular encoder to better capture the "temporal" aspect of the data in question.)
  3. Uses a very small model (1-2 encoder/decoder layers at most), since brain data is very limited?

Or should I just copy the T5 model code from Hugging Face and try to customize it from there using PyTorch? (I am already familiar with PyTorch, but not with transformers or Hugging Face, as I have only used CNNs.)

Any advice would be greatly appreciated :)

PiotrNawrot commented 5 months ago

Hey,

  1. Yes, you can easily swap your dataset in nanoT5.
  2. Yes, in this repository you have access to the low-level model implementation, so you can easily change it. That was actually the main motivation behind this repo: being able to modify the underlying implementation without relying too much on high-level modules.
  3. Yes, you can also very easily adapt the model config.
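To illustrate points 2 and 3, here is a rough, hypothetical PyTorch sketch (not nanoT5's actual code) of the shape such a model could take: a custom encoder over continuous signals (standing in for something like ContiFormer) feeding a single-layer standard decoder. All class names, dimensions, and the `CustomSignalEncoder` module are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CustomSignalEncoder(nn.Module):
    """Placeholder for a custom brain-signal encoder (e.g. a ContiFormer)."""
    def __init__(self, in_dim: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)  # embed raw signal channels
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x):  # x: (batch, time, channels)
        return self.encoder(self.proj(x))

class TinySeq2Seq(nn.Module):
    """Very small encoder-decoder: 1 encoder layer, 1 decoder layer."""
    def __init__(self, in_dim=32, d_model=64, vocab_size=100):
        super().__init__()
        self.encoder = CustomSignalEncoder(in_dim, d_model)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=1)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, signal, target_ids):
        memory = self.encoder(signal)          # encode the continuous signal
        tgt = self.tok_emb(target_ids)         # embed decoder input tokens
        out = self.decoder(tgt, memory)        # cross-attend to the encoding
        return self.lm_head(out)               # per-token vocabulary logits

model = TinySeq2Seq()
signal = torch.randn(2, 16, 32)           # (batch, time, channels)
targets = torch.randint(0, 100, (2, 8))   # (batch, target length)
logits = model(signal, targets)
print(logits.shape)  # torch.Size([2, 8, 100])
```

In nanoT5 you would make the analogous change inside the exposed model code rather than building from `nn.Transformer` modules, but the moving parts (encoder, decoder, head, config-driven layer counts) are the same.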
PiotrNawrot commented 5 months ago

Imo this repo is a nice starting point, as it has the basic training functionality implemented and exposed. The things you need to change are the model and the dataset, which should be fairly easy. If you instead took the copy-from-HF route, it would be more work: HF model implementations are really large because they implement a lot of extra functionality that you don't need. This repo contains a roughly minimal implementation of a T5 model, which imo makes it a better starting point.
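For the dataset side, the swap could look roughly like this hypothetical sketch: a standard `torch.utils.data.Dataset` yielding (signal, target-token) pairs instead of text examples. The field names and shapes are illustrative assumptions, not nanoT5's actual data interface.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BrainSignalDataset(Dataset):
    """Hypothetical dataset of (continuous signal, target token ids) pairs."""
    def __init__(self, signals, targets):
        self.signals = signals  # each: (time, channels) float tensor
        self.targets = targets  # each: (target length,) int tensor

    def __len__(self):
        return len(self.signals)

    def __getitem__(self, idx):
        return {"inputs": self.signals[idx], "labels": self.targets[idx]}

# Dummy data: 4 recordings, 16 timesteps, 32 channels; 8-token targets.
signals = [torch.randn(16, 32) for _ in range(4)]
targets = [torch.randint(0, 100, (8,)) for _ in range(4)]
loader = DataLoader(BrainSignalDataset(signals, targets), batch_size=2)

batch = next(iter(loader))
print(batch["inputs"].shape, batch["labels"].shape)
# torch.Size([2, 16, 32]) torch.Size([2, 8])
```

With fixed-length recordings the default collation works as-is; for variable-length signals you would add a padding `collate_fn`, just as the text pipeline pads token sequences.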