gmihaila / ml_things

This is where I put things I find useful that speed up my work with Machine Learning. Ever looked through your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python library of reusable functions I created in previous projects. I also share some notebook tutorials and Python code snippets.
https://gmihaila.github.io
Apache License 2.0

a few questions #3

Closed shainaraza closed 2 years ago

shainaraza commented 3 years ago

@gmihaila Hi, this is such an amazing notebook (pretrain_transformers_pytorch.ipynb). This is not an issue, just a few quick questions for my knowledge:

(i) Can I use more evaluation metrics here?
(ii) When using the checkpoint, should it be the last one, or is the notebook saving the best one?
(iii) If my dataset has multiple features, would I need to pass all of these pieces of information as one piece during fine-tuning? If I intend to have side information, which part should I change? Any thoughts?
(iv) If I need to include a timestamp, any thoughts? Or maybe during training we arrange all data temporally?
(v) How can I use the notebook for multi-class labels?

Thanks in advance, and thank you for the wonderful work that you share with the community.

gmihaila commented 3 years ago

Thank you @shainaraza for your interest in my tutorial!

(i) Can I use more evaluation metrics here? You sure can. In this tutorial I am using the HuggingFace Trainer functionality. You can look at the documentation for the compute_metrics parameter; that is where you can add any custom evaluation metric.
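For illustration, here is a minimal sketch of what such a function could look like. The metric choices (accuracy and weighted F1 from scikit-learn) are just examples, not what the notebook uses:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # Trainer passes the model outputs (logits) and the true labels.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="weighted"),
    }

# Then pass it to the Trainer:
# trainer = Trainer(model=model, args=training_args,
#                   compute_metrics=compute_metrics, ...)
```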

(ii) When using the checkpoint, should it be the last one, or is the notebook saving the best one? The notebook saves the last checkpoint. This is the way I usually use it. Depending on your project, you can save checkpoints along the way, check the metrics, and figure out which checkpoint is best to use.
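If you do want intermediate checkpoints rather than only the last one, the standard TrainingArguments options can handle this; a hedged sketch, where the step intervals and metric name are placeholders:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",         # where checkpoints are written
    save_steps=500,                   # save a checkpoint every 500 steps
    evaluation_strategy="steps",      # evaluate on the same schedule
    eval_steps=500,
    load_best_model_at_end=True,      # reload the best checkpoint after training
    metric_for_best_model="loss",     # which metric defines "best"
)
```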

(iii) If my dataset has multiple features, would I need to pass all of these pieces of information as one piece during fine-tuning? If I intend to have side information, which part should I change? This notebook does pretraining on transformer models. That means it trains the transformer models the same way the authors of these models trained them from scratch. With this tutorial I'm showing how to do extended training on these transformer models with your custom dataset. If you're looking to just fine-tune a model on a specific task like classification, check out my fine-tuning tutorial. If you plan on adding any specific information during fine-tuning, that is very possible; it would require some extra coding. Not sure what exactly you want to do, but it is doable.
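One simple, hypothetical way to fold side information into fine-tuning is to serialize the extra features into text and use the tokenizer's sentence-pair encoding. The helper below is an illustration only; the function and field names are made up:

```python
# Hypothetical helper: serialize side features into a second text segment
# so the model sees them alongside the main input.
def build_input(text, category, timestamp, tokenizer):
    side_info = f"category: {category} time: {timestamp}"
    # Passing two texts produces a standard sentence-pair encoding.
    return tokenizer(text, side_info, truncation=True)
```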

(iv) If I need to include a timestamp, any thoughts? Or maybe during training we arrange all data temporally? Not sure what you mean by this. If you're asking how to keep the order of the data the same during training: by default, a plain PyTorch dataloader keeps the data in its original order.
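For reference, a PyTorch DataLoader only preserves dataset order when shuffling is off, which is its default; `dataset` here is assumed to be your own Dataset object:

```python
from torch.utils.data import DataLoader

# shuffle=False (the default) iterates the dataset in its original order,
# so temporally sorted data stays sorted across the epoch.
loader = DataLoader(dataset, batch_size=32, shuffle=False)
```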

(v) How can I use the notebook for multi-class labels? Like I mentioned before, I think you're interested in plain fine-tuning and not pretraining. For this, check out my fine-tuning tutorial, where you can easily change the number of labels to whatever number of labels you have.
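With the standard HuggingFace pattern, that change is a single argument; the model name and label count below are placeholders:

```python
from transformers import AutoModelForSequenceClassification

# num_labels sets the size of the classification head;
# use however many classes your task has.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5
)
```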

Note: pretrain_transformers_pytorch.ipynb is used for extended training of transformer models on your custom dataset. For example, if you have a lot of data on a specific topic that BERT was not trained on before, you can use this code to extend BERT's training so that it trains on your custom dataset. After this pretraining process you can fine-tune the new pretrained BERT model on whatever task you need. This way you have your own custom pretrained BERT model that works better on your dataset. If you only need to fine-tune an existing pre-trained transformer model, you can use my fine-tuning tutorial.
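For context, the core of that extended-pretraining loop looks roughly like the standard HuggingFace masked-language-modeling recipe; this is a hedged sketch, not the notebook's actual code, and the corpus path and hyperparameters are placeholders:

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

# Tokenize your custom-domain corpus (file path is a placeholder).
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# The collator randomly masks tokens so the model keeps learning the
# masked-language-modeling objective on the new domain.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-extended", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

After this finishes, the checkpoint in the output directory can be loaded for fine-tuning just like any other pretrained model.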

shainaraza commented 3 years ago

@gmihaila thank you for the clear answers. I am also using your fine-tuning notebook. There is a lot of explanation in your code, so everything is very easy. Thank you once again.