amaiya / ktrain

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
Apache License 2.0
1.23k stars 268 forks source link

fine tune mBERT for sentiment analysis using ktrain #491

Closed mcormick123 closed 1 year ago

mcormick123 commented 1 year ago

Hi, thanks for your great work! I've been using ktrain to fine tune multilingual bert and got an amazing result! i've looked on several tutorials already but some are vague, and i really want to understand more how ktrain works.

Would like to ask if i'm doing the process correctly to fine tune the mBERT from huggingface using ktrain? Here's my process:

  1. Load dataset file which contains text and label
  2. I've read a blog that suggested to get the 95th percentile to get the maxlen, I thought you should tokenize it first and get the maxlen from there? What's the default in ktrain ?
  3. Split the train, val, and test sets
  4. Execute:
    MODEL_NAME = 'bert-base-multilingual-uncased'
    // in setting the maxlen here, i used the 95th percentile
    t = text.Transformer(MODEL_NAME, maxlen = 250, class_names = ['0','1','2'])
    train_final = t.preprocess_train(X_tr, y_tr, mode='train')
    val_final = t.preprocess_test(X_val, y_val)
    model = t.get_classifier()
    learner = ktrain.get_learner(model, train_data=train_final, val_data=val_final, batch_size=16)
    learner.fit_onecycle(2e-5, 2)
  5. Use the learner to predict the test set and evaluate its accuracy and f1 score
  6. Use the learner to predict unseen data. Should I preprocess first the data? or is it good to go to the predictor?

Please, actually, i'm about to report this to my professor as a class output and imagining my prof questioning me stuffs about my code terrifies me already HAHA! explanation would be fully appreciated! thank you so much!

amaiya commented 1 year ago

Hello: The default maxlen is 400 I believe, but running it once to observe the stats and then running it again choosing 250 (which you said was 95% percentile) sounds reasonable.

To predict on unseen data in an application, a Predictor instance is typically used. To evaluate the test set, you can also use the Learner instance instead of the Predictor.