Closed yhp519 closed 1 year ago
Hi,
The batch size seems to be a bit high. I suggest playing around with these parameters, especially:
- .setBatchSize(8)
- .setLr(0.0005)
Then you can have a baseline for how changing these helps your model converge.
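For reference, a minimal sketch of where those setters go in a Spark NLP pipeline (this is a config sketch, not the reporter's actual code; the input/output column names and `setMaxEpochs` value are illustrative assumptions):

```python
from sparknlp.annotator import ClassifierDLApproach

# Smaller batch size and learning rate, as suggested above.
# Column names here are placeholders; match them to your own pipeline.
classifier = (
    ClassifierDLApproach()
    .setInputCols(["sentence_embeddings"])
    .setOutputCol("class")
    .setLabelColumn("label")
    .setBatchSize(8)
    .setLr(0.0005)
    .setMaxEpochs(50)
)
```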
I tried setting the batch size and learning rate to smaller values as you suggested, but the problem still exists.
2023-03-03 11:33:11.494679: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 113473 microseconds.
Training started - epochs: 50 - learning_rate: 0.005 - batch_size: 4 - training_examples: 94390
Epoch 1/50 - 146.36s - loss: 9490.271 - acc: 0.9105289 - batches: 23598
Epoch 2/50 - 146.36s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 3/50 - 146.52s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 4/50 - 146.08s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 5/50 - 146.07s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 6/50 - 145.88s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 7/50 - 145.91s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 8/50 - 146.08s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 9/50 - 146.72s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 10/50 - 147.31s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 11/50 - 146.32s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 12/50 - 146.42s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 13/50 - 146.48s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 14/50 - 146.84s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 15/50 - 146.25s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 16/50 - 147.23s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Epoch 17/50 - 146.57s - loss: 9489.971 - acc: 0.9105289 - batches: 23598
Have you tried using the model for prediction? Given the quality of the dataset and the word embeddings, this might be the highest accuracy it can reach. You can switch embeddings just to see how your model converges. (I suggest trying these models just for your own testing: https://nlp.johnsnowlabs.com/models?type=model&task=Embeddings&annotator=BertSentenceEmbeddings&edition=Spark+NLP&language=xx)
First of all, thank you for your patient answer.
I found the problem: the distribution of the 0 and 1 labels in my data was off. Under normal circumstances, examples with the 0 label account for a very small proportion of the data, but I simulated a positive ratio of about 1:9, which may be the cause of the problem. Thanks!
This is a great finding! I was actually going to suggest data imbalance, required augmentation, etc. in your training examples. (When the data is skewed toward some labels compared to others, the model usually stops converging. We use something that doesn't allow overfitting, or at least not that fast, so it seems to stop learning since there is nothing left to learn/challenge it with.)
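As a quick sanity check along these lines (not part of the original thread's code), label skew can be spotted before training with a few lines of plain Python; the `labels` list below simulates roughly the 1:9 ratio described above:

```python
from collections import Counter

def label_ratios(labels):
    """Return each label's share of the dataset, to spot heavy skew."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Simulated labels with roughly the 1:9 skew described above.
labels = [0] * 10 + [1] * 90
ratios = label_ratios(labels)
print(ratios)  # {0: 0.1, 1: 0.9}
```

If one class dominates like this, the model can hit a high accuracy by always predicting the majority label, which matches the flat loss/acc seen in the logs above.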
This issue is stale because it has been open 180 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
When I use Spark NLP to train a Chinese text classification model, both the loss and accuracy remain unchanged, and it takes a long time to load before training the model.
The training log is as follows:
My code:
data2.csv:
label content
0 我的第一个句子
1 我的第二个句子
My env:
spark-nlp==4.3.1
pyspark==3.2.3