BrikerMan / Kashgari

Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT2 language embeddings.
http://kashgari.readthedocs.io/
Apache License 2.0
2.39k stars, 441 forks

[Question] Load model to continue training #437

Closed (SharpKoi closed 3 years ago)

SharpKoi commented 4 years ago

Question

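After loading a saved model with BiLSTM_CRF_Model.load_model(path), how should I compile the model if I want to continue training? I tried calling compile_model() directly, but the loaded model seems to be missing a CRF layer, so the loss and metrics cannot be obtained from it. I also tried calling build_model_arc() first and then compiling the model, but the model then seems to train from scratch: the loss does not pick up where the previous run left off, and the F1 score also starts climbing from its lowest value.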

SharpKoi commented 3 years ago

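@BrikerMan Hello, I found the cause of this bug. ABCTaskModel.load_model assigns the last layer of the loaded model, the CRF layer, to model.layer_crf, whereas compile_model() in the CRF-based model code takes its loss from model.crf_layer, so no loss is available when compile_model() is called. I have forked your project and fixed this, and I also added a save-model callback. I would like to send you a PR; are there any guidelines I should follow?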
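The mismatch described above can be reproduced with a minimal sketch. ToyCRF and ToyCRFModel are hypothetical stand-ins written for illustration, not Kashgari's actual classes; only the two attribute names, crf_layer and layer_crf, are taken from the thread.

```python
# Toy stand-in classes for illustration only; NOT Kashgari's implementation.
class ToyCRF:
    """Hypothetical CRF layer that exposes a loss function."""

    def loss(self, y_true, y_pred):
        return 0.0


class ToyCRFModel:
    def __init__(self):
        # compile_model() reads the layer from this attribute.
        self.crf_layer = None
        self.compiled_loss = None

    @classmethod
    def load_model(cls):
        model = cls()
        # The mismatch: the restored layer is stored under a different
        # name than the one compile_model() reads.
        model.layer_crf = ToyCRF()
        return model

    def compile_model(self):
        # Raises AttributeError while crf_layer is still None.
        self.compiled_loss = self.crf_layer.loss


loaded = ToyCRFModel.load_model()
try:
    loaded.compile_model()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Pointing the expected name at the restored layer lets compile succeed.
loaded.crf_layer = loaded.layer_crf
loaded.compile_model()
```

In this sketch the first compile_model() call fails with the same kind of AttributeError quoted later in the thread, and the call after the alias assignment succeeds.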

BrikerMan commented 3 years ago

@SharpKoi Thank you very much for your support. The only requirements are to include the necessary comments and to pass lint & test. To test locally:

sh ./scripts/lint.sh && sh ./scripts/test.sh

After you submit the PR, GitHub Actions will also run the tests.

Bingyu-Wang commented 3 years ago

Has this problem been solved? I still hit it when calling loaded_model.compile_model():

.\kashgari\tasks\labeling\bi_lstm_crf_model.py in compile_model(self, loss, optimizer, metrics, kwargs)
     61                       kwargs: Any) -> None:
     62         if loss is None:
-> 63             loss = self.layer_crf.loss
     64         if metrics is None:
     65             metrics = [self.layer_crf.accuracy]

AttributeError: 'NoneType' object has no attribute 'loss'

SharpKoi commented 3 years ago

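@Bingyu-Wang Hello, and sorry for the late reply. I submitted the fix to the v2-dev branch, not to the default branch, so the current release does not include it. The error arises because the model stores the CRF layer in a variable named crf_layer, not the layer_crf shown in the error message. You can run the following line and then compile the model.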

loaded_model.crf_layer = loaded_model.layer_crf

Bingyu-Wang commented 3 years ago

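Thanks for your reply. I tried your approach, but a new problem appeared: AttributeError: 'BiLSTM_CRF_Model' object has no attribute 'layer_crf'

It seems this bug has now been fixed, but when I continue training after compile_model, training appears to start over.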

SharpKoi commented 3 years ago

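@Bingyu-Wang Sorry, I seem to have misread your problem. The error you hit is exactly the opposite of mine: my error message said that crf_layer is NoneType, not layer_crf. Which version do you have installed? I suggest installing the latest release, v2.0.1, and then running that line of code.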

pip install kashgari==2.0.1
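Because the two reports in this thread fail on opposite attribute names depending on the installed version, a version-agnostic form of the one-line workaround might look like the sketch below. alias_crf_layer is a hypothetical helper, not part of Kashgari, and the SimpleNamespace objects merely stand in for loaded models.

```python
from types import SimpleNamespace


def alias_crf_layer(model):
    """Hypothetical helper: make crf_layer and layer_crf refer to the same layer.

    Returns the layer that was found, or None if neither attribute holds one.
    """
    layer = getattr(model, "crf_layer", None)
    if layer is None:
        layer = getattr(model, "layer_crf", None)
    if layer is not None:
        # Set both names so compile code reading either one finds the layer.
        model.crf_layer = layer
        model.layer_crf = layer
    return layer


# Stand-ins for the two situations reported in the thread.
crf = object()
only_layer_crf = SimpleNamespace(crf_layer=None, layer_crf=crf)  # crf_layer is None
only_crf_layer = SimpleNamespace(crf_layer=crf)                  # no layer_crf attribute

alias_crf_layer(only_layer_crf)
alias_crf_layer(only_crf_layer)
```

With a real loaded model, the assumed usage would be to call the helper once before compile_model().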

Bingyu-Wang commented 3 years ago

I reinstalled the latest version of kashgari. If I call compile_model directly, I get:

\kashgari\tasks\labeling\bi_lstm_crf_model.py", line 63, in compile_model
    loss = self.crf_layer.loss
AttributeError: 'NoneType' object has no attribute 'loss'

After adding the line you provided, compile_model works, but training still seems to start from scratch.

Here is my test code.

bert_embed = BertEmbedding('H:/Corpus/chinese_L-12_H-768_A-12')
model = BiLSTM_CRF_Model(bert_embed)
loaded_model = BiLSTM_CRF_Model.load_model('saved_ner_model_BIO_3')
loaded_model.crf_layer = loaded_model.layer_crf
loaded_model.compile_model()
model.fit(train_x, train_y, valid_x, valid_y, epochs=1, batch_size=10)
model.save('saved_ner_model_BIO_3')

SharpKoi commented 3 years ago

@Bingyu-Wang Please check your code: why do you load the model into loaded_model but then train with the newly created model?
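The effect of calling fit on the wrong object can be shown with a minimal sketch. ToyModel is a hypothetical stand-in whose only state is a count of trained epochs; it is not Kashgari code and only mirrors the save, load_model, and fit call pattern from the test code above.

```python
import json
import os
import tempfile


class ToyModel:
    """Hypothetical stand-in whose only state is a count of trained epochs."""

    def __init__(self):
        self.epochs_trained = 0

    def fit(self, epochs=1):
        self.epochs_trained += epochs

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"epochs_trained": self.epochs_trained}, f)

    @classmethod
    def load_model(cls, path):
        model = cls()
        with open(path) as f:
            model.epochs_trained = json.load(f)["epochs_trained"]
        return model


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved_toy_model.json")

    # An earlier run: train for 5 epochs and save.
    first = ToyModel()
    first.fit(epochs=5)
    first.save(path)

    # The mistake: a fresh model is trained and the loaded one is ignored.
    model = ToyModel()
    loaded_model = ToyModel.load_model(path)
    model.fit(epochs=1)
    fresh_total = model.epochs_trained  # 1: training restarted from zero

    # The fix: call fit on the object returned by load_model.
    loaded_model.fit(epochs=1)
    resumed_total = loaded_model.epochs_trained  # 6: training resumed
```

In the test code above, the analogous change would be to call fit and save on loaded_model instead of model.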

Bingyu-Wang commented 3 years ago

Thanks for your answer, the problem is solved. I followed the example code in Kashgari/docs/tutorial/text-labeling.md directly, and its last line is exactly what confused me.

import kashgari
from kashgari.tasks.labeling import BiLSTM_Model

model = BiLSTM_Model()
model.fit(train_x, train_y, valid_x, valid_y)

# Evaluate the model

model.evaluate(test_x, test_y)

# Model data will save to `saved_ner_model` folder
model.save('saved_ner_model')

# Load saved model
loaded_model = BiLSTM_Model.load_model('saved_ner_model')
loaded_model.predict(test_x[:10])

# To continue training, compile the newly loaded model first
loaded_model.compile_model()
model.fit(train_x, train_y, valid_x, valid_y)

AnitaSherry commented 2 years ago

Has this been solved? load_model fails for me with: Unknown layer: PositionEmbedding.