BrikerMan / Kashgari

Kashgari is a production-level NLP transfer-learning framework built on top of tf.keras for text labeling and text classification, including Word2Vec, BERT, and GPT-2 language embeddings.
http://kashgari.readthedocs.io/
Apache License 2.0

Why does English NER produce no output? #393

Closed ybdesire closed 4 years ago

ybdesire commented 4 years ago

Below is my test code for English NER.

The code runs without any error, but it had been running for 2 hours and still had not finished.

import kashgari
from kashgari.embeddings import BertEmbedding
from kashgari.tasks.labeling import BiLSTM_CRF_Model

train_x, train_y = [['this', 'is', 'Jack', 'Ma']],[['O','O','B','I']]
valid_x, valid_y = [['this', 'is', 'Jack', 'Li']],[['O','O','B','I']]
test_x, test_y  = [['this', 'is', 'Jack', 'Zhang']],[['O','O','B','I']]

bert_embed = BertEmbedding('wwm_cased_L-24_H-1024_A-16/',
                           #task=kashgari.LABELING,
                           sequence_length=100)

model = BiLSTM_CRF_Model(bert_embed)
model.fit(train_x,
          train_y,
          x_validate=valid_x,
          y_validate=valid_y,
          epochs=20,
          batch_size=512)

I get the output below while the code is running...

2020-06-19 16:03:01,877 | DEBUG   | ------ Build vocab dict finished, Top 10 token ------
2020-06-19 16:03:01,877 | DEBUG   | Token: [PAD]    -> 0
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused1] -> 1
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused2] -> 2
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused3] -> 3
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused4] -> 4
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused5] -> 5
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused6] -> 6
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused7] -> 7
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused8] -> 8
2020-06-19 16:03:01,877 | DEBUG   | Token: [unused9] -> 9
2020-06-19 16:03:01,878 | DEBUG   | ------ Build vocab dict finished, Top 10 token ------
Preparing text vocab dict: 100%|██████████| 1/1 [00:00<00:00, 9404.27it/s]
2020-06-19 16:03:01,879 | INFO    | ------ Build vocab dict finished, Top 10 token ------
2020-06-19 16:03:01,880 | INFO    | Token: [PAD]    -> 0
2020-06-19 16:03:01,880 | INFO    | Token: [UNK]    -> 1
2020-06-19 16:03:01,880 | INFO    | Token: [CLS]    -> 2
2020-06-19 16:03:01,880 | INFO    | Token: [SEP]    -> 3
2020-06-19 16:03:01,880 | INFO    | ------ Build vocab dict finished, Top 10 token ------
Preparing text vocab dict: 100%|██████████| 1/1 [00:00<00:00, 15709.00it/s]
2020-06-19 16:03:01,880 | INFO    | ------ Build vocab dict finished, Top 10 token ------
2020-06-19 16:03:01,880 | INFO    | Token: [PAD]    -> 0
2020-06-19 16:03:01,880 | INFO    | Token: [UNK]    -> 1
2020-06-19 16:03:01,880 | INFO    | Token: [CLS]    -> 2
2020-06-19 16:03:01,880 | INFO    | Token: [SEP]    -> 3
2020-06-19 16:03:01,880 | INFO    | Token: O        -> 4
2020-06-19 16:03:01,880 | INFO    | Token: B        -> 5
2020-06-19 16:03:01,881 | INFO    | Token: I        -> 6
2020-06-19 16:03:01,881 | INFO    | ------ Build vocab dict finished, Top 10 token ------

Is there any mistake in my code?

BrikerMan commented 4 years ago

Nope, this must be a bug in the 2.x version. Please try the stable 1.x version.
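
For example, you can pin pip to the 1.x series (any 1.x patch release should do):

pip uninstall -y kashgari
pip install 'kashgari<2.0'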

ybdesire commented 4 years ago

Got it. Thanks @BrikerMan

BrikerMan commented 4 years ago

Need to fix this before releasing version 2.0.0.

Jefffish09 commented 4 years ago

Hi @BrikerMan,

I ran into a similar situation. Kashgari version: 2.0.0a1, English NER task. Both BiLSTM_Model and CNN_LSTM_Model stall at the first epoch; after waiting 12 hours there was still no progress, as in the screenshot below:

(screenshot)

BrikerMan commented 4 years ago

@Jefffish09 Could you provide a Colab demo? I'll look into it.

BrikerMan commented 4 years ago

> I ran into a similar situation. Kashgari version: 2.0.0a1, English NER task. Both BiLSTM_Model and CNN_LSTM_Model stall at the first epoch; after waiting 12 hours there was still no progress.

By the way, you could also try installing the version from GitHub directly:

pip uninstall -y kashgari
pip install git+https://github.com/BrikerMan/Kashgari.git@v2-main

Jefffish09 commented 4 years ago

> By the way, you could also try installing the version from GitHub directly:
>
> pip uninstall -y kashgari
> pip install git+https://github.com/BrikerMan/Kashgari.git@v2-main

OK, I'll try that first, thanks! PS: one more note: I later switched to kashgari 1.x without changing the data processing or model parameters, and everything trained smoothly. So this is most likely a bug in the 2.x version.

BrikerMan commented 4 years ago

@Jefffish09 Kashgari 2.x is still in beta, so there may still be some bugs.

Jefffish09 commented 4 years ago

Hi @BrikerMan,

I see that the official 2.0 release is out, so I tried the example above again. It still hangs for a long time (about 15 minutes; even 1.x never took this long), with no result and no error.

The BERT embedding is wwm_cased_L-24_H-1024_A-16, and it can be reproduced in Colab:

!pip install kashgari
import kashgari
from kashgari.embeddings import BertEmbedding
from kashgari.tasks.labeling import BiLSTM_CRF_Model

train_x, train_y = [['this', 'is', 'Jack', 'Ma']],[['O','O','B','I']]
valid_x, valid_y = [['this', 'is', 'Jack', 'Li']],[['O','O','B','I']]
test_x, test_y  = [['this', 'is', 'Jack', 'Zhang']],[['O','O','B','I']]

bert_embed = BertEmbedding('wwm_cased_L-24_H-1024_A-16/',
                           #task=kashgari.LABELING,
                           sequence_length=100)

model = BiLSTM_CRF_Model(bert_embed)
model.fit(train_x,
          train_y,
          x_validate=valid_x,
          y_validate=valid_y,
          epochs=20,
          batch_size=512)

(screenshot: training stalled at the first epoch)

BrikerMan commented 4 years ago

@Jefffish09 Found the problem: when the number of x, y samples is smaller than batch_size, the generator falls into an infinite loop, which causes this hang. The code below runs fine. That said, real training shouldn't normally hit this case, right? I'll publish a patch release with the fix.

import kashgari
from kashgari.embeddings import BareEmbedding
from kashgari.tasks.labeling import BiLSTM_CRF_Model

train_x, train_y = [['this', 'is', 'Jack', 'Ma']],[['O','O','B','I']]
valid_x, valid_y = [['this', 'is', 'Jack', 'Li']],[['O','O','B','I']]
test_x, test_y  = [['this', 'is', 'Jack', 'Zhang']],[['O','O','B','I']]

bert_embed = BareEmbedding(sequence_length=100)

# Repeat the single sample 600 times so the sample count (600)
# exceeds batch_size (512) and an epoch can actually complete.
train_x = train_x * 600
train_y = train_y * 600

valid_x = valid_x * 600
valid_y = valid_y * 600

test_x = test_x * 600
test_y = test_y * 600

model = BiLSTM_CRF_Model(bert_embed)
model.fit(train_x,
          train_y,
          x_validate=valid_x,
          y_validate=valid_y,
          epochs=20,
          batch_size=512)
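
To make the failure mode concrete, here is a minimal sketch of the mechanism (an illustration of the bug, not Kashgari's actual generator code):

# Illustration only: a Keras-style generator yields forever, and fit()
# relies on steps_per_epoch to know when an epoch is finished.
def batch_generator(x, y, batch_size):
    while True:
        for i in range(0, len(x), batch_size):
            yield x[i:i + batch_size], y[i:i + batch_size]

train_x = [['this', 'is', 'Jack', 'Ma']]  # a single sample
batch_size = 512

# With integer division, 1 // 512 == 0: an epoch of zero steps is
# requested, so training appears to hang forever.
steps = len(train_x) // batch_size           # 0 -> never finishes
steps = max(len(train_x) // batch_size, 1)   # guard: at least one step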

Jefffish09 commented 4 years ago

> Found the problem: when the number of x, y samples is smaller than batch_size, the generator falls into an infinite loop, which causes this hang.

I see. But I have another dataset that trains perfectly well on 1.x and also hangs on 2.x. It isn't particularly small, and it hangs even with batch_size set to 8. Unfortunately I can't share it for data-privacy reasons.

BrikerMan commented 4 years ago

> I see. But I have another dataset that trains perfectly well on 1.x and also hangs on 2.x. It isn't particularly small, and it hangs even with batch_size set to 8. Unfortunately I can't share it because the data is sensitive.

No need to share the actual data: build samples from random strings so the dummy set has the same number of samples as yours, then test with that. My guess is the infinite loop still appears under certain conditions.
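
For instance, a throwaway repro sketch (the token length, sequence length, label set, and sample count below are placeholders; match them to the real data):

import random
import string

def random_token(n=5):
    # Random lowercase token, only meant to mimic the shape of real text.
    return ''.join(random.choices(string.ascii_lowercase, k=n))

def make_dummy_samples(n_samples, seq_len=10, labels=('O', 'B', 'I')):
    xs, ys = [], []
    for _ in range(n_samples):
        xs.append([random_token() for _ in range(seq_len)])
        ys.append([random.choice(labels) for _ in range(seq_len)])
    return xs, ys

# Use the same sample count as the private dataset here.
train_x, train_y = make_dummy_samples(n_samples=2000)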

Jefffish09 commented 4 years ago

> No need to share the actual data: build samples from random strings so the dummy set has the same number of samples as yours, then test with that.

Sure, I'll check again with the next release~

BrikerMan commented 4 years ago

> No need to share the actual data: build samples from random strings so the dummy set has the same number of samples as yours, then test with that. My guess is the infinite loop still appears under certain conditions.

Could you also tell me the sample count and batch_size that trigger the problem? So far I've only fixed the sample count < batch_size case~

BrikerMan commented 4 years ago

> Sure, I'll check again with the next release~

You can try it now with pip install git+https://github.com/BrikerMan/Kashgari.git@v2-dev.

Jefffish09 commented 4 years ago

> You can try it now with pip install git+https://github.com/BrikerMan/Kashgari.git@v2-dev.

OK, I'll give it a try, thanks!