649453932 / Chinese-Text-Classification-Pytorch

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, implemented in PyTorch and ready to use out of the box.
MIT License
5.27k stars 1.23k forks

Hello, could you tell me how the vocab.pkl file is generated? #3

Closed lili1234567890 closed 5 years ago

649453932 commented 5 years ago

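That file is the vocabulary. See lines 36 to 40 of utils.py: if the vocabulary file does not exist, the build_vocab function creates it automatically. I saved it in pkl format here.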
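The load-or-build behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual utils.py: the name `load_or_build_vocab`, the `max_size` and `min_freq` parameters, the character-level tokenization, and the `<UNK>`/`<PAD>` tokens are all assumptions made for the example.

```python
import os
import pickle as pkl
from collections import Counter

UNK, PAD = "<UNK>", "<PAD>"  # assumed special tokens, for illustration only


def build_vocab(lines, max_size=10000, min_freq=1):
    """Count characters and map the most frequent ones to integer ids."""
    counts = Counter(ch for line in lines for ch in line.strip())
    kept = [(tok, n) for tok, n in counts.items() if n >= min_freq]
    # Most frequent first; ties broken by the token itself for determinism.
    kept.sort(key=lambda item: (-item[1], item[0]))
    vocab = {tok: idx for idx, (tok, _) in enumerate(kept[:max_size])}
    vocab[UNK] = len(vocab)
    vocab[PAD] = len(vocab)
    return vocab


def load_or_build_vocab(vocab_path, lines):
    """Load the pickled vocabulary if it exists; otherwise build and save it."""
    if os.path.exists(vocab_path):
        with open(vocab_path, "rb") as f:
            return pkl.load(f)
    vocab = build_vocab(lines)
    with open(vocab_path, "wb") as f:
        pkl.dump(vocab, f)
    return vocab
```

With this shape, the first call writes vocab.pkl and every later call reads it back, so a stale or damaged file has to be deleted before a fresh vocabulary is built.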

lili1234567890 commented 5 years ago

When there is no vocabulary file, build_vocab does not create one, and I get this error:
Traceback (most recent call last):
  File "utils.py", line 134, in <module>
    word_to_id = pkl.load(open(vocab_dir, 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: './THUCNews/data/vocab.pkl'

649453932 commented 5 years ago

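I forgot to add the vocabulary-building logic to the code that extracts the pretrained word vectors. It is fixed now, and the new utils.py has been uploaded.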

lili1234567890 commented 5 years ago

I don't have the sgns.sogou.char character list. Can the embeddings be generated randomly, without using it?
Traceback (most recent call last):
  File "utils.py", line 145, in <module>
    f = open(pretrain_dir, "r", encoding='UTF-8')
FileNotFoundError: [Errno 2] No such file or directory: './THUCNews/data/sgns.sogou.char'

649453932 commented 5 years ago

python run.py --model TextCNN --embedding random
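As a rough illustration of how a script could accept these two flags, here is a small argparse sketch. It is an assumption-laden example rather than the repository's run.py: the default value `pre_trained`, the `choices` list, and the helper `needs_pretrained_file` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse --model and --embedding; flag names follow the command above."""
    parser = argparse.ArgumentParser(description="Text classification (sketch)")
    parser.add_argument("--model", type=str, required=True,
                        help="model name, e.g. TextCNN")
    parser.add_argument("--embedding", type=str, default="pre_trained",
                        choices=["pre_trained", "random"],
                        help="'random' skips loading pretrained vectors")
    return parser.parse_args(argv)


def needs_pretrained_file(args):
    """Hypothetical helper: only the pretrained path needs a vector file on disk."""
    return args.embedding != "random"
```

Under these assumptions, `--embedding random` means no pretrained vector file such as sgns.sogou.char has to be present.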

lili1234567890 commented 5 years ago

Oh, right, I forgot. Your code works. Thank you.

lili1234567890 commented 5 years ago

May I ask one more question? How do I fix this error?
Traceback (most recent call last):
  File "run.py", line 42, in <module>
    dev_iter = build_iterator(dev_data, config)
  File "/home/zgy/wll/Tibetan-Text-Classification-Pytorch/utils.py", line 120, in build_iterator
    iter = DatasetIterater(dataset, config.batch_size, config.device)
  File "/home/zgy/wll/Tibetan-Text-Classification-Pytorch/utils.py", line 80, in __init__
    if len(batches) % self.n_batches != 0:
ZeroDivisionError: integer division or modulo by zero

649453932 commented 5 years ago

Your n_batches is 0. Check the size of your dataset against batch_size.
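To see why n_batches can end up as 0, consider the sketch below. It assumes, as the traceback suggests but does not prove, that n_batches is computed by integer division of the number of samples by batch_size; the functions `count_batches` and `iter_batches` are hypothetical names for illustration.

```python
def count_batches(n_samples, batch_size):
    """Return (number of full batches, whether a partial batch is left over)."""
    n_batches = n_samples // batch_size  # 0 when n_samples < batch_size
    has_residue = n_samples % batch_size != 0
    return n_batches, has_residue


def iter_batches(samples, batch_size):
    """Yield consecutive slices; the last one may be shorter than batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```

For example, a dev set of 5 samples with a batch size of 128 gives 0 full batches, and any later modulo by that count raises ZeroDivisionError. Using a smaller batch_size or more data avoids it, as does taking the remainder with respect to batch_size instead.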

lili1234567890 commented 5 years ago

Great, I got it working.

lili1234567890 commented 5 years ago

Sorry to bother you one last time. Running the same utils.py, the CNN model works, but RNN and the other models fail with the following error:
Traceback (most recent call last):
  File "run.py", line 40, in <module>
    vocab, train_data, dev_data, test_data = build_dataset(config, args.word)
  File "/home/zgy/wll/Tibetan-Text-Classification-Pytorch/utils.py", line 37, in build_dataset
    vocab = pkl.load(open(config.vocab_path, 'rb'))
_pickle.UnpicklingError: invalid load key, '\xff'.

649453932 commented 5 years ago

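This code runs before the model is called. Could the vocabulary file have been modified? Try deleting the vocabulary file and generating it again.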

lili1234567890 commented 5 years ago

I deleted it and regenerated it, but I still get the same error.

649453932 commented 5 years ago

Then I don't know either. I don't see this problem on my side...
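The cause was never identified in this thread, but the message itself says that the first byte pickle read was 0xFF, which is not a valid pickle opcode. One way to narrow it down is to look at the first few bytes of whatever file the vocabulary path points to. The sketch below is only a diagnostic aid; `describe_header` is a hypothetical helper, and the checks cover just a few common cases (pickles written with protocol 0 or 1 do not start with 0x80 and would be reported as unrecognized).

```python
import pickle as pkl


def describe_header(path, n=4):
    """Read the first bytes of a file and guess what kind of file it is."""
    with open(path, "rb") as f:
        head = f.read(n)
    if head[:1] == b"\x80":
        return "looks like a binary pickle (protocol 2 or later)"
    if head[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "looks like UTF-16 text with a byte-order mark"
    if head[:3] == b"\xef\xbb\xbf":
        return "looks like UTF-8 text with a byte-order mark"
    return "unrecognized header: %r" % head
```

If the header does not look like a pickle, the path is probably pointing at a text file or some other file rather than at a vocabulary written by `pkl.dump`, so it is worth printing the path that the failing model configuration actually uses.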

lili1234567890 commented 5 years ago

All right, I'll keep working on it.

649453932 commented 5 years ago

If you really can't find the cause, you can save the vocabulary in txt format and read it directly with open.
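A plain-text vocabulary along these lines could look like the sketch below. The one-entry-per-line, tab-separated layout and the names `save_vocab_txt` and `load_vocab_txt` are assumptions for illustration; the sketch also assumes that no token contains a line break.

```python
def save_vocab_txt(vocab, path):
    """Write one 'token<TAB>id' pair per line, ordered by id."""
    with open(path, "w", encoding="utf-8") as f:
        for token, idx in sorted(vocab.items(), key=lambda kv: kv[1]):
            f.write(f"{token}\t{idx}\n")


def load_vocab_txt(path):
    """Read the file written by save_vocab_txt back into a dict."""
    vocab = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so a token containing a tab still parses.
            token, idx = line.rsplit("\t", 1)
            vocab[token] = int(idx)
    return vocab
```

A text file is also easy to inspect by eye, which helps when checking whether the vocabulary matches the training data.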

lili1234567890 commented 5 years ago

Sure. Could I get your contact details? It doesn't seem quite right to chat here.

lili1234567890 commented 5 years ago

My phone and WeChat use the same number, or you can reach me on QQ. [contact details omitted]