brightmart / text_classification

all kinds of text classification models and more with deep learning
MIT License

seq2seq attention question_id question_string_list #66

Closed tangdouer closed 6 years ago

tangdouer commented 6 years ago

question_id, question_string_list

For a single label:
('question_id:', u'w35155 w6 w14436 w96 w347 w88 w26554 w11 w21879 w762 w151213 w72 w13047 w7464 w111')
('question_string_list:', u'label7792886053889220161')
('x_indexed:', [34714])

For multiple labels:
('question_id:', u'w5196 w27993 w32357 w346 w20331 w11 w46391 w10534 w114671 w642 w471 w6009 w111 c1028 c554 c1008 c2521 c431 c889 c583 c17 c50 c2521 c1077 c730 c557 c1770 c622 c518 c419 c531 c652 c184 w1790 w5817 w42559 w1595 w1733 w268 w1501 w6 w5817 w20092 w11 w5817 w20092 w1595 w2667 w987 w269 w7729 w57974 w111 w2460 w25 w11 w642 w471 w12470 w111 w7614 w293098 w111 c480 c17 c1816 c1824 c1796 c480 c1014 c562 c455 c12 c360 c382 c628 c2023 c2648 c184 c1248 c4 c17 c622 c518 c419 c1115 c26 c184 c2061 c2062 c1106 c544 c184')
('question_string_list:', u'label-7503593820279639580 4957478629964985687 -6746100647744083283 -3522198575349379632')
('x_indexed:', [10267, 0, 0, 0])

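Hello, does x_indexed refer to the index of the label? Also, does question_id refer to the content, and question_string_list to the labels? Do the labels include the string _label_? And why can label-7503593820279639580 be found in x_indexed while 4957478629964985687 cannot? Thank you very much for your help.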

tangdouer commented 6 years ago

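Hello, could you provide the zhihu-word2vec-multilabel.bin-100 file that was trained for the multi-label task? I trained a model with word2vec on the train-zhihu6-title-desc dataset, but its format apparently does not match your file, which causes the program to fail. I hope you can share the zhihu-word2vec-multilabel.bin-100 file. Thank you for your help.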

brightmart commented 6 years ago

Hi. You can set the pretrained word embedding flag to false, so it will no longer use the .bin file.

If you have some training data, you can use fastText to get a pretrained word embedding.
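As a rough illustration of the fastText suggestion, the sketch below only prepares the input: it writes one whitespace-tokenized question per line, which is the plain-text format fastText's unsupervised training reads. The helper name `write_fasttext_corpus`, the file name `corpus.txt`, the sample questions, and the output model name are all hypothetical. The dimension of 100 is an assumption based on the `-100` suffix of the file name mentioned above. Whether the resulting embedding file loads in this repository's scripts is not confirmed in this thread.

```python
from pathlib import Path


def write_fasttext_corpus(questions, path):
    """Write one whitespace-tokenized question per line (hypothetical helper).

    Blank entries are skipped and runs of whitespace are collapsed.
    Returns the number of lines written.
    """
    lines = [" ".join(q.split()) for q in questions if q.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Hypothetical sample: token sequences in the style shown in this thread.
questions = [
    "w35155 w6 w14436 w96 w347 w88",
    "w5196 w27993 w32357 w346 c1028 c554",
]
n_written = write_fasttext_corpus(questions, "corpus.txt")

# With the official `fasttext` Python package installed and a realistically
# sized corpus, training could then look like this (assumed settings):
#   import fasttext
#   model = fasttext.train_unsupervised("corpus.txt", model="skipgram", dim=100)
#   model.save_model("zhihu-fasttext.bin")  # hypothetical output name
```

The training call is left as a comment because it needs the external package and far more data than two lines; fastText drops rare words by default, so a tiny sample like this one would not produce a usable vocabulary.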