masakuri opened this issue 6 years ago
I'm sorry, I typed an incorrect command. ~The error was solved.~ I still have the same error...
OK, please let me know your command.
$ deep-crf train input_train_jp.txt --delimiter=" " --dev_file input_dev_jp.txt --save_dir save_jpmodel_dir --save_name bilstm-cnn-crf_adam_jp --optimizer adam --word_emb_file jp_word_emb300.txt --word_emb_vocab_type replace_only --gpu 0
Thank you.
I think you get this error because the format of your training file input_train_jp.txt is wrong:
Invalid input feature sizes
I just fixed the code; please use the most recent version and let me know the result.
I think input_train_jp.txt should be:
彼 O
は O
オバマ大統領 S-PERSON
です O

彼 O
は O
I got the following error.
ValueError: Invalid input feature sizes: "3". Please check at line [1298]
I checked line 1298 in input_train_jp.txt and found that the "word" contains a space, like:
ほげ[space]ほげ[space]O
"ほげ[space]ほげ" is a proper noun.
Thank you for your help in finding the cause of this error.
Is it OK to solve this problem by using --delimiter="\t", with the input_train_jp.txt format like ほげ[space]ほげ[tab]O?
I fixed the input_train_jp.txt format and ran the command ($ deep-crf train input_train_jp.txt --delimiter="\t" --dev_file input_dev_jp.txt --save_dir save_jpmodel_dir --save_name bilstm-cnn-crf_adam_jp --optimizer adam --word_emb_file jp_word_emb300.txt --word_emb_vocab_type replace_only --gpu 0), but I got the following error:
File "build/bdist.linux-x86_64/egg/deepcrf/__init__.py", line 66, in train
File "build/bdist.linux-x86_64/egg/deepcrf/main.py", line 102, in run
ValueError: Invalid training sizes: 0 sentences.
Any ideas?
Is it OK to solve this problem by using --delimiter="\t", with the input_train_jp.txt format like ほげ[space]ほげ[tab]O?
Yes! I think it is a good solution.
Each sentence must be separated by a blank line (an empty line, \n) in input_train_jp.txt. This format is called the CoNLL format.
For example, if you have two sentences:
$ cat input_file.txt
Barack B-PERSON
Hussein I-PERSON
Obama E-PERSON
is O
a O
man O
. O

Yuji B-PERSON
Matsumoto E-PERSON
is O
a O
man O
. O
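For reference, a minimal sketch of how a CoNLL-style file like this can be read into sentences; this is generic reading code, not deep-crf's actual loader:

# Generic CoNLL-style reader: blank lines separate sentences.
# (Sketch only; not deep-crf's actual loader.)
def read_conll(path, delimiter=" "):
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:            # blank line -> end of sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                word, tag = line.split(delimiter)
                current.append((word, tag))
    if current:                     # file may not end with a blank line
        sentences.append(current)
    return sentences

# read_conll("input_file.txt") returns two sentences for the example above.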
My input_train_jp.txt file has a blank line ("\n") between sentences (more precisely, between tweets), but I still got the error...
So your input_train_jp.txt now looks like the following?
あああ[tab]O

あ[tab]O
い[tab]O
う[tab]O

お[space]お[tab]O
お[tab]O
Yes.
OK. Can you send me your input file via e-mail, if that's OK with you? nanigashi03[at]gmail.com
Or, please try replacing [tab] with [space]:
お[space]お => お_お
[tab] => [space]
and use --delimiter=" ".
Maybe the [tab] Unicode character causes this error?
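A minimal sketch of that conversion (the file names are placeholders):

# Replace spaces inside words with "_" and [tab] with [space],
# so the file can be used with --delimiter=" ".
# (File names below are placeholders.)
with open("input_train_jp.txt", encoding="utf-8") as src, \
     open("input_train_jp_fixed.txt", "w", encoding="utf-8") as dst:
    for line in src:
        line = line.rstrip("\n")
        if not line:                # keep blank sentence separators
            dst.write("\n")
            continue
        word, tag = line.split("\t")        # tab-delimited input
        word = word.replace(" ", "_")       # お[space]お -> お_お
        dst.write(word + " " + tag + "\n")  # space-delimited output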
replace [tab] with [space]:
お[space]お => お_お
[tab] => [space]
use --delimiter=" "
It worked!!! Thank you very much for your help!!!
OK. It seems that either our code or the input format with [tab] causes that error.
I see. Thank you very much. I changed the issue title to reflect the content.
When I trained with English train/dev files, it worked. But when I trained with Japanese train/dev files (and set a pre-trained Japanese word embeddings file), I got the following error.
I want to set a pre-trained Japanese char embeddings file, but it looks like there is no --char_emb_file option. I am wondering if this is the cause of the error. Does it support Japanese train/dev files (or a --char_emb_file option)? Thank you.
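For context, a minimal sanity check for the pretrained embedding file; it assumes the common word2vec text format ("token v1 v2 ... vN" per line), which is an assumption about what --word_emb_file expects, not something confirmed by deep-crf's docs:

# Sanity-check sketch for a pretrained embedding file.
# Assumption: word2vec text format, "token v1 v2 ... vN" per line.
# Note: some files start with a "count dim" header line, which this
# sketch would flag; a token containing a space would also break it.
def check_embeddings(path, expected_dim=300):
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split(" ")
            token, vector = fields[0], fields[1:]
            if len(vector) != expected_dim:
                print("line %d: token %r has %d values" % (i, token, len(vector)))

# check_embeddings("jp_word_emb300.txt")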