Open pengbo-learn opened 4 years ago
An error is raised when I run
python align_data_json.py > data.jsonl
The output is:
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
Adding annotator lemma
INFO:CoreNLP_JavaServer: CoreNLP pipeline initialized.
INFO:CoreNLP_JavaServer: Waiting for commands on stdin
iconv_open is not supported
Traceback (most recent call last):
  File "align_data_json.py", line 669, in <module>
    main(args)
  File "align_data_json.py", line 661, in main
    data = make_data(dir_name)
  File "align_data_json.py", line 483, in make_data
    song_text = open_text(lyrics_file)
  File "align_data_json.py", line 439, in open_text
    for morph in parse(line.encode('utf-8')):
  File "align_data_json.py", line 65, in parse
    phrase_lyrics = get_phrase(phrase, phrase_info)
  File "align_data_json.py", line 80, in get_phrase
    accent = get_accent("".join([w.split("\t")[0] for w in phrase]))
  File "align_data_json.py", line 201, in get_accent
    acc_position = int(morph.split("\t")[7].split(",")[0])
ValueError: invalid literal for int() with base 10: '\xe3\x82\xb3\xe3\x83\xac'
I wrote the result of
morph.split("\t")
to a file. It is:
['\xe3\x81\x93\xe3\x82\x8c', '\xe3\x82\xb3\xe3\x83\xac', '\xe3\x82\xb3\xe3\x83\xac', '\xe6\xad\xa4\xe3\x82\x8c', '\xe4\xbb\xa3\xe5\x90\x8d\xe8\xa9\x9e', '', '', '\xe3\x82\xb3\xe3\x83\xac', '0', '']
So field 7, which get_accent passes to int(), holds the katakana string from the ValueError rather than a number. I have no idea what to do next to fix this error.
If I use UniDic's original dicrc, the error disappears, i.e., if I undo this command from the README:
mv dic/dicrc dic/unidic/
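To narrow this down, here is a minimal diagnostic sketch that decodes the dumped fields and reports which of them actually parse as integers. It is written in Python 3 syntax, while the script itself appears to run under Python 2. The helper names `decode_fields` and `integer_field_indexes` are hypothetical and not part of align_data_json.py. On the dump above, only field 8 is numeric. Whether field 8 is the accent position that `get_accent` expects at index 7 is an assumption about the dicrc output format, not something confirmed here.

```python
# Diagnostic sketch only: helper names are hypothetical, not from the repository.

def decode_fields(morph):
    """Split one tab-separated analyzer output line into text fields."""
    if isinstance(morph, bytes):
        morph = morph.decode("utf-8")
    return morph.split("\t")


def integer_field_indexes(fields):
    """Return the indexes whose first comma-separated item parses as an int."""
    hits = []
    for i, field in enumerate(fields):
        head = field.split(",")[0]
        try:
            int(head)
        except ValueError:
            continue  # empty or non-numeric field
        hits.append(i)
    return hits


# The exact fields dumped from morph.split("\t") in this report.
sample = b"\t".join([
    b"\xe3\x81\x93\xe3\x82\x8c", b"\xe3\x82\xb3\xe3\x83\xac",
    b"\xe3\x82\xb3\xe3\x83\xac", b"\xe6\xad\xa4\xe3\x82\x8c",
    b"\xe4\xbb\xa3\xe5\x90\x8d\xe8\xa9\x9e", b"", b"",
    b"\xe3\x82\xb3\xe3\x83\xac", b"0", b"",
])

fields = decode_fields(sample)
numeric = integer_field_indexes(fields)
for i, field in enumerate(fields):
    print(i, repr(field), "<- numeric" if i in numeric else "")
```

Running the same check on a line produced with UniDic's original dicrc would show whether the numeric field moves to index 7 in that configuration.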