run_ernie.sh infer: data format problem when running prediction

PaddlePaddle / models


run_ernie.sh infer: data format problem when running prediction #4894

Open Adrian-Yan16 opened 4 years ago

Adrian-Yan16 commented 4 years ago

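After training the model, I ran bash run_ernie.sh infer to make predictions. Shouldn't the input for prediction be unlabeled data? Why does this script only accept labeled data for prediction?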

function run_infer() {
    echo "infering"
    # Original checkpoint option, replaced by the explicit path below:
    #   --init_checkpoint "${ERNIE_FINETUNED_MODEL_PATH}"
    # The --test_data option is the one this question is about.
    python run_ernie_sequence_labeling.py \
        --mode infer \
        --ernie_config_path "${ERNIE_PRETRAINED_MODEL_PATH}/ernie_config.json" \
        --init_checkpoint "./ernie_models/step_620" \
        --init_bound 0.1 \
        --vocab_path "${ERNIE_PRETRAINED_MODEL_PATH}/vocab.txt" \
        --batch_size 64 \
        --random_seed 0 \
        --num_labels 57 \
        --max_seq_len 128 \
        --test_data "${DATA_PATH}/test.tsv" \
        --label_map_config "./conf/label_map.json" \
        --do_lower_case true \
        --use_cuda false
}

Using an unlabeled corpus raises an error.

Xreki commented 4 years ago

What error exactly is reported?

Adrian-Yan16 commented 4 years ago

The data to be predicted has to be labeled data rather than raw data.


dancinghui commented 3 years ago

I ran into the same problem. After replacing the test_data file with infer.tsv, the following error is raised:

Traceback (most recent call last):
  File "run_ernie_sequence_labeling.py", line 316, in <module>
    do_infer(args)
  File "run_ernie_sequence_labeling.py", line 264, in do_infer
    mode='test')
  File "/home/lihui/github_clone/models/PaddleNLP/lexical_analysis/creator.py", line 153, in create_pyreader
    phase=mode),
  File "../shared_modules/preprocess/ernie/task_reader.py", line 222, in data_generator
    examples = self._read_tsv(input_file)
  File "../shared_modules/preprocess/ernie/task_reader.py", line 85, in _read_tsv
    Example = namedtuple('Example', headers)
  File "/home/lihui/anaconda3/envs/paddle/lib/python3.6/collections/__init__.py", line 401, in namedtuple
    'identifiers: %r' % name)
ValueError: Type names and field names must be valid identifiers: "['百余名诺贝尔奖得主联合签名支持转基因作物,中国两院士签名,,,宁夏在线']"
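
The last frames of the traceback suggest that _read_tsv hands the first row of the file to namedtuple as field names, so a file whose first row is a raw sentence fails before any data is read. A minimal standalone sketch of that failure, using only the standard library, might look like this. The helper make_example_type is hypothetical and is not code from the repository; it mirrors only the namedtuple('Example', headers) call visible in the traceback, and the exact error text may differ from the one above.

```python
from collections import namedtuple


def make_example_type(first_row):
    """Build a record type from a TSV's first row, treated as column names.

    Hypothetical helper for illustration only; it mirrors the
    namedtuple('Example', headers) call seen in the traceback.
    """
    return namedtuple("Example", first_row)


# A header row made of valid identifiers works as expected.
Example = make_example_type(["text_a", "label"])
record = Example(text_a="some text", label="O")

# A raw sentence used as the "header" is rejected: it contains a comma,
# so it is not a valid Python identifier.
try:
    make_example_type(["百余名诺贝尔奖得主联合签名支持转基因作物,中国两院士签名"])
    header_error = None
except ValueError as exc:
    header_error = str(exc)
```

If this reading is right, the error comes from the file layout the reader expects, not from the model or the checkpoint.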

aliendaniel commented 3 years ago

In the 1.8 release of models, line 222 of ./models-release-1.8/PaddleNLP/shared_modules/preprocess/ernie/task_reader.py does not distinguish between modes, so data is always read in the train format and infer data cannot be read. Could you please fix this bug?
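
One way a reader could distinguish modes is sketched below. This is not the repository's code or an official fix. The function read_examples, the mode names, and the assumed file layouts (a text_a/label header row for labeled files, one raw sentence per line for infer files) are all assumptions made for illustration.

```python
import csv
import io
from collections import namedtuple

Example = namedtuple("Example", ["text_a", "label"])


def read_examples(fileobj, mode):
    """Yield Example records from a tab-separated file object.

    Assumed layouts (not taken from the repository):
      - "train" / "test": header row text_a<TAB>label, then two columns per row
      - "infer": no header row, one raw sentence per line, no label column
    """
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    if mode == "infer":
        for row in reader:
            if row:  # skip blank lines
                # Only the first column is used; there is no label to read.
                yield Example(text_a=row[0], label=None)
        return
    headers = next(reader)
    if headers != ["text_a", "label"]:
        raise ValueError("unexpected header row: %r" % (headers,))
    for row in reader:
        if row:
            yield Example(*row)


# Usage with in-memory files standing in for test.tsv and infer.tsv.
labeled = io.StringIO("text_a\tlabel\nab\tB I\n")
raw = io.StringIO("first sentence\nsecond sentence\n")
test_examples = list(read_examples(labeled, "test"))
infer_examples = list(read_examples(raw, "infer"))
```

Branching on the mode before the header row is consumed means the first sentence of an unlabeled file is never interpreted as column names.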