deeppavlov / DeepPavlov

An open source library for deep learning end-to-end dialog systems and chatbots.
https://deeppavlov.ai
Apache License 2.0

How to train on my own DSTC-format files? seq2seq model not working properly #396

Closed MuruganR96 closed 6 years ago

MuruganR96 commented 6 years ago

I already tried my own DSTC-format data, but it is not working properly. I edited the files in the kvret folder (kvret_dev_public.json, kvret_entities.json, kvret_test_public.json, kvret_train_public.json, and kvret_kb.json), but after training, the model still does not work properly. I am confused about how the kvret_kb.json knowledge base gets called from my own kvret_dev_public.json.

Also, why are dialog_id, kb_items, and kb_columns required so frequently?

MuruganR96 commented 6 years ago

Sir, how did you integrate the data from your kvret JSON files into the knowledge base kvret_kb.json? The go_bot DSTC-style api_call text type cannot be used here, and the seq2seq chatbot model basically does not sustain a continued conversation either. How can I supply my own data in simple formats such as .csv, .yml, or .txt, or how do I create my own DSTC-format files? Thank you in advance.

MuruganR96 commented 6 years ago

Sir, why has no one replied? I am now very confused, and I am a complete beginner on this topic. Please help me. Thank you in advance.

vikmary commented 6 years ago

Please specify your problem more precisely.

There are currently two goal-oriented bot models in DeepPavlov: go_bot and seq2seq_go_bot. These are two different neural architectures: go_bot trains on the DSTC2 dataset and seq2seq_go_bot on the kvret dataset (by Stanford), and they need different labelling of the input data. What are you trying to train, and what data are you using? The documentation may also be useful: http://docs.deeppavlov.ai/en/master/skills/go_bot.html http://docs.deeppavlov.ai/en/master/skills/seq2seq_go_bot.html

MuruganR96 commented 6 years ago

I trained on my own 50 dialogues (hospital-management data) in kvret_train_public.json, with 8 random dialogues each in kvret_test_public.json and kvret_dev_public.json, using epochs = 100, batch = 1, patience = 100, and learning rate = 0.009. At the end of training the validation and test accuracy are very low (per_item_bleu: 0.0587, per_item_accuracy: 0.0761), although training itself runs fine. How can I raise the validation accuracy to a good level?

I also tried tuning the epochs hyperparameter to 30, 45, 50, and 75 (per_item_accuracy: 0.0723); the validation accuracy varies slightly but remains very low. I am a beginner at hyperparameter tuning. How can I tune the hyperparameters to get better accuracy? Please give me a hint, sir.
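A common approach to the tuning question is to vary one hyperparameter at a time while keeping the rest fixed, and keep the setting with the best validation score. A toy sketch of that loop follows; `train_and_eval` here is a hypothetical stand-in (with made-up scores echoing the numbers reported above), not an actual DeepPavlov training run:

```python
def train_and_eval(learning_rate: float, epochs: int) -> float:
    """Hypothetical stand-in: in practice this would train the model with
    the given settings and return validation per_item_accuracy."""
    fake_scores = {
        (0.009, 100): 0.0761, (0.009, 75): 0.0723, (0.009, 50): 0.0700,
        (0.005, 100): 0.0512,
    }
    return fake_scores.get((learning_rate, epochs), 0.0)

# Grid sweep: evaluate every combination, keep the best by validation score.
best = max(
    ((lr, ep, train_and_eval(lr, ep))
     for lr in (0.009, 0.005)
     for ep in (50, 75, 100)),
    key=lambda result: result[2],
)
print(best)
```

With so little data the sweep will mostly reshuffle noise, which matches the "slightly varies" observation above; more data usually helps far more than more tuning.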

I also changed the learning rate from 0.009 to 0.005, and it completely confused the seq2seq model. Why did that happen? Please help me, sir; I am a beginner in DL and ML. This is how I feed my data:

```json
{
    "dialogue": [
        {
            "turn": "driver",
            "data": {
                "end_dialogue": false,
                "utterance": "What should I do my health is improper?"
            }
        },
        {
            "turn": "assistant",
            "data": {
                "end_dialogue": false,
                "requested": {
                    "room": true,
                    "agenda": true,
                    "time": false,
                    "date": false,
                    "party": false,
                    "event": true
                },
                "slots": {
                    "event": "patient card",
                    "agenta": "my health is improper",
                    "room": "health care"
                },
                "utterance": "you may call our health Care between 7:00 a.m. and 9:00 p.m. from your registered mobile number"
            }
        },
        {
            "turn": "driver",
            "data": {
                "end_dialogue": false,
                "utterance": "our health Care between 7:00 a.m. and 9:00 p.m. from your registered mobile number"
            }
        },
        {
            "turn": "assistant",
            "data": {
                "end_dialogue": false,
                "requested": {
                    "room": false,
                    "agenda": false,
                    "time": false,
                    "date": false,
                    "party": false,
                    "event": false
                },
                "slots": {},
                "utterance": "Any Query on your health suggestion?"
            }
        }
    ],
    "scenario": {
        "kb": {
            "items": [
                {
                    "agenta": "my health is improper, i am suffering,  not works properly my health",
                    "event": "patient card",
                    "room": " health Care"
                }
            ],
            "column_names": [
                "agenta",
                "event",
                "room"
            ],
            "kb_title": "patient card"
        },
        "task": {
            "intent": "patient card"
        },
        "uuid": "41"
    }
}
```

Thank you so much, and once again thanks in advance.
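On the earlier question of why kb items and column_names matter: the bot can only ground its responses in kb values whose keys line up with column_names, so a typo breaks retrieval silently (note that the `requested` dict in the sample uses `agenda` while the kb columns use `agenta`). A minimal consistency check, purely illustrative and not part of DeepPavlov:

```python
def check_kb_consistency(sample: dict) -> list:
    """Return human-readable problems found in one kvret-style sample."""
    problems = []
    kb = sample["scenario"]["kb"]
    columns = set(kb["column_names"])
    # Every kb item should only use keys declared in column_names.
    for i, item in enumerate(kb["items"]):
        extra = set(item) - columns
        if extra:
            problems.append(f"kb item {i} has keys outside column_names: {sorted(extra)}")
    # Every filled slot should also be a declared column.
    for j, turn in enumerate(sample["dialogue"]):
        slots = turn["data"].get("slots", {})
        unknown = set(slots) - columns
        if unknown:
            problems.append(f"turn {j} fills slots outside column_names: {sorted(unknown)}")
    return problems

# Trimmed version of the sample above, with the agenda/agenta typo carried over.
sample = {
    "dialogue": [
        {"turn": "assistant",
         "data": {"slots": {"event": "patient card", "agenda": "my health is improper"},
                  "utterance": "you may call our health Care ..."}},
    ],
    "scenario": {
        "kb": {"items": [{"agenta": "my health is improper", "event": "patient card",
                          "room": "health Care"}],
               "column_names": ["agenta", "event", "room"],
               "kb_title": "patient card"},
        "task": {"intent": "patient card"},
        "uuid": "41",
    },
}

for problem in check_kb_consistency(sample):
    print(problem)
```

Running a check like this over every dialogue before training catches schema drift that would otherwise just show up as low accuracy.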

vikmary commented 6 years ago

Do you use templates to generate bot responses in the train set? In other words, is there a predefined set of possible responses for your bot?

MuruganR96 commented 6 years ago

Ma'am, can you share common templates with me? I will then build my training data the correct way. Also, why is the accuracy level so low, and how can I increase the accuracy percentage? Thank you in advance.

MuruganR96 commented 6 years ago

https://github.com/deepmipt/DeepPavlov/issues/396#issuecomment-416846623 I am not using any templates, ma'am. I just edited the DeepPavlov utterances and replaced them with my own utterance, agenta, event, and room values, using kb_columns for my own agenta and room. That's it.

Ma'am, how can I increase the accuracy percentage? With epochs = 100 and batch = 5 I get accuracy = 0.1299, which is very low, although training runs fine. Why does this happen? If I can increase the accuracy, my bot should work very well. Thank you in advance.

vikmary commented 6 years ago

Your amount of data is very small, but seq2seq_go_bot needs a lot of it. go_bot is better suited for your case; you should provide your own data in DSTC2 format. Try:

python3 -m deeppavlov download configs/go_bot/gobot_dstc2.json

You will then find the format of DSTC2 dialogs in deeppavlov/../download/dstc2/dstc2-trn.json, and the format of DSTC2 templates in deeppavlov/../download/dstc2/dstc2-ttemplates.json.
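For orientation, a DSTC2-style dialogue in DeepPavlov is roughly a sequence of alternating user/system turns annotated with dialog acts and slot-value pairs, rather than kvret's scenario/kb structure. The fragment below is an illustrative sketch from memory, not the authoritative schema; the exact field names and nesting should be taken from the downloaded dstc2-trn.json itself:

```json
[
  [
    {
      "speaker": 1,
      "text": "cheap restaurant in the south part of town",
      "dialog_acts": [
        {"act": "inform", "slots": [["pricerange", "cheap"], ["area", "south"]]}
      ]
    },
    {
      "speaker": 2,
      "text": "the lucky star is a cheap restaurant in the south part of town",
      "dialog_acts": [
        {"act": "inform", "slots": [["name", "the lucky star"]]}
      ]
    }
  ]
]
```

The key practical difference from kvret is that system responses here map onto a fixed set of templated actions, which is why go_bot works with far less data.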

MuruganR96 commented 6 years ago

blue_score

https://github.com/deepmipt/DeepPavlov/issues/408#issuecomment-416848625

Just before this I trained the new seq2seq go_bot, and it throws a division-by-zero exception when computing the Google BLEU score, in deeppavlov/metrics/google_bleu.py:

```python
if ratio > 1.0:
    bp = 1.
else:
    bp = math.exp(1 - 1. / ratio)
```

By default it then reports 0.0 as the BLEU score. Ma'am, how can I fix this issue?
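For reference, the crash comes from the brevity-penalty step of BLEU when the hypothesis-to-reference length ratio is zero (the model predicted nothing), since `1 / ratio` then divides by zero. A defensive sketch of just that step, with a guard for the degenerate case; this is the underlying math, not a patch to DeepPavlov's google_bleu.py:

```python
import math

def brevity_penalty(ratio: float) -> float:
    """BLEU brevity penalty: 1 for ratio >= 1, exp(1 - 1/ratio) otherwise.

    Guards the degenerate case ratio == 0 (empty hypothesis), where the
    penalty tends to 0 and the naive formula divides by zero."""
    if ratio >= 1.0:
        return 1.0
    if ratio == 0.0:
        return 0.0
    return math.exp(1.0 - 1.0 / ratio)

print(brevity_penalty(0.0), brevity_penalty(0.5), brevity_penalty(1.2))
```

Returning 0.0 for an empty hypothesis is consistent with the limit of the formula as ratio approaches zero, which is also why a zero BLEU score here signals empty or near-empty predictions rather than a metric bug.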

vikmary commented 6 years ago

It seems the model predictions are too bad, so the BLEU score equals zero. Since your data is very small (fewer than 1000 examples), you shouldn't use seq2seq_go_bot.