SeedVault / rhizome

conversational bot engine created by Botanic/SEED team
https://seedtoken.io

#1 - Parse ChatScript top files and generate list of intents in json #1

Closed SEEDToken closed 6 years ago

SEEDToken commented 6 years ago

Take the Sequoia FAQ ChatScript top file and use it to train the model. Not sure if you are familiar with the Sequoia bot or ChatScript, but it's easy to understand anyway.

this is the chatscript file: https://github.com/botanicinc/sequoia/blob/master/RAWDATA/SEQUOIA/FAQs.top

There you can see each training phrase of each intent, defined with the prefix "#!". Each intent also has an ID; you can find it after "u: ", starting with the prefix "ID".

so for instance:

MISSING USERNODE LABEL
#! What is the CUI?
#! What's a conversational interface?
#! Tell me about CUI
#! Definition of CUI
#! CUI definition
#! Tell me about CUIs
#! Tell me about conversational interfaces
#! More about conversational user interface  # ACTION - these needs to be level two on this topic/category once we have that ready
u: ID8F1E1507 ( ![economy real_world] [ (<< ~cui definition >>) ([ more_about tell_me_about what whatis what's what's ] * ~cui) ]) $category1 = cui ^reuse(ID4D8A)

Lines 2 to 9 are the training phrases for the intent with ID ID8F1E1507. Note that we must strip that "# ACTION - ...." comment; you can define a rule like "ignore anything after two spaces". I guess that will work fine.
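A minimal sketch of that stripping rule, in case it helps. The function name `extract_phrase` is hypothetical, and the extra cut at " #" is my own assumption (a guard for comments separated by a single space), not something the file format guarantees:

```python
from typing import Optional

SAMPLE_PREFIX = "#!"


def extract_phrase(line: str) -> Optional[str]:
    """Return the training phrase on a '#!' line, or None for any other line."""
    stripped = line.strip()
    if not stripped.startswith(SAMPLE_PREFIX):
        return None
    text = stripped[len(SAMPLE_PREFIX):].lstrip()
    # Rule suggested above: ignore anything after two spaces.
    text = text.split("  ", 1)[0]
    # Assumption: also cut at " #" so a comment after a single space is dropped.
    text = text.split(" #", 1)[0]
    return text.strip() or None
```

The " #" guard would truncate a phrase that legitimately contains " #", so it is worth checking against the real FAQs.top before relying on it.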

You can see there are some intents that have nested blocks with more intents inside. For now just ignore them and parse the first level of intents. We will take care of those nested blocks later.

So for this we need a Python script that parses the ChatScript file and generates a JSON file with all defined intents, like this: [{"ID8F1E1507": ["What is the CUI?", "What's a conversational interface?", "etc.."]}, {"ID8F1E1507": ["What is the CUI?", "What's a conversational interface?", "etc.."]}, {"ID8F1E1507": ["What is the CUI?", "What's a conversational interface?", "etc.."]}]
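Something along these lines could be a starting point. It is only a sketch under a few assumptions: intents are the lines starting with "u: " followed by an "ID..." label, nested blocks are the rejoinder lines starting with "a:" through "q:", and any "#!" phrases sitting in front of a nested rule are thrown away. The names `parse_top` and `convert` are made up for this example:

```python
import json
import re
from typing import Dict, List

# Assumption: a first-level intent is a line starting with "u:" then an ID label.
RULE_RE = re.compile(r"^u:\s*(ID[0-9A-Za-z]+)")
# Assumption: nested blocks are rejoinder lines labelled a: through q:.
REJOINDER_RE = re.compile(r"^[a-q]:")


def parse_top(text: str) -> List[Dict[str, List[str]]]:
    """Collect '#!' phrases and attach them to the next first-level rule ID."""
    intents: List[Dict[str, List[str]]] = []
    pending: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#!"):
            # Drop trailing comments: cut at two spaces, then at " #".
            phrase = line[2:].lstrip().split("  ", 1)[0].split(" #", 1)[0].strip()
            if phrase:
                pending.append(phrase)
            continue
        match = RULE_RE.match(line)
        if match:
            if pending:
                intents.append({match.group(1): pending})
            pending = []
        elif REJOINDER_RE.match(line):
            # Nested intent: ignore it and discard the phrases that belong to it.
            pending = []
    return intents


def convert(top_path: str, json_path: str) -> None:
    """Read a .top file and write the list of intents as JSON."""
    with open(top_path, encoding="utf-8") as src:
        intents = parse_top(src.read())
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(intents, dst, indent=2, ensure_ascii=False)
```

Calling `convert("FAQs.top", "intents.json")` would then write one `{"ID...": [...]}` object per first-level intent. Rules without any "#!" phrase in front of them are skipped, which may or may not be what we want.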

SEEDToken commented 6 years ago

➤ Dan Brumleve commented:

Test

SEEDToken commented 6 years ago

➤ Dan Brumleve commented:

I wrote two tools, top2rasa.py and top2bbot.py for the rasa training format and the one described here respectively. They are both checked in here:

https://github.com/SeedVault/badlands/tree/master/rasa