QZx7 / MindTheTime

MIT License

How to run the code? #6

Open CQUPT-CaiKe opened 3 months ago

CQUPT-CaiKe commented 3 months ago

Dear authors, I was very interested in your work after reading it, but I could not find any description of how to start the code. I am a beginner, so please bear with me: could you explain in detail how to run it? I would be grateful.

QZx7 commented 3 months ago

@CQUPT-CaiKe Hi, this repo contains the code for our data collection system and the dataset. For training a model, we used the ParlAI framework: https://parl.ai/ . If you are familiar with ParlAI (if not, you may need to spend some time learning the framework), training should only require converting the dataset to the required format and starting the run with one simple command. We do not serve the model because of cost; however, we provide the train/test split of the dataset.

**Chatting service**

So, if you want to run the data collection server, what you need to do is:

And you should be able to see the instructions and the start matching button. Hit the start matching button and the worker with Id=000 will join the match queue. Do the same thing in a new browser tab, going to localhost:8888?workerId=001 for another instance. After you have more than two workers in the queue, a pair of them will be directed into the chat room.
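To simulate several workers locally, the per-worker URLs can be generated rather than typed by hand. This is a minimal sketch: the `worker_url` helper is hypothetical (not part of the repo), and the host, port, and `workerId` query parameter are taken from the example URL above.

```python
from urllib.parse import urlencode


def worker_url(worker_id: str, host: str = "localhost", port: int = 8888) -> str:
    """Build the page URL for one simulated worker (hypothetical helper)."""
    # The workerId query parameter follows the localhost:8888?workerId=001 example.
    return f"http://{host}:{port}/?{urlencode({'workerId': worker_id})}"


# Two zero-padded worker ids, matching the 000 / 001 ids used in the thread.
urls = [worker_url(f"{i:03d}") for i in range(2)]
for url in urls:
    print(url)
```

Each URL can then be opened in its own tab, for example with `webbrowser.open(url)` from the Python standard library, once the server is running.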

Events and progress are loaded from the JSON files under mtt/chat/data/events. You can add your own events in the same format.
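Before writing custom events, it can help to inspect the existing files. The sketch below only parses every `.json` file in a directory; `load_event_files` is a hypothetical helper, and it makes no assumption about the event schema itself, which should be read from the files in the repo.

```python
import json
from pathlib import Path


def load_event_files(events_dir):
    """Return {file stem: parsed JSON} for every .json file in events_dir."""
    events = {}
    # Sorted so the result order is stable across runs.
    for path in sorted(Path(events_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            events[path.stem] = json.load(f)
    return events
```

Calling `load_event_files("mtt/chat/data/events")` from the repository root and printing one entry shows the structure a custom event file needs to follow.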

**Model training**

With the data files under mtt/chat/data/gap_chat, you can convert the dialogue data into ParlAI format. You can refer to mtt/data_collection/prepare_data.py for how we treat the events and dialogues. Then you can simply run the training command within ParlAI, loading the data from a local file. You can use any model that is available in the model zoo. However, if you want to train the RAG model, you might need to create the index first. We followed the steps described in https://parl.ai/docs/agent_refs/rag.html and used the DPR model. The DPR repo has since been archived by Meta, so you may need to use some other method. You may also fine-tune one of the existing MSC models (https://parl.ai/docs/zoo.html#msc-models) without indexing. However, the performance may not be as good as with a newly created index.
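As an illustration of the conversion step, here is a minimal sketch that writes one dialogue in ParlAI's tab-separated text format (`text:`, `labels:`, and `episode_done:True` on the last line of an episode). The input shape, a plain list of alternating utterances, is an assumption for illustration; the actual gap_chat files and the event handling in prepare_data.py are not reproduced here, and `to_parlai_lines` is a hypothetical helper.

```python
def _clean(utterance: str) -> str:
    """Tabs and newlines are field/record separators, so replace them."""
    return utterance.replace("\t", " ").replace("\n", " ").strip()


def to_parlai_lines(turns):
    """Convert alternating utterances [A, B, A, B, ...] into ParlAI text lines.

    Assumed input: a list of strings where even positions are the context
    speaker and odd positions are the reply to learn. A trailing unanswered
    utterance is dropped.
    """
    pairs = list(zip(turns[0::2], turns[1::2]))
    lines = []
    for i, (text, label) in enumerate(pairs):
        fields = [f"text:{_clean(text)}", f"labels:{_clean(label)}"]
        if i == len(pairs) - 1:
            # Marks the end of this dialogue (episode).
            fields.append("episode_done:True")
        lines.append("\t".join(fields))
    return lines


example = to_parlai_lines(["hi", "hello", "how are you?", "fine"])
```

Writing these lines to a file, one per line, should give something ParlAI can load through its `fromfile:parlaiformat` task, for example `parlai train_model --task fromfile:parlaiformat --fromfile-datapath <your file> --model <model> --model-file <output path>`; check the ParlAI documentation for the options your version supports.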