RowitZou / topic-dialog-summ

AAAI-2021 paper: Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling.
MIT License

Step 3 and 4 of Instructions #9

Closed ujjawalmadan closed 3 years ago

ujjawalmadan commented 3 years ago

Hi! Just hoping to get some help on this issue. I ran the code as instructed but was met with this error:

image

It seems that my custom data was recognized, and I believe I put it in the right format. However, it was not processed correctly in step 3 of your instructions: the output shows that no instances were processed.

In step 4, I am shown this error.

image

All I have is this in the file directory.

image

Can you help? Thanks.

RowitZou commented 3 years ago

Hi. No instances were processed, most likely because the data format is incorrect.

See the file src/prepro/data_builder.py, line 310. When len(dialogue_b_data) == 0, the instance is not processed. This means all of the dialogue's utterances came out empty, i.e. b_data (line 265) is None.

There are two factors that lead to a None b_data when processing an utterance:

  1. The role information is not correct (line 71).
  2. The utterance is too short (line 73).

The original data that we used is in Chinese, and the role info is denoted by Chinese characters ("客服" means agent and "客户" means customer). If your custom data is in English and the role info is customized, the role info condition in line 71 should also be modified.

To avoid potential bugs, all role info in data_builder.py should be replaced with your custom role info. For example, replace "客服" with 'agent' and "客户" with 'customer'.
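
To illustrate why mismatched role labels can leave every instance unprocessed, here is a minimal sketch of the two filters described above. It is not the code from data_builder.py: the names `ROLES`, `MIN_TOKENS`, `process_utterance` and `process_dialogue` are hypothetical, and the length threshold is an assumed value, not the one used in the repository.

```python
from typing import List, Optional, Tuple

# Hypothetical role labels for English data. The original data uses
# "客服" (agent) and "客户" (customer) instead.
ROLES = {"agent", "customer"}

# Assumed minimum utterance length in tokens; the real threshold is
# whatever data_builder.py checks around line 73.
MIN_TOKENS = 2


def process_utterance(role: str, text: str) -> Optional[dict]:
    """Return a processed record, or None if the utterance is filtered out."""
    if role not in ROLES:
        # Filter 1: unknown role label.
        return None
    tokens = text.split()
    if len(tokens) < MIN_TOKENS:
        # Filter 2: utterance too short.
        return None
    return {"role": role, "tokens": tokens}


def process_dialogue(dialogue: List[Tuple[str, str]]) -> List[dict]:
    """Keep only the utterances that survive both filters."""
    records = (process_utterance(role, text) for role, text in dialogue)
    return [r for r in records if r is not None]


english = [
    ("agent", "Hello, how can I help you today?"),
    ("customer", "My order has not arrived."),
]
# Same text, but with role labels the filter does not recognize.
mismatched = [("客服", text) if role == "agent" else ("客户", text)
              for role, text in english]

print(len(process_dialogue(english)))     # both utterances kept
print(len(process_dialogue(mismatched)))  # nothing kept, dialogue is skipped
```

In this sketch, a dialogue whose role labels do not match the expected set yields an empty list, which corresponds to the len(dialogue_b_data) == 0 case where the instance is skipped.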

RowitZou commented 3 years ago

For the FileNotFoundError, just rename the training file to "aws8000_alibaba.train.pt".
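
If you prefer to script the rename, a minimal sketch could look like the following. The helper name `rename_training_file` and the example path in the usage note are hypothetical; only the target file name comes from this thread.

```python
from pathlib import Path

# Target file name mentioned in this thread.
EXPECTED_NAME = "aws8000_alibaba.train.pt"


def rename_training_file(current_path: Path) -> Path:
    """Rename a file to EXPECTED_NAME within the same directory."""
    target = current_path.with_name(EXPECTED_NAME)
    if target.exists():
        # Refuse to overwrite an existing training file.
        raise FileExistsError(f"{target} already exists")
    current_path.rename(target)
    return target
```

For example, `rename_training_file(Path("path/to/your_file.train.pt"))`, where the path is a placeholder for wherever step 3 wrote your processed training file.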

Hope this can help you.

ujjawalmadan commented 3 years ago

This helped! Thank you so much!