Hi. First of all, thank you for sharing the preprocessing code.
I read your reply on one of the other issues about no_doc_info for the CADEC dataset. I found that it only affects the generated inline.txt file, and the document info eventually gets erased in split_train_test.py. It's an easy edit to add it back, though. Have you also built in a way to include the document name for the ShARe datasets? I checked and couldn't find an argument for it, so I thought I'd ask before creating a workaround. Thanks.