Closed vuraemon closed 3 years ago
Excuse me!
Can you tell me why you used the line "if char in [',', '。', '?', '!', ':', ';', '(', ')', '、'] and len(sentence) > 64:" on all of the train/dev/test sets to split long sentences? Is this process valid for evaluating Chinese Word Segmentation?
Thanks in advance for your answer.
Thanks for asking.
The reason we split long sentences into shorter ones is to make the code run faster. You don't have to do this.
Given that punctuation marks are always natural word boundaries, we think this is valid for evaluating Chinese Word Segmentation.
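For illustration, here is a minimal sketch of how splitting with the quoted condition might look. The function name `split_long_sentence` and the threshold parameter `max_len` are hypothetical; the actual implementation in the repository may differ.

```python
def split_long_sentence(sentence, max_len=64):
    """Split a sentence into chunks, cutting at a punctuation mark
    only once the current chunk has grown past max_len characters.
    This is a sketch of the quoted condition, not the repo's code."""
    boundary_chars = [',', '。', '?', '!', ':', ';', '(', ')', '、']
    chunks, current = [], []
    for char in sentence:
        current.append(char)
        # Mirror the quoted check: punctuation AND length over the threshold
        if char in boundary_chars and len(current) > max_len:
            chunks.append(''.join(current))
            current = []
    if current:  # keep any trailing text after the last cut
        chunks.append(''.join(current))
    return chunks
```

Because cuts happen only at punctuation, concatenating the chunks reproduces the original sentence, so no word boundaries inside the text are lost.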