Jiakui closed this issue 4 years ago.
Hi Jiakui,
I'm not that familiar with keyword extraction. Is it similar to named entity recognition? If so, you should be able to use our model for it. I'm still cleaning up the code so that it's easy to use; hopefully it will be ready in the next month or so.
For a sequence tagging task like segmentation, I don't think there's much advantage to enumerating all possible text spans. You're probably better off with an LSTM-CRF or a similar model.
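In the tagging framing, each character receives exactly one label, and a model such as an LSTM-CRF predicts the whole label sequence. The sketch below shows only the decoding step, turning per-character tags back into words. It assumes the common BMES scheme (B = begin, M = middle, E = end of a multi-character word, S = single-character word). The thread does not name a scheme, and the tagger itself is omitted.

```python
from typing import List


def bmes_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Decode per-character BMES tags into a list of words (assumed tag scheme)."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""  # characters of the multi-character word being built
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:  # flush an unterminated word from a malformed sequence
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:  # previous word never got an E tag; flush it
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        elif tag == "E":
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag}")
    if current:  # flush a trailing unterminated word
        words.append(current)
    return words


print(bmes_to_words(list("我爱北京"), ["S", "S", "B", "E"]))
# ['我', '爱', '北京']
```

Because every character belongs to exactly one word, a single tag sequence cannot encode overlapping segments. This is the limitation raised in the next reply.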
Let me know if you've got more questions!
Dave
In fact, we have already applied SciIE (an earlier version of DyGIE) to the scientific keyword extraction task (https://arxiv.org/abs/1808.09602) and observed improvements over LSTM+CRF. For Chinese word segmentation, I actually think Chinese may be an even better fit for DyGIE, since Chinese has no explicit word boundaries (the input is purely character-level). Because DyGIE enumerates all possible text spans, it may handle the problem better. At the very least, I'm confident it will outperform a traditional LSTM+CRF on overlapping segments.
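To illustrate what "enumerating all possible text spans" means at the character level, here is a minimal sketch that lists every span up to a maximum width. The `max_width` cap is an assumption made to keep the count manageable. In a span-based model, each candidate would then get its own representation and score. Only the enumeration is shown here, not the scoring model.

```python
from typing import List, Tuple


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """Return all (start, end) spans, end inclusive, with width <= max_width."""
    spans: List[Tuple[int, int]] = []
    for start in range(n_tokens):
        for end in range(start, min(start + max_width, n_tokens)):
            spans.append((start, end))
    return spans


chars = list("南京市长江大桥")  # 7 characters, a classic ambiguous example
spans = enumerate_spans(len(chars), max_width=3)
print(len(spans))  # 18
# Overlapping candidates are both present: 南京市 (0-2) and 市长 (2-3)
print("".join(chars[0:3]), (0, 2) in spans)  # 南京市 True
print("".join(chars[2:4]), (2, 3) in spans)  # 市长 True
```

A tag sequence must commit to one reading of 南京市长江大桥. A span model can score 南京市 and 市长 separately and defer the choice between them.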
@Jiakui Just letting you know that the code runs now. If you pull it and follow the instructions in the README, you should be able to train a model. To adapt it for Chinese word segmentation, you'll probably want to modify the NER module: https://github.com/dwadden/dygiepp/blob/master/dygie/models/ner.py.
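One way to reuse the NER module is to treat each character as a token and each gold word as an entity span. The sketch below converts pre-segmented sentences into a DyGIE-style document. It assumes the JSON layout described in the dygiepp README: `doc_key`, `dataset`, `sentences`, and `ner` keys, with `[start, end, label]` spans that use inclusive, document-level token indices. Check this against the current README before relying on it. The `WORD` label and the `cws` dataset name are hypothetical placeholders.

```python
import json
from typing import Dict, List


def segmented_to_dygie(doc_key: str, sentences: List[List[str]]) -> Dict:
    """Convert pre-segmented sentences into a DyGIE-style document (assumed format).

    Each character becomes a token; each gold word becomes an NER span
    [start, end, "WORD"] with inclusive, document-level indices.
    "WORD" and "cws" are hypothetical names.
    """
    doc_sentences: List[List[str]] = []
    doc_ner: List[List[list]] = []
    offset = 0  # running token offset across the whole document
    for words in sentences:
        chars: List[str] = []
        spans: List[list] = []
        for word in words:
            start = offset + len(chars)
            chars.extend(word)  # split the word into character tokens
            spans.append([start, offset + len(chars) - 1, "WORD"])
        doc_sentences.append(chars)
        doc_ner.append(spans)
        offset += len(chars)
    return {"doc_key": doc_key, "dataset": "cws",
            "sentences": doc_sentences, "ner": doc_ner}


doc = segmented_to_dygie("example-0", [["我", "爱", "北京"], ["南京市", "长江大桥"]])
print(json.dumps(doc, ensure_ascii=False))
```

Here the second sentence yields the spans `[4, 6, "WORD"]` and `[7, 10, "WORD"]`, because the indices continue from the four characters of the first sentence.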
Hi,
I think span representation is a great idea. Do you think it is suitable for keyword extraction and Chinese word segmentation?
Thanks!