PhilipMay closed this 3 years ago
Hi @PhilipMay, does it have to be paraphrases, or can it be any suitable training data for learning embedding models?
In the second case, I only know of GermanDPR, but maybe there are some German summarization datasets. I also plan to crawl some (headline, news summary) pairs from Spiegel and Zeit.de. Sadly, due to copyright issues these datasets cannot be shared (only the script to get them can be shared). Still, this type of data would also be quite valuable for training embedding models.
Hey @nreimers, thanks! We plan to do text augmentation with them, so paraphrases would be best. Thanks, Philip
tagging @sitongye
closing this again
Hey @nreimers, I am looking for German paraphrase datasets. Are there any others besides this PAWS-X dataset? Many thanks, Philip