The corpora that can be downloaded are listed in the Corpus. The rest of the corpus was collected from the Internet with a web scraper. Because of copyright, releasing that data could expose us to legal risk, so it is not distributed here. I'm very sorry for that. However, with a simple scraper, it is easy to collect a large amount of data in a few days.