Closed: PedroArvela closed this issue 6 years ago.
In GitLab by @PedroArvela on Feb 20, 2017, 15:22
Extract articles from the Portuguese Wikipedia.
Split articles into paragraphs, using the following as the separator:
. fim-de-parágrafo .
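A minimal sketch of the paragraph-splitting step, assuming the article text is already plain text and that paragraphs are delimited by blank lines (the issue does not say how paragraph boundaries are detected). The helper names `split_paragraphs` and `join_with_separator` are hypothetical; only the separator token comes from the issue.

```python
import re

# Separator token taken verbatim from the issue.
PARAGRAPH_SEPARATOR = ". fim-de-parágrafo ."


def split_paragraphs(article_text: str) -> list[str]:
    """Split plain article text into paragraphs (hypothetical helper).

    Assumption: paragraphs are separated by one or more blank lines.
    """
    chunks = re.split(r"\n\s*\n", article_text)
    # Collapse internal whitespace/line breaks and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def join_with_separator(paragraphs: list[str]) -> str:
    """Place the separator token between consecutive paragraphs."""
    return f" {PARAGRAPH_SEPARATOR} ".join(paragraphs)


if __name__ == "__main__":
    text = "Lisboa é a capital de Portugal.\n\nO Porto fica no norte."
    print(join_with_separator(split_paragraphs(text)))
```

Whether the token should also follow the last paragraph of each article is not specified in the issue; this sketch only places it between paragraphs.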
In GitLab by @PedroArvela on Feb 23, 2017, 16:01
CETEMPúblico also marks paragraph boundaries in its XML format.
Using the 20170201 Wikipedia snapshot.
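One way to pull articles out of a Wikipedia XML dump such as the 20170201 snapshot is to stream it with the standard library, assuming the dump follows the usual MediaWiki export layout (`page` elements containing `title`, `ns` and `revision/text`). The function name `iter_articles` and the dump path are hypothetical. It keeps only main-namespace (`ns` 0) pages, skips redirects, and yields raw wikitext; stripping wiki markup is a separate step not shown here.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def _local(tag: str) -> str:
    # ElementTree prefixes tags with "{namespace}"; keep only the local name.
    return tag.rsplit("}", 1)[-1]


def iter_articles(dump_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for main-namespace, non-redirect pages.

    Hypothetical helper; assumes the standard MediaWiki XML export format,
    optionally bz2-compressed.
    """
    opener = bz2.open if dump_path.endswith(".bz2") else open
    with opener(dump_path, "rb") as fh:
        # Stream the file so the whole dump never has to fit in memory.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if _local(elem.tag) != "page":
                continue
            title = ns = text = None
            is_redirect = False
            for child in elem.iter():
                name = _local(child.tag)
                if name == "title":
                    title = child.text
                elif name == "ns":
                    ns = child.text
                elif name == "redirect":
                    is_redirect = True
                elif name == "text":
                    text = child.text or ""
            if ns == "0" and not is_redirect and title is not None and text is not None:
                yield title, text
            # Free the finished page's subtree to limit memory use.
            elem.clear()
```

Usage: `for title, wikitext in iter_articles("ptwiki-20170201-pages-articles.xml.bz2"): ...`, where the file name is an assumed example. Each `wikitext` would still need markup removal before paragraph splitting.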