Open kris927b opened 4 weeks ago
Regarding the EURLex overlap: This will probably be handled automatically during deduplication anyway, I suppose?
Yeah, with deduplication this shouldn't be a problem. I guess the only reason to remove them beforehand would be to minimise preprocessing time?
Cellar is the repository of European Union publications, managed by the Publications Office of the European Union.
Note: The EURLex data is contained in Cellar, so it should either be filtered out of Cellar or dropped as a separate dataset.
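
For reference, filtering the overlap up front could look roughly like the sketch below. It assumes both datasets are lists of dicts with `id` and `text` fields (a made-up layout, not the actual schema), and it only catches exact matches after lower-casing and whitespace normalisation. The helper names `text_fingerprint` and `drop_overlap` are hypothetical; near-duplicates would still have to be caught by the regular deduplication step.

```python
import hashlib


def text_fingerprint(text: str) -> str:
    """Hash of lower-cased, whitespace-normalised text (exact-match only)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def drop_overlap(cellar_docs, eurlex_docs):
    """Return the Cellar documents whose text does not also appear in EURLex."""
    eurlex_hashes = {text_fingerprint(d["text"]) for d in eurlex_docs}
    return [
        d for d in cellar_docs
        if text_fingerprint(d["text"]) not in eurlex_hashes
    ]


# Toy records; the "id" and "text" fields are assumptions for illustration.
eurlex = [{"id": "eurlex-1", "text": "Council Regulation on  fisheries."}]
cellar = [
    {"id": "cellar-1", "text": "council regulation on fisheries."},
    {"id": "cellar-2", "text": "Annual report of the Publications Office."},
]

filtered = drop_overlap(cellar, eurlex)
print([d["id"] for d in filtered])  # ['cellar-2']
```

Since this is a single pass over hashes, it should be cheap compared to fuzzy deduplication, which is the main argument for doing it beforehand.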