NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks
MIT License

MUC-6 dataset #6

Closed: agosiewska closed this issue 3 years ago

agosiewska commented 3 years ago

Hello, I really appreciate your work on weak supervision. I have noticed that in your preprint, you show the results of skweak on the MUC-6 corpus. https://arxiv.org/abs/2104.09683

I am testing different generative models and would like to compare them on the same data sets. However, I cannot find MUC-6. Could you please point me to the source from which you downloaded the data set? Is it behind a paywall?

plison commented 3 years ago

Hi! The MUC-6 data file is too big to put directly in the repository (due to GitHub's file size limits), so we put it in the "assets" of the release: https://github.com/NorskRegnesentral/skweak/releases/download/0.2.8/muc6.spacy This file is in the spaCy DocBin format. Note that the documents are duplicated for some reason, but this does not affect the results, of course.
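
For anyone who wants to read the file, here is a minimal sketch of loading a DocBin with spaCy and dropping the duplicated documents. It assumes a spaCy version compatible with the one used to write the file, that a blank English pipeline is enough to supply the vocabulary, and that the duplicates are exact copies with identical text. The names `load_docs` and `drop_duplicates` are hypothetical helpers and are not part of skweak.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Release asset URL quoted in this thread.
MUC6_URL = "https://github.com/NorskRegnesentral/skweak/releases/download/0.2.8/muc6.spacy"


def load_docs(path, lang="en"):
    """Read every Doc stored in a spaCy DocBin file (hypothetical helper)."""
    # Imported lazily so drop_duplicates() stays usable without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    # Assumption: a blank pipeline is sufficient to provide the vocab.
    nlp = spacy.blank(lang)
    doc_bin = DocBin().from_disk(path)
    return list(doc_bin.get_docs(nlp.vocab))


def drop_duplicates(docs):
    """Keep the first occurrence of each document, comparing by raw text.

    Assumption: duplicated documents have exactly the same text.
    """
    seen = set()
    unique = []
    for doc in docs:
        if doc.text not in seen:
            seen.add(doc.text)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    local_path = Path("muc6.spacy")
    if not local_path.exists():
        # Download the release asset once and reuse the local copy afterwards.
        urlretrieve(MUC6_URL, local_path)
    docs = drop_duplicates(load_docs(local_path))
    print(len(docs), "unique documents")
```

Deduplicating is optional for reproducing the reported results, but it can make evaluation faster and keeps document counts easier to interpret.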

agosiewska commented 3 years ago

Thank you very much!