Yale-LILY / SummerTime

An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo
https://arxiv.org/abs/2108.12738
Apache License 2.0
264 stars 30 forks source link

Add XLSum and Massivesumm datasets #114

Closed haileyschoelkopf closed 2 years ago

haileyschoelkopf commented 2 years ago

Add the XLSum and Massivesumm datasets to SummerTime.

still TODO:

haileyschoelkopf commented 2 years ago

still TODO:

TODO in a later PR, after finals if possible?

haileyschoelkopf commented 2 years ago

@niansong1996 This should be all set and passing tests now, found the issue with the Xlsum test case.

Your call on whether to include this in 1.2.0 or not! See line 770 in dataset_loaders.py for the languages that need additional work to be ready for MassiveSumm. Other than these specified languages, though, both datasets are fully working as expected!

There are definitely some things here that I want to refactor to be a bit more clean, but if it's alright I'd like to wait until after finals finish to make this neater since I won't really have much time before then.

haileyschoelkopf commented 2 years ago

Finally worked out a fix to a FileNotFoundError that wasn't happening when I tested locally, now actually ready :)

haileyschoelkopf commented 2 years ago

Thanks, will look at these later this weekend if I have time!

niansong1996 commented 2 years ago

@NickSchoelkopf Can you check where are we on this? Seems like it should be able to merge after passing the CI tests?

haileyschoelkopf commented 2 years ago

Yes, was planning on getting this done by Friday! Should be a minor issue to fix.

haileyschoelkopf commented 2 years ago

@niansong1996

This should be ready to merge, but CI tests fail at the model tests (which shouldn't be affected by this PR) with error PermissionError: [Errno 13] Permission denied: '/export' (full traceback in the most recent failed CI test).

Has anything changed on the machine running the CI tests to cause the directories to no longer be accessible?

niansong1996 commented 2 years ago

Hmm, might be the case that the home directory of Drago has been moved to an external drive? Let me ask him.

haileyschoelkopf commented 2 years ago

Ready to merge! @niansong1996

niansong1996 commented 2 years ago

Awesome! Glad this is done!