MorenoLaQuatra / domain-specific-academic-dataset

This repository contains a collection of URLs that can be used to mine domain-specific academic datasets

Regarding datasets #1

Open alkakhurana opened 2 years ago

alkakhurana commented 2 years ago

Hi, I found your paper titled "Extracting highlights of scientific articles: A supervised summarization approach". I am working on text summarization and, for evaluation purposes, I need the datasets used in your work. Is there any way to get the three datasets (BioPubSum, CSPubSumm and AIPubSum)?

Thanks

MorenoLaQuatra commented 2 years ago

Hi @alkakhurana,

Thank you for your interest in the topic. To download the data you should follow the instructions provided in this repository. It will allow you to download CSPubSumm.

To download the additional datasets provided by us (BioPubSum and AIPubSum), you just need to use the files containing the URLs provided in our repo: link. They follow exactly the same format as the original ones.

alkakhurana commented 2 years ago

Hi @MorenoLaQuatra, the scientific articles in the CSPubSum dataset are not open access and cannot be retrieved with the API key method described in https://github.com/EdCo95/scientific-paper-summarisation/tree/master/DataDownloader

Can you provide the text/XML of the scientific articles in the three datasets?

Thanks

MorenoLaQuatra commented 2 years ago

Unfortunately, I don't have the right to share the data collection; Elsevier is very strict about that. This is why no one shares the formatted version of the collection.

You should be able to access the required data from an institution that has an agreement with Elsevier.