EdCo95 / scientific-paper-summarisation

Machine learning models to automatically summarise scientific papers

Papers_With_Section_Titles 404 #3

Open Lantow opened 6 years ago

Lantow commented 6 years ago

I'm trying to get the code to run; it looks like a promising library. After a few hours of fixing encoding errors and the like, and after corresponding with the Elsevier support team (who had some issues with their API service), I finally managed to download the data. However, after following the instructions and first running acquire_data.py, I get the following error when running cspubsumext_creator.py: "path not found: .../Data/Papers/Full/Papers_With_Section_Titles/'"

And sure enough, the only things in the directory are: Parsed_Papers/ Utility_Data/ XML_Papers/

Any idea where things have gone wrong?
Any help would be much appreciated!

philgooch commented 6 years ago

I have the same issue. I think that with the Elsevier API you only get titles and abstracts, not the full text, unless you have a ScienceDirect subscription.

This means that Papers/Full/Papers_With_Section_Titles/ never gets created, as there are no full-text papers with section titles available.

I'm looking to see if there is a way to pull XML full-text papers from a different source, or to put full-text papers in there manually.
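One way to check whether the downloads are abstract-only is to count section-title elements in each XML file. This is a minimal sketch, not code from the repository: the element name `section-title` is an assumption about the downloaded XML, and `count_section_titles` and `report` are hypothetical helpers. Namespaces are ignored so that the check does not depend on a particular namespace URI.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_section_titles(xml_path, title_tag="section-title"):
    """Count elements whose local name equals title_tag.

    title_tag is an assumed element name; adjust it to whatever the
    downloaded files actually use.
    """
    tree = ET.parse(xml_path)
    return sum(
        1
        for el in tree.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == title_tag
    )


def report(xml_dir):
    """Map each .xml file name in xml_dir to its section-title count."""
    return {
        path.name: count_section_titles(path)
        for path in sorted(Path(xml_dir).glob("*.xml"))
    }
```

If `report` returns 0 for every file in XML_Papers/, that would be consistent with the API having returned only titles and abstracts.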

satishpasumarthi commented 6 years ago

I tried downloading the data from my university network, and I could see that the complete files were downloaded, but the directory Papers/Full/Papers_With_Section_Titles was never created. No piece of code actually creates it. Should we manually copy Parsed_Papers to that directory?
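For anyone who wants to try the manual copy, here is a minimal sketch. It assumes that the files in Parsed_Papers/ are already in the format cspubsumext_creator.py expects, which nothing in this thread confirms; `populate_section_dir` is a hypothetical helper, not part of the repository. At the least, it creates the missing directory so the "path not found" error should no longer be raised for that path.

```python
import shutil
from pathlib import Path


def populate_section_dir(full_dir,
                         source_name="Parsed_Papers",
                         target_name="Papers_With_Section_Titles"):
    """Create full_dir/target_name and copy the files from full_dir/source_name.

    Whether the copied files are usable by the rest of the pipeline is an
    untested assumption. Returns the names of the files that were copied.
    """
    full_dir = Path(full_dir)
    source = full_dir / source_name
    target = full_dir / target_name
    # Create the directory that the creator script reports as missing.
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    if source.is_dir():
        for path in sorted(source.iterdir()):
            if path.is_file():
                shutil.copy2(path, target / path.name)
                copied.append(path.name)
    return copied
```

It would be called with the path to the local Data/Papers/Full directory.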

MiaoyanGu commented 3 years ago

I have the same question: there are no highlights in the downloaded XML documents, even though I have the full papers. Can you please show me another way to get XML files with highlights? Thanks!
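To confirm which files lack highlights, a quick scan like the one below may help. It is only a sketch under an assumption: that highlights, when present, sit in an `abstract` element whose `class` attribute contains "highlights". Both the element and attribute names are guesses to adapt to the actual files, and `extract_highlights` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def extract_highlights(xml_path, marker="highlights"):
    """Return the text of each abstract-like element flagged as highlights.

    Assumes highlights live in an element with local name 'abstract' whose
    'class' attribute contains `marker`; both names are assumptions.
    """
    root = ET.parse(xml_path).getroot()
    found = []
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue
        is_abstract = el.tag.rsplit("}", 1)[-1] == "abstract"
        if is_abstract and marker in el.get("class", ""):
            # Join the non-empty text fragments of all nested elements.
            pieces = [t.strip() for t in el.itertext() if t.strip()]
            found.append(" ".join(pieces))
    return found
```

An empty list for every file would suggest the highlights are missing from the downloads themselves rather than being lost during parsing.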

Alexwangziyu commented 7 months ago

Same here. Have you solved it yet?