CederGroupHub / LimeSoup

LimeSoup is a package to parse HTML or XML papers from different publishers.
MIT License

Issues for ElsevierSoup #45

Closed: DICPZhou closed this issue 4 years ago.

DICPZhou commented 4 years ago

Attachment: 1.zip

This is an .xml document I retrieved from an Elsevier journal through data mining. When I run the following in PyCharm:

    from LimeSoup import ElsevierSoup

    with open('1.xml', 'r', encoding='utf-8') as f:
        xml_str = f.read()
    data = ElsevierSoup.parse(xml_str)
    print(data)

the printed result is:

    {'Journal': None, 'DOI': None, 'Title': None, 'Keywords': [], 'Sections': []}

I'm curious what I did wrong and why I don't get the expected results. Thank you!
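
A quick way to tell this failure mode (nothing extracted at all) apart from a partial parse is to check whether every field came back empty. The sketch below is a hypothetical helper, not part of LimeSoup; it only assumes the result is a dict shaped like the one printed in the report.

```python
from typing import Any, Mapping


def looks_empty(result: Mapping[str, Any]) -> bool:
    """Return True if every field is None or an empty str/list/dict.

    Hypothetical helper: it only inspects the values of the dict returned
    by the parser and makes no assumptions about LimeSoup internals.
    """
    return all(
        value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)
        for value in result.values()
    )


# The result reported in this issue:
reported = {'Journal': None, 'DOI': None, 'Title': None, 'Keywords': [], 'Sections': []}
print(looks_empty(reported))  # True
```

When this returns True for a file that parses correctly elsewhere, the input string itself (or the installation) is the first thing to suspect rather than the paper.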

OlgaGKononova commented 4 years ago

@hhaoyan, can you please help address this issue?

hhaoyan commented 4 years ago

Hello Zhou,

I ran the exact same code, and here is the result I got: https://jsonformatter.org/6798f2. The paper seems to be parsed correctly. I'm also curious about what's causing your issue. Can you print the value of xml_str?
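
Besides printing `xml_str`, it can help to summarize what the string actually contains before it reaches the parser. The sketch below uses only the standard library's `xml.etree.ElementTree`; the function name `describe_xml` is hypothetical, and it is a sanity check for well-formedness and the root element, not a reproduction of what LimeSoup does internally.

```python
import xml.etree.ElementTree as ET


def describe_xml(xml_str: str) -> dict:
    """Summarize a document string before handing it to a parser.

    Reports its length, whether it starts with a byte-order mark (which
    appears as U+FEFF when a file is read with encoding='utf-8'), and
    either the root tag or the error raised by the stdlib XML parser.
    """
    info = {
        "length": len(xml_str),
        "has_bom": xml_str.startswith("\ufeff"),
        "root_tag": None,
        "error": None,
    }
    try:
        # Strip a leading BOM so it does not affect the well-formedness check.
        root = ET.fromstring(xml_str.lstrip("\ufeff"))
        # Namespaced tags come back in "{namespace-uri}localname" form.
        info["root_tag"] = root.tag
    except ET.ParseError as exc:
        info["error"] = str(exc)
    return info
```

Usage: read the file exactly as in the original report and call `print(describe_xml(xml_str))`. A length of 0, a parse error, or an unexpected root tag all point to a problem with the downloaded file rather than with the parser. If `has_bom` is True, opening the file with `encoding='utf-8-sig'` removes the mark on read.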

DICPZhou commented 4 years ago

Thank you very much for your timely reply. I can print the value of xml_str. When I saved this article in HTML format and ran `data = ElsevierSoup.parse(html_str)`, I also got the results. It seems that the error only occurs with the XML format on my computer. The installed modules also meet the requirements in setup.py.

hhaoyan commented 4 years ago

We don't have any information about your environment. AFAIK, the XML file you provided works perfectly fine on my side. Can you upload the relevant files as a zip?
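
Since the missing piece here is environment information, a small script can collect interpreter and package versions to paste into the issue. This is a sketch using the standard library's `importlib.metadata` (Python 3.8+); `environment_report` is a hypothetical name, and the default package list (`LimeSoup`, `beautifulsoup4`, `lxml`) is an assumption that should be adjusted to match the dependencies actually listed in setup.py.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def environment_report(
    packages: Iterable[str] = ("LimeSoup", "beautifulsoup4", "lxml"),
) -> Dict[str, Optional[str]]:
    """Collect interpreter and package versions for a bug report.

    The package names are assumed distribution names; any package that
    is not installed is reported as None instead of raising.
    """
    report: Dict[str, Optional[str]] = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```

Running `print(environment_report())` on both machines makes version mismatches visible at a glance, which is often faster than comparing setup.py requirements by hand.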

DICPZhou commented 4 years ago

Since it works on your side, I re-downloaded the file and installed it again. Fortunately, it works this time. Thanks again for your patient answers.