CederGroupHub / LimeSoup

LimeSoup is a package to parse HTML or XML papers from different publishers.
MIT License

Elsevier parser issue with the parse_formula function #28

Closed: hhaoyan closed this issue 5 years ago

hhaoyan commented 5 years ago

The parser raises a TypeError when parsing some papers. Example DOIs:

Exception information:

must be str, not NoneType

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "/home/hhuo/anaconda3/envs/synthesis/lib/python3.6/site-packages/rpyc/core/protocol.py", line 329, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/home/hhuo/anaconda3/envs/synthesis/lib/python3.6/site-packages/rpyc/core/protocol.py", line 590, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/home/hhuo/Projects/Codes/synthesis-api-hub/synthesis_api_hub/worker.py", line 20, in wrapper
    ret = f(self, *args, **kwargs)
  File "/home/hhuo/Projects/Codes/LimeSoup/LimeSoup/api_worker.py", line 38, in parse_elsevier
    return ElsevierSoup.parse(html_string)
  File "/home/hhuo/Projects/Codes/LimeSoup/LimeSoup/lime_soup.py", line 57, in parse
    return self._next.parse(html_str)
  File "/home/hhuo/Projects/Codes/LimeSoup/LimeSoup/lime_soup.py", line 69, in parse
    results = self._next.parse(results)
  File "/home/hhuo/Projects/Codes/LimeSoup/LimeSoup/lime_soup.py", line 67, in parse
    results = self._parse(html_str)
  File "/home/hhuo/Projects/Codes/LimeSoup/LimeSoup/ElsevierSoup.py", line 33, in _parse
    parser.parse_formula(rules=[{'name': 'formula'}])
  File "/home/hhuo/Projects/Codes/LimeSoup/LimeSoup/parser/parser_paper_elsevier.py", line 113, in parse_formula
    label.string = ' ' + label.string + ' '
TypeError: must be str, not NoneType
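
The failing line concatenates `label.string` with string literals, so it raises whenever `label.string` is `None`. Assuming `label` is a BeautifulSoup `Tag` (the traceback does not confirm this), `.string` is `None` when the tag has no children or more than one child. A minimal sketch of a guard follows. The helper name `pad_label_text` is hypothetical and this is not necessarily how the issue was fixed in LimeSoup:

```python
from typing import Optional


def pad_label_text(text: Optional[str]) -> Optional[str]:
    """Pad a label's text with one space on each side.

    Returns None unchanged, so the caller can skip labels that have
    no single text child instead of raising a TypeError.
    """
    if text is None:
        return None
    return ' ' + text + ' '


# Hypothetical use inside parse_formula, assuming `label` is a bs4 Tag:
#     padded = pad_label_text(label.string)
#     if padded is not None:
#         label.string = padded
```

If the label's text is spread over nested tags, skipping it loses the padding for that label. Collecting the text with `label.get_text()` may be an alternative, though assigning it back to `label.string` would replace the nested markup.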
hhaoyan commented 5 years ago

Solved in eb14c1b1c08b04339e5a25e5720cdf52e106b45b.