LimeSoup

LimeSoup is a Python package that parses HTML or XML papers from different publishers into structured data, which can then be used to feed a database.

Usage

Full Usage:

import json

from LimeSoup import (
    ACSSoup, 
    AIPSoup,
    APSSoup, 
    ECSSoup, 
    ElsevierSoup, 
    IOPSoup, 
    NatureSoup, 
    RSCSoup, 
    SpringerSoup, 
    WileySoup,
)

# `article` is the path to the downloaded HTML or XML file
with open(article, 'r', encoding='utf-8') as f:
    html_str = f.read()

# Choose the parser that matches the article's publisher
data = ECSSoup.parse(html_str)

# Write the parsed result to a JSON file
with open('file_test.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, sort_keys=True, indent=4, ensure_ascii=False)
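
When articles come from several publishers, the "choose the correct publisher" step can be made explicit with a small lookup table. The sketch below is only an illustration: `parse_article`, `save_json`, and the lowercase publisher keys are hypothetical names, not part of LimeSoup. The only thing it assumes about LimeSoup is what the usage above shows, namely that each parser exposes `parse(html_str)`.

```python
import json
from pathlib import Path


def parse_article(path, publisher, parsers):
    """Parse one article file with the parser registered for `publisher`.

    `parsers` maps a publisher key (hypothetical naming) to any object
    that exposes `.parse(str)`, for example {"ecs": ECSSoup}.
    """
    try:
        parser = parsers[publisher.lower()]
    except KeyError:
        # Fail loudly instead of silently using the wrong parser.
        raise ValueError(f"No parser registered for publisher {publisher!r}") from None
    text = Path(path).read_text(encoding="utf-8")
    return parser.parse(text)


def save_json(data, out_path):
    """Write parsed data as JSON with the same options as the usage above."""
    Path(out_path).write_text(
        json.dumps(data, sort_keys=True, indent=4, ensure_ascii=False),
        encoding="utf-8",
    )
```

With LimeSoup installed, the mapping could be built from the imported classes, for example `parsers = {"ecs": ECSSoup, "rsc": RSCSoup}`, and then `save_json(parse_article(article, "ecs", parsers), "file_test.json")` would reproduce the steps shown above.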

Currently, parsers are implemented for the following publishers: ACS, AIP, APS, ECS, Elsevier, IOP, Nature, RSC, Springer, and Wiley.

Development documentation

Please refer to the wiki pages.

Change logs

Please see change logs.

Credits

These genius people contributed to LimeSoup:

Tiago Botari
Ziqin Rong
Vahe Tshitoyan
Nicolas Mingione
Jason Madeano
Haoyan Huo
Tanjin He
Zach Jensen
Alex van Grootel
Edward Kim
Haihao Liu
Zheren Wang

If you are planning to use LimeSoup in your work, please consider citing the following paper:

Kononova et al., "Text-mined dataset of inorganic materials synthesis recipes", Scientific Data 6 (1), 1-11 (2019). doi:10.1038/s41597-019-0224-1
