MaterialEyes / exsclaim

A toolkit for the automatic construction of self-labeled materials imaging datasets from scientific literature
GNU General Public License v3.0

Journal scraper does not work for ACS #29

Open WeixinGithubJiang opened 2 years ago

WeixinGithubJiang commented 2 years ago

Describe the bug

Python's requests library only fetches the initial HTML served by the ACS website. ACS pages build much of their content with JavaScript, so the source we scrape differs from what we see in Chrome or another web browser. A possible solution is to switch to a scraping tool that acts as a browser and executes the embedded JavaScript before we parse the page.
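One way to sketch that fallback in Python, assuming Selenium 4 with a local Chrome install (neither is a confirmed exsclaim dependency for this fix): fetch the page with requests first, and only launch a headless browser when the static HTML looks incomplete. The helper names needs_browser, fetch_rendered_html and get_article_html are hypothetical, and treating a missing <figure> element as the "incomplete" signal is an assumption; a real check should target whatever elements exsclaim's ACS parser actually looks for.

```python
from html.parser import HTMLParser


class _FigureCounter(HTMLParser):
    """Counts <figure> start tags seen while parsing an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.figures = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "figure":
            self.figures += 1


def needs_browser(html: str) -> bool:
    """Hypothetical heuristic: treat static HTML with no <figure> as JS-rendered."""
    counter = _FigureCounter()
    counter.feed(html)
    counter.close()
    return counter.figures == 0


def fetch_rendered_html(url: str, timeout: float = 20.0) -> str:
    """Load the page in headless Chrome and return the DOM after scripts run."""
    # Imported lazily so needs_browser() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted at least one <figure> element
        # (assumed marker; raises TimeoutException if none appears).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "figure"))
        )
        return driver.page_source
    finally:
        driver.quit()


def get_article_html(url: str) -> str:
    """Try a plain HTTP request first; fall back to a browser if needed."""
    import requests  # lazy import, same reason as above

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    if needs_browser(response.text):
        return fetch_rendered_html(url)
    return response.text
```

Launching a browser is much slower than a plain request, which is why this sketch keeps requests as the first attempt and only pays the browser cost when the static page appears to be missing its figures.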
