pbulsink closed 9 years ago
You are awesome :)
How come you removed the Nature one, btw?
Nature seems to crash the scraper when you run it without access, and I'm not sure why. The behaviour varies: sometimes it just hangs and stops, sometimes it throws an error. Running with loglevel verbose doesn't explain it either.
I'm not sure if it's a scraper.json issue or a quickscrape issue, so I removed it before the pull request to keep the tests passing, while documenting what I've tried.
Taylor and Francis sometimes fails tests when the site gets mad about not being able to set cookies. Instead of rendering the article page, it returns an error page:
...
<h1>An Error Occurred Setting Your User Cookie</h1>
<p>This site uses cookies to improve performance. If your browser does not accept cookies, you cannot view this site.</p>
...
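For what it's worth, a test harness could tell this failure apart from a genuinely broken scraper by checking the fetched HTML for that heading. The following is only a rough sketch in Python, not part of quickscrape or the journal-scrapers tests; the function name `is_cookie_error_page` and the constant are hypothetical, and it assumes the heading text stays exactly as shown in the snippet above.

```python
# Hypothetical helper: not part of quickscrape or scraperJSON.
# Assumption: the error page always contains the heading text quoted above.
COOKIE_ERROR_MARKER = "An Error Occurred Setting Your User Cookie"


def is_cookie_error_page(html: str) -> bool:
    """Return True if the HTML looks like the cookie error page."""
    # Case-insensitive substring check; deliberately avoids parsing the HTML.
    return COOKIE_ERROR_MARKER.lower() in html.lower()


if __name__ == "__main__":
    error_html = "<h1>An Error Occurred Setting Your User Cookie</h1>"
    article_html = "<h1>Some article title</h1>"
    print(is_cookie_error_page(error_html))
    print(is_cookie_error_page(article_html))
```

A test could skip or retry when this returns True rather than report a scraper failure.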
OK, looks like T&F will have to be headless. The upcoming version of scraperJSON will allow switching headless on or off.
Could you make a separate PR with the nature scraper so I can debug it cleanly?
Nature is in Pull Request 24 --> https://github.com/ContentMine/journal-scrapers/pull/24
Not all of these work; they require quickscrape to follow relative links (expected in v1.0).
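To illustrate what "following relative links" means here: a scraped `href` has to be resolved against the URL of the page it came from before it can be fetched. This is a minimal sketch using Python's standard library, not how quickscrape implements it; `resolve_link` and the example.org URLs are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve a possibly relative href against the page it was scraped from."""
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(page_url, href)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    page = "https://example.org/articles/123/"
    print(resolve_link(page, "figures/1.png"))
    print(resolve_link(page, "/doi/pdf/10.1000/xyz"))
    print(resolve_link(page, "https://cdn.example.org/supp.zip"))
```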