ampproject / amp-toolbox

A collection of AMP tools making it easier to publish and host AMP pages.
Apache License 2.0

[linter] Issues when scripting linter for batch processing #775

Open shishirm opened 4 years ago

shishirm commented 4 years ago

I am attempting to run the linter as a batch process and am running into a couple of errors:

  1. I consistently run into the following error:

URL: https://sports.ndtv.com/webstories/sports/teams-players-that-created-history-in-premier-league-93
Status: FAIL
Message: [publisher-logo-src] (https://c.ndtvimg.com/gws/93/assets/14.png) error: {"errno":-110,"code":"ETIMEDOUT","syscall":"connect","address":"184.26.81.182","port":443}
Status: FAIL
Message: [poster-portrait-src] (https://c.ndtvimg.com/gws/93/assets/16.jpeg) error: {"errno":-110,"code":"ETIMEDOUT","syscall":"connect","address":"184.26.81.182","port":443}

If I run this URL individually, I can retrieve both asset URLs successfully. The error appears after about 10-12 URLs have been processed successfully, so I wonder if there is some socket issue?

  2. Once again, after a few URLs are processed, one of the calls fails with:

couldn't load [https://khabar.ndtv.com/webstories/entertainment/ticket-to-bollywood-tv-actor-to-bollywood-stars-shah-rukh-khan-irrfan-hina-138] [debug: curl -sS -i -H 'user-agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://khabar.ndtv.com/webstories/entertainment/ticket-to-bollywood-tv-actor-to-bollywood-stars-shah-rukh-khan-irrfan-hina-138']

Running the curl command manually seems to work just fine, as does running the URL by itself through the linter. This could potentially be a socket issue as well.

Any suggestions / recommendations would greatly help!

sebastianbenz commented 4 years ago

How do you run the linter in batch? You might have to rate limit concurrent checks. Node is not very good at handling too many requests at the same time.

shishirm commented 4 years ago

I am currently running it via a Python script that loops over the URLs one by one, so nothing too fancy. I did add a rudimentary rate limiter, i.e. a few seconds of sleep between each invocation, but that did not seem to help. Would you have any other suggestions?
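
For reference, a sequential loop like the one described could also retry a URL when the output contains one of the failures quoted above (`ETIMEDOUT` or `couldn't load`), backing off a little longer each time. This is only a minimal sketch and not a confirmed fix: `LINT_COMMAND` is an assumption about how the linter is invoked, and `run_linter`, `is_transient`, and `lint_with_retry` are hypothetical helper names.

```python
import subprocess
import time

# Substrings taken from the error output quoted in this thread.
TRANSIENT_MARKERS = ("ETIMEDOUT", "couldn't load")

# Assumption: replace with whatever command you actually use to run the linter.
LINT_COMMAND = ["npx", "@ampproject/toolbox-cli", "lint"]


def run_linter(url):
    """Run the linter once for one URL and return stdout plus stderr."""
    result = subprocess.run(LINT_COMMAND + [url], capture_output=True, text=True)
    return result.stdout + result.stderr


def is_transient(output):
    """Return True if the output contains one of the failures seen in batch runs."""
    return any(marker in output for marker in TRANSIENT_MARKERS)


def lint_with_retry(url, run=run_linter, retries=3, base_delay=2.0, sleep=time.sleep):
    """Lint a URL, retrying with exponential backoff on transient failures.

    Returns a tuple (output_of_last_attempt, number_of_attempts).
    """
    output = ""
    for attempt in range(1, retries + 1):
        output = run(url)
        if not is_transient(output):
            return output, attempt
        if attempt < retries:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return output, retries
```

Calling `lint_with_retry(url)` for each URL in the batch keeps the run strictly sequential. If a URL still fails after the last attempt, the returned output is that of the final attempt, so the failure is reported rather than hidden.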