When reading from WDD, Wikipedia, Backstabbr, and Webdip, we should do so as efficiently as possible.
WDD sends headers forbidding caching ('cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'pragma': 'no-cache'), but it can serve pages gzipped.
Backstabbr marks its pages 'cache-control': 'private' (cacheable only by the browser, not by shared caches), but can also gzip them.
Webdiplomacy also forbids caching ('cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0'), but can gzip pages too.
Wikipedia is much the same ('cache-control': 'private, s-maxage=0, max-age=0, must-revalidate').
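These values can be confirmed directly from the live responses. A minimal sketch using the requests library (the URL is just an example; substitute whichever page is being fetched):

    import requests

    # Fetch a page and report the caching and compression headers it came with.
    # requests keeps the Content-Encoding header visible even though it has
    # already decompressed the body for us.
    resp = requests.get("https://webdiplomacy.net/")
    print(resp.headers.get("cache-control"))     # e.g. 'no-store, no-cache, must-revalidate, ...'
    print(resp.headers.get("content-encoding"))  # 'gzip' when the body was compressed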
urllib doesn't ask for gzipped pages. requests and httplib2 both do by default, so migrating from urllib to requests would help.
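A minimal sketch of the difference, assuming Python 3 (the URL is illustrative): requests negotiates gzip for us, while urllib needs the header sent and the decompression done by hand.

    import gzip
    import urllib.request

    import requests

    URL = "https://webdiplomacy.net/"  # placeholder; any of the sites above

    # With requests: 'Accept-Encoding: gzip, deflate' is sent by default and the
    # body is transparently decompressed. A Session also reuses the connection
    # across fetches, which helps when reading many pages from one site.
    session = requests.Session()
    html = session.get(URL).text

    # With urllib: gzip has to be requested and undone manually.
    req = urllib.request.Request(URL, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as f:
        body = f.read()
        if f.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    html_via_urllib = body.decode("utf-8", errors="replace")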