limogin closed this issue 6 years ago.
Hi @limogin
I've seen issues (#82) with this script hanging, but never with memory consumption. We already have an open bug for it: the URL resolving needs a more robust implementation. You could try running the script in the foreground (instead of in the background from cron) and see whether it hangs or produces relevant error output. You'll see a lot of output (mostly warnings about sites' SSL certificates), but some of it may be interesting. You can paste the last hundred lines here.
Use the following commands:
cd /var/www/dmi-tcat/helpers
python /var/www/dmi-tcat/helpers/urlexpand.py
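Since most of that output is the same `InsecureRequestWarning` repeated, a small wrapper can run the helper in the foreground and keep only the last hundred other lines. This is a Python 3 sketch, not part of TCAT; `run_and_tail` and its parameters are hypothetical names, and it assumes you run it from the TCAT root (e.g. `/var/www/dmi-tcat`) with the same command you normally use.

```python
import collections
import subprocess

NOISE = "InsecureRequestWarning"  # marker of the repeated SSL warning lines


def run_and_tail(cmd, n=100, noise=NOISE):
    """Run cmd in the foreground and return its last n output lines, skipping noise."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # warnings and tracebacks go to stderr, so merge them
        universal_newlines=True,
    )
    tail = collections.deque(maxlen=n)  # memory stays bounded to n lines
    try:
        for line in proc.stdout:
            if noise not in line:
                tail.append(line.rstrip("\n"))
    except KeyboardInterrupt:
        # Ctrl+C while the helper seems to hang: stop it but keep the lines read so far
        proc.terminate()
    proc.wait()
    proc.stdout.close()
    return list(tail)
```

For example, `print("\n".join(run_and_tail(["python", "helpers/urlexpand.py"])))` prints the last hundred non-warning lines once the helper exits or you press Ctrl+C.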
Here is some of the output:
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.2.2-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "helpers/urlexpand.py", line 123, in job
    resp = requests.get(url, headers=request_headers, timeout=socket_timeout, verify=False)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 640, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 218, in resolve_redirects
    **adapter_kwargs
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 658, in send
    r.content
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 823, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 432, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 598, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 540, in _update_chunk_length
    line = self._fp.fp.readline()
AttributeError: 'NoneType' object has no attribute 'readline'
Tue Jan 23 10:56:53 2018 <Greenlet at 0x7fdb6c6ff190: job('http://ht.ly/Fj4N30hTzVL', 'test_urls')> failed with AttributeError
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
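The traceback shows the failure happening inside requests' redirect handling while it reads the body of a redirect response (`resolve_redirects` → `r.content`), and the script calls `requests.get(..., verify=False)`, which explains the flood of `InsecureRequestWarning` lines. Expanding a short URL only needs each hop's `Location` header, so one option is to follow redirects with HEAD requests and never download a body. The following is a minimal Python 3 sketch using only the standard library, not TCAT's code; `head_request`, `expand_url` and `max_hops` are hypothetical names, and whether this avoids the memory growth is untested. Note that some servers answer HEAD with 405 or behave differently than for GET, so a fallback may still be needed, and unlike `verify=False`, `http.client.HTTPSConnection` verifies certificates by default.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def head_request(url, timeout=10.0):
    """Send one HEAD request and return (status, Location header or None).

    Only the status line and headers are read, never a response body.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def expand_url(url, head=head_request, max_hops=10):
    """Follow redirects one hop at a time and return the final URL."""
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_STATUSES or not location:
            return url
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError("too many redirects, last URL: %s" % url)
```

The `head` parameter exists so the redirect loop can be checked offline with a fake lookup table; in real use you would call `expand_url("http://ht.ly/Fj4N30hTzVL")` with the default.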
I've now seen this behavior on one of our own servers too. The script was still updating URLs, but it was consuming a lot of memory. It looks like a typical memory leak.
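To tell steady growth (a leak) apart from a one-off spike, the script could log its peak resident memory at regular intervals. A minimal sketch, assuming a hypothetical `log_memory` call placed in the script's processing loop with a running count of processed URLs; it uses the standard `resource` module, which also exists in Python 2.7. Note that `ru_maxrss` is the peak so far, not the current usage, and is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import sys
import time


def peak_rss_mb():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


def log_memory(done, every=1000, out=None):
    """Write a timestamped peak-memory line every `every` processed URLs.

    Returns True when a line was written.
    """
    if done % every != 0:
        return False
    out = out or sys.stderr
    out.write("%s urls=%d peak_rss=%.1f MiB\n" % (time.ctime(), done, peak_rss_mb()))
    return True
```

If the logged peak keeps climbing roughly in step with the number of URLs, memory is being retained per URL rather than released.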
If I can help in any way, just let me know.
Until we can fix this issue, I will try to limit the script's memory use and CPU priority:
0 * * * * su -l mywebuser -c '(cd /var/www/myapppath/; ulimit -m 1000000 && nice -n 19 python helpers/urlexpand.py)'
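One caveat with that cron line: on Linux, `ulimit -m` sets `RLIMIT_RSS`, which current kernels accept but do not enforce (it only had an effect in Linux 2.4.x, x < 30), so it might not cap memory at all. `ulimit -v` (in KiB) limits the address space instead. The same cap can also be set from inside Python, as in this minimal sketch; `cap_address_space` is a hypothetical helper, and calling it at the top of `urlexpand.py` is an assumption, not something the script does today. With the cap in place, allocations beyond it fail with `MemoryError` instead of pushing the host into swap. Libraries such as gevent reserve virtual address space up front, so a cap set too tight could break startup.

```python
import resource


def cap_address_space(max_bytes):
    """Lower this process's soft RLIMIT_AS to max_bytes (never above the hard limit).

    Returns the soft limit that was applied.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # an unprivileged process cannot exceed hard
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return max_bytes
```

To mirror the `1000000` from the cron line (KiB, roughly 1 GB), the call would be `cap_address_space(1000000 * 1024)`.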
See issue #82 for a suggested fix (replacing the script with a PHP one).
Ping @limogin
The urlexpand script consumes an excessive amount of memory, up to 99% of what is available. My server has 32 GB of RAM, and at the moment I have only one search bin ("test") set up.
python helpers/urlexpand.py
As I understand it, this process shouldn't consume this much memory. The installed version is the latest one currently available.