canonical / jujucharms.com

The jujucharms.com website.
8 stars 1 forks source link

Juju times out fetching charm metadata #551

Open ChrisMacNaughton opened 6 years ago

ChrisMacNaughton commented 6 years ago

Conjure up regularly fails with a timeout fetching charm metadata during a deploy. This makes it impossible to use conjure-up to deploy any solution as the charms are unavailable.

conjure-up timeout

hatched commented 6 years ago

There are some speed improvements that need to be done to the charmstore but in the meantime you can start by not requesting so much information in that request.

To deploy a charm you shouldn't need a number of those pieces of data. Especially things like stats and bundles-containing both of which are slow requests and not required to deploy. You can also try increasing the timeout from 5s to 10s.

ryan-beisner commented 6 years ago

I still get "0 items" pretty frequently at https://jujucharms.com/u/openstack-charmers-next - could this be related?

ryan-beisner commented 6 years ago

screenshot from 2018-09-17 10-53-21

ajkavanagh commented 6 years ago

Adjusting to 30 seconds for this query also is timing out:

theblues.errors.ServerError: Request timed out: https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/neutron-gateway/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=stats&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm&include=id timeout: 30.0

It's breaking tooling all over. What is the expected time that this query should take, please?

ChrisMacNaughton commented 6 years ago

More timeouts

Traceback (most recent call last):
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 57, in _get
    timeout=self.timeout)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/raven/breadcrumbs.py", line 326, in send
    resp = real_send(self, request, *args, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.jujucharms.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/conjureup/controllers/juju/deploy/common.py", line 30, in do_deploy
    await app.juju.client.deploy(fn)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 1281, in deploy
    await handler.execute_plan()
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 1925, in execute_plan
    result = await method(*step.args)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 2067, in deploy
    charm)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 1338, in _add_store_resources
    entity = await self.charmstore.entity(entity_url)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 2161, in coro
    return await self.loop.run_in_executor(None, method)
  File "/snap/conjure-up/1021/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 136, in entity
    return self._meta(entity_id, includes, channel=channel)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 107, in _meta
    data = self._get(url)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 78, in _get
    raise ServerError(message)
theblues.errors.ServerError: Request timed out: https://api.jujucharms.com/charmstore/v5/glance-268/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=owner&include=revision-info&include=published&include=stats&include=resources&include=supported-series&include=terms timeout: 5
ryan-beisner commented 6 years ago

The CS: timeout issues are prevalent in our CI as well. So to summarize: end users browsing jujucharms.com have a frequent error condition (0 items shown), conjure-up has a high failure rate for end users, my developers are often blocked merges which cannot complete a CI test. Please investigate the back-end of CS.

fnordahl commented 6 years ago

At present I am greeted with this error on charm search queries like keystone and ceph: jujucharms_com_search_timeout

ajkavanagh commented 6 years ago

I'm also greeted with this error:

image

Usually, refreshing solves the problem.

jcsackett commented 6 years ago

The culprit largely seems to be the stats component of queries on the charmstore. If possible omitting stats from the include in query params can speed things up considerably.

@fnordahl @ajkavanagh -- we are working on a deploy of jujucharms.com that should fix searching through the site, using the stats omission workaround.

@ajkavanagh @ryan-beisner @ChrisMacNaughton is the stats data necessary for your current uses? If not I would recommend removing it.

Charmstore folks are working on the issue as well, but in the meantime this approach could mitigate the problems.

ryan-beisner commented 6 years ago

@jaycee That library has been in place as-is, for better or for worse, for quite some time. What has changed that suddenly causes this to be an issue?

We can, should and will adjust the theblues library. However, it's a messy sprawl of impact. In order to actually profit, conjure-up and other projects will need to be rebuilt and redistributed with the modified theblues project. Also, other consumers of the theblues library will need to be modified to turn off stats, then and those modified projects will need to be redistributed/released.

ryan-beisner commented 6 years ago

FTR, bundletester, amulet also affected:

(func27-noop) ubuntu@beisner-bastion:~/git/lxd$ tests/gate-basic-bionic-queens 
2018-09-18 10:15:12,876 __init__ INFO: OpenStackAmuletDeployment:  init
2018-09-18 10:15:12,876 _add_services INFO: OpenStackAmuletDeployment:  adding services
2018-09-18 10:15:12,876 _determine_branch_locations INFO: OpenStackAmuletDeployment:  determine branch locations
ERROR:root:Request timed out: https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/nova-cloud-controller/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=stats&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm&include=id timeout: 10.0
Traceback (most recent call last):
  File "tests/gate-basic-bionic-queens", line 24, in <module>
    stable=False)
  File "/home/ubuntu/git/lxd/tests/basic_deployment.py", line 57, in __init__
    self._add_services()
  File "/home/ubuntu/git/lxd/tests/basic_deployment.py", line 95, in _add_services
    other_services)
  File "/home/ubuntu/git/lxd/tests/charmhelpers/contrib/openstack/amulet/deployment.py", line 151, in _add_services
    other_services)
  File "/home/ubuntu/git/lxd/tests/charmhelpers/contrib/amulet/deployment.py", line 67, in _add_services
    constraints=svc.get('constraints'))
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/amulet/deployer.py", line 218, in add
    service_name, charm, branch=branch, series=service['series'])
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/amulet/charm.py", line 61, in fetch
    series=series)
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/amulet/charm.py", line 46, in get_charm
    return Charm(with_series(charm_path, series))
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/charmstore/lib.py", line 168, in __init__
    super(Charm, self).__init__(id, api=api, timeout=timeout)
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/charmstore/lib.py", line 115, in __init__
    AVAILABLE_INCLUDES).get('Meta')
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/theblues/charmstore.py", line 107, in _meta
    data = self._get(url)
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/theblues/charmstore.py", line 78, in _get
    raise ServerError(message)
theblues.errors.ServerError: Request timed out: https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/nova-cloud-controller/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=stats&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm&include=id timeout: 10.0
jcsackett commented 6 years ago

@ryan-beisner The stats component of the charmstore has always been slow; however, there's presently some issue going on with the underlying tools charmstore relies on causing a general slow down, which exacerbates the stats to the point that we're seeing these slowdowns.

theblues has been updated w/ @ChrisMacNaughton's fix: https://pypi.org/project/theblues/

Additionally, the fix is now live on jujucharms.com

ryan-beisner commented 6 years ago

@jaycee Even though theblues is released with the update, the default behavior of it was not changed. That means that we will have to go find all of the calls to theblues in intermediate modules such as amulet, bundletester, and other tools, modify them to tell theblues not to ask for stats, then release those individual intermediate tools.

FTR, we are still seeing prevalent CS timeouts overnight 2018 Sept 18 to 19.

jcsackett commented 6 years ago

I'm aware that the change is going to need to be made--as I understand it fixes are already underway for python-jujulib and amulet.

The underlying issues on CS are to my knowledge still being worked on--it requires a sizeable amount of work. I would encourage you to make the updates to use theblues b/c the CS fixes may take some time to rollout.

ryan-beisner commented 6 years ago

Indeed, we have all of that in flight. Just commenting here so passers-by understand that the theblues changes alone are not going to unblock devs and ci.

jcsackett commented 6 years ago

Ah, ok, I'm with you now. Thanks!

gnuoy commented 6 years ago

I'm still hitting >10s response times. You can see it in a openstack review https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-aodh/603318/7/2056/test_charm_amulet_full_4171/amulet-full.txt . Also a curl command takes >10s:


$ time curl -H 'Cache-Control: no-cache'  'https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/ceph-mon/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm' -D-  | grep -v ceph
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0HTTP/1.1 200 OK
Date: Thu, 20 Sep 2018 05:56:20 GMT
Server: Apache/2.4.7 (Ubuntu)
 Access-Control-Allow-Credentials: true
 Access-Control-Allow-Headers: Bakery-Protocol-Version, Macaroons, X-Requested-With
0Access-Control-Allow-Methods: DELETE,GET,HEAD,PUT,POST,OPTIONS
 Access-Control-Allow-Origin: *
 Access-Control-Cache-Max-Age: 600
 Access-Control-Expose-Headers: WWW-Authenticate
 Content-Type: application/json
 X-Cache: MISS from juju-ps45-cdo-jujucharms-machine-27
0X-Cache-Lookup: MISS from juju-ps45-cdo-jujucharms-machine-27:3128
 Via: 1.1 juju-ps45-cdo-jujucharms-machine-27 (squid/3.3.8)
 Vary: Accept-Encoding
 Transfer-Encoding: chunked

100 97763    0 97763    0     0   8828      0 --:--:--  0:00:11 --:--:-- 26865

real    0m11.086s
user    0m0.021s
sys 0m0.012s```

fwiw subsequent hit a cache and are <2s. 
ryan-beisner commented 6 years ago

@jaycee ^ that is with stats disabled. We are still hitting prevalent CS timeouts. In an openstack deploy, with ~40 charms, all it takes is 1 charm query to fail to spoil an expensive and lengthy CI run. The result is that our devs cannot get code through our gate to land code currently.

ryan-beisner commented 5 years ago

screenshot from 2018-11-08 15-24-50

Hi All, the charm store is still timing out on the web UI, reporting 0 items fairly often. What is the status of the solution for this?