Ouranosinc / Magpie

AuthN/AuthZ services
https://pavics-magpie.readthedocs.io
Apache License 2.0
1 stars 5 forks source link

[BUG] Intermittent Twitcher/Magpie error when caching on (Magpie version 3.14.0) #466

Closed tlvu closed 3 years ago

tlvu commented 3 years ago

Describe the bug Many random hard to reproduce errors with several services behind Twitcher/Magpie when caching is turned on for Twitcher.

With caching not working, this issue https://github.com/bird-house/twitcher/issues/97 might have to be re-open?

To Reproduce Steps to reproduce the behavior:

  1. PAVICS at tag https://github.com/bird-house/birdhouse-deploy/tree/1.14.2 (PR https://github.com/bird-house/birdhouse-deploy/pull/188)
  2. pavics_thredds.ipynb, stress-test.ipynb in pavics-sdi repo failure
  3. Result

===Problem 1===

11:31:17  _____ pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb::Cell 4 _____
11:31:17  Notebook cell execution failed
11:31:17  Cell 4: Cell execution caused an exception
11:31:17  
11:31:17  Input:
11:31:17  # NBVAL_IGNORE_OUTPUT
11:31:17  
11:31:17  AUTH_USR = os.getenv("TEST_MAGPIE_AUTHTEST_USERNAME", "authtest")
11:31:17  AUTH_PWD = os.getenv("TEST_MAGPIE_AUTHTEST_PASSWORD", "authtest1234")
11:31:17  
11:31:17  # Open session
11:31:17  with requests.Session() as session:
11:31:17      session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", AUTH_USR, AUTH_PWD)
11:31:17      # Open a PyDAP data store and pass it to xarray
11:31:17      store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
11:31:17      ds = xr.open_dataset(store, decode_cf=False)  # Attributes are problematic with this file.
11:31:17  ds
11:31:17  
11:31:17  Traceback:
11:31:17  
11:31:17  ---------------------------------------------------------------------------
11:31:17  HTTPError                                 Traceback (most recent call last)
11:31:17  /tmp/ipykernel_405/1184836852.py in <module>
11:31:17        9     # Open a PyDAP data store and pass it to xarray
11:31:17       10     store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
11:31:17  ---> 11     ds = xr.open_dataset(store, decode_cf=False)  # Attributes are problematic with this file.
11:31:17       12 ds
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
11:31:17      499         drop_variables=drop_variables,
11:31:17      500         **decoders,
11:31:17  --> 501         **kwargs,
11:31:17      502     )
11:31:17      503     ds = _dataset_from_backend_dataset(
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/store.py in open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
11:31:17       37         )
11:31:17       38 
11:31:17  ---> 39         ds = Dataset(vars, attrs=attrs)
11:31:17       40         ds = ds.set_coords(coord_names.intersection(vars))
11:31:17       41         ds.set_close(store.close)
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs)
11:31:17      753 
11:31:17      754         variables, coord_names, dims, indexes, _ = merge_data_and_coords(
11:31:17  --> 755             data_vars, coords, compat="broadcast_equals"
11:31:17      756         )
11:31:17      757 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
11:31:17      483     indexes = dict(_extract_indexes_from_coords(coords))
11:31:17      484     return merge_core(
11:31:17  --> 485         objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
11:31:17      486     )
11:31:17      487 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
11:31:17      630         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
11:31:17      631     )
11:31:17  --> 632     collected = collect_variables_and_indexes(aligned)
11:31:17      633 
11:31:17      634     prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/merge.py in collect_variables_and_indexes(list_of_mappings)
11:31:17      292                 append_all(coords, indexes)
11:31:17      293 
11:31:17  --> 294             variable = as_variable(variable, name=name)
11:31:17      295             if variable.dims == (name,):
11:31:17      296                 variable = variable.to_index_variable()
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/variable.py in as_variable(obj, name)
11:31:17      161                 "conflict with the coordinates used to label dimensions."
11:31:17      162             )
11:31:17  --> 163         obj = obj.to_index_variable()
11:31:17      164 
11:31:17      165     return obj
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/variable.py in to_index_variable(self)
11:31:17      532         """Return this variable as an xarray.IndexVariable"""
11:31:17      533         return IndexVariable(
11:31:17  --> 534             self.dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
11:31:17      535         )
11:31:17      536 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
11:31:17     2573         # Unlike in Variable, always eagerly load values into memory
11:31:17     2574         if not isinstance(self._data, PandasIndex):
11:31:17  -> 2575             self._data = PandasIndex(self._data)
11:31:17     2576 
11:31:17     2577     def __dask_tokenize__(self):
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/indexes.py in __init__(self, array, dtype, coord_name)
11:31:17      151         super().__init__(coord_name)
11:31:17      152 
11:31:17  --> 153         self.array = utils.safe_cast_to_index(array)
11:31:17      154 
11:31:17      155         if dtype is None:
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/utils.py in safe_cast_to_index(array)
11:31:17      116         if hasattr(array, "dtype") and array.dtype.kind == "O":
11:31:17      117             kwargs["dtype"] = object
11:31:17  --> 118         index = pd.Index(np.asarray(array), **kwargs)
11:31:17      119     return _maybe_cast_to_cftimeindex(index)
11:31:17      120 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
11:31:17      417     def __array__(self, dtype=None):
11:31:17      418         array = as_indexable(self.array)
11:31:17  --> 419         return np.asarray(array[self.key], dtype=None)
11:31:17      420 
11:31:17      421     def transpose(self, order):
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/conventions.py in __getitem__(self, key)
11:31:17       60 
11:31:17       61     def __getitem__(self, key):
11:31:17  ---> 62         return np.asarray(self.array[key], dtype=self.dtype)
11:31:17       63 
11:31:17       64 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
11:31:17      417     def __array__(self, dtype=None):
11:31:17      418         array = as_indexable(self.array)
11:31:17  --> 419         return np.asarray(array[self.key], dtype=None)
11:31:17      420 
11:31:17      421     def transpose(self, order):
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/pydap_.py in __getitem__(self, key)
11:31:17       38     def __getitem__(self, key):
11:31:17       39         return indexing.explicit_indexing_adapter(
11:31:17  ---> 40             key, self.shape, indexing.IndexingSupport.BASIC, self._getitem
11:31:17       41         )
11:31:17       42 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
11:31:17      708     """
11:31:17      709     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
11:31:17  --> 710     result = raw_indexing_method(raw_key.tuple)
11:31:17      711     if numpy_indices.tuple:
11:31:17      712         # index the loaded np.ndarray
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/pydap_.py in _getitem(self, key)
11:31:17       45         # downloading coordinate data twice
11:31:17       46         array = getattr(self.array, "array", self.array)
11:31:17  ---> 47         result = robust_getitem(array, key, catch=ValueError)
11:31:17       48         # in some cases, pydap doesn't squeeze axes automatically like numpy
11:31:17       49         axis = tuple(n for n, k in enumerate(key) if isinstance(k, integer_types))
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/common.py in robust_getitem(array, key, catch, max_retries, initial_delay)
11:31:17       63     for n in range(max_retries + 1):
11:31:17       64         try:
11:31:17  ---> 65             return array[key]
11:31:17       66         except catch:
11:31:17       67             if n == max_retries:
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/model.py in __getitem__(self, index)
11:31:17      318     def __getitem__(self, index):
11:31:17      319         out = copy.copy(self)
11:31:17  --> 320         out.data = self._get_data_index(index)
11:31:17      321         return out
11:31:17      322 
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/model.py in _get_data_index(self, index)
11:31:17      347             return np.vectorize(decode_np_strings)(self._data[index])
11:31:17      348         else:
11:31:17  --> 349             return self._data[index]
11:31:17      350 
11:31:17      351     def _get_data(self):
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/handlers/dap.py in __getitem__(self, index)
11:31:17      140         logger.info("Fetching URL: %s" % url)
11:31:17      141         r = GET(url, self.application, self.session, timeout=self.timeout)
11:31:17  --> 142         raise_for_status(r)
11:31:17      143         dds, data = r.body.split(b'\nData:\n', 1)
11:31:17      144         dds = dds.decode(r.content_encoding or 'ascii')
11:31:17  
11:31:17  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/net.py in raise_for_status(response)
11:31:17       37             detail=response.status+'\n'+response.text,
11:31:17       38             headers=response.headers,
11:31:17  ---> 39             comment=response.body
11:31:17       40         )
11:31:17       41 
11:31:17  
11:31:17  HTTPError: 500 Internal Server Error
11:31:17  <?xml version="1.0" encoding="utf-8"?>
11:31:17  <ExceptionReport version="1.0.0"
11:31:17      xmlns="http://www.opengis.net/ows/1.1"
11:31:17      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
11:31:17      xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">
11:31:17      <Exception exceptionCode="NoApplicableCode" locator="NoApplicableCode">
11:31:17          <ExceptionText>Instance &lt;Directory at 0x7f7bf8ac8f90&gt; is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)</ExceptionText>
11:31:17      </Exception>
11:31:17  </ExceptionReport>

Matching twitcher server-side logs:

2021-09-09 15:24:02,563 ERROR [TWITCHER:33][waitress-0] unknown error
Traceback (most recent call last):
  File "/opt/birdhouse/src/twitcher/twitcher/tweens.py", line 27, in ows_security_tween
    security.check_request(request)
  File "/opt/local/src/magpie/magpie/adapter/magpieowssecurity.py", line 163, in check_request
    has_permission = authz_policy.permits(service_impl, principals, permission_requested)
  File "/usr/local/lib/python3.7/site-packages/pyramid/authorization.py", line 74, in permits
    acl = location.__acl__
  File "/opt/local/src/magpie/magpie/services.py", line 175, in __acl__
    acl = self._get_acl_cached(*cache_keys)
  File "/usr/local/lib/python3.7/site-packages/beaker/cache.py", line 601, in cached
    return cache[0].get_value(cache_key, createfunc=go)
  File "/usr/local/lib/python3.7/site-packages/beaker/cache.py", line 322, in get
    return self._get_value(key, **kw).get_value()
  File "/usr/local/lib/python3.7/site-packages/beaker/container.py", line 380, in get_value
    v = self.createfunc()
  File "/usr/local/lib/python3.7/site-packages/beaker/cache.py", line 597, in go
    return func(*args, **kwargs)
  File "/opt/local/src/magpie/magpie/services.py", line 213, in _get_acl_cached
    return self._get_acl(user, resource, permissions, allow_match=is_target)
  File "/opt/local/src/magpie/magpie/services.py", line 223, in _get_acl
    permissions = self.effective_permissions(user, resource, permissions, allow_match)
  File "/opt/local/src/magpie/magpie/services.py", line 328, in effective_permissions
    cur_res_perms = ResourceService.perms_for_user(resource, user, db_session=db_session)
  File "/usr/local/lib/python3.7/site-packages/ziggurat_foundations/models/services/resource.py", line 86, in perms_for_user
    if instance.owner_user_id == user.id:
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/attributes.py", line 294, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/attributes.py", line 725, in get\
    value = state._load_expired(state, passive)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/state.py", line 652, in _load_expired
    self.manager.deferred_scalar_loader(self, toload)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 944, in load_scalar_attributes
    "attribute refresh operation cannot proceed" % (state_str(state))
sqlalchemy.orm.exc.DetachedInstanceError: Instance <Directory at 0x7f7bf8ac8f90> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)

===Problem 2===

This access denied error should never happend, no logs on twitcher server side

20:43:35  _____ pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb::Cell 4 _____
20:43:35  Notebook cell execution failed
20:43:35  Cell 4: Cell execution caused an exception
20:43:35  
20:43:35  Input:
20:43:35  # NBVAL_IGNORE_OUTPUT
20:43:35  
20:43:35  AUTH_USR = os.getenv("TEST_MAGPIE_AUTHTEST_USERNAME", "authtest")
20:43:35  AUTH_PWD = os.getenv("TEST_MAGPIE_AUTHTEST_PASSWORD", "authtest1234")
20:43:35  
20:43:35  # Open session
20:43:35  with requests.Session() as session:
20:43:35      session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", AUTH_USR, AUTH_PWD)
20:43:35      # Open a PyDAP data store and pass it to xarray
20:43:35      store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
20:43:35      ds = xr.open_dataset(store, decode_cf=False)  # Attributes are problematic with this file.
20:43:35  ds
20:43:35  
20:43:35  Traceback:
20:43:35  
20:43:35  ---------------------------------------------------------------------------
20:43:35  HTTPError                                 Traceback (most recent call last)
20:43:35  <ipython-input-5-fad2ffe5f696> in <module>
20:43:35        8     session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", AUTH_USR, AUTH_PWD)
20:43:35        9     # Open a PyDAP data store and pass it to xarray
20:43:35  ---> 10     store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
20:43:35       11     ds = xr.open_dataset(store, decode_cf=False)  # Attributes are problematic with this file.
20:43:35       12 ds
20:43:35  
20:43:35  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
20:43:35       91     def open(cls, url, session=None):
20:43:35       92 
20:43:35  ---> 93         ds = pydap.client.open_url(url, session=session)
20:43:35       94         return cls(ds)
20:43:35       95 
20:43:35  
20:43:35  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/client.py in open_url(url, application, session, output_grid, timeout)
20:43:35       65     """
20:43:35       66     dataset = DAPHandler(url, application, session, output_grid,
20:43:35  ---> 67                          timeout).dataset
20:43:35       68 
20:43:35       69     # attach server-side functions
20:43:35  
20:43:35  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid, timeout)
20:43:35       52         ddsurl = urlunsplit((scheme, netloc, path + '.dds', query, fragment))
20:43:35       53         r = GET(ddsurl, application, session, timeout=timeout)
20:43:35  ---> 54         raise_for_status(r)
20:43:35       55         if not r.charset:
20:43:35       56             r.charset = 'ascii'
20:43:35  
20:43:35  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/net.py in raise_for_status(response)
20:43:35       37             detail=response.status+'\n'+response.text,
20:43:35       38             headers=response.headers,
20:43:35  ---> 39             comment=response.body
20:43:35       40         )
20:43:35       41 
20:43:35  
20:43:35  HTTPError: 403 Forbidden
20:43:35  <?xml version="1.0" encoding="utf-8"?>
20:43:35  <ExceptionReport version="1.0.0"
20:43:35      xmlns="http://www.opengis.net/ows/1.1"
20:43:35      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
20:43:35      xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">
20:43:35      <Exception exceptionCode="NoApplicableCode" locator="AccessForbidden">
20:43:35          <ExceptionText>Not authorized to access this resource. User does not meet required permissions.</ExceptionText>
20:43:35      </Exception>
20:43:35  </ExceptionReport>

===Problem 3===

408 Error in stress-test.ipynb reproduced on CRIM Jenkins as well https://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/master/570/console (all errors in stress-test.ipynb was gone for 3 weeks when caching was turned off, see https://github.com/Ouranosinc/Magpie/issues/433).

===Problem 4===

"Remote end closed connection without response" without any errors in Twitcher logs:

14:31:47  ___ pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb::Cell 14 ___
14:31:47  Notebook cell execution failed
14:31:47  Cell 14: Cell execution caused an exception
14:31:47  
14:31:47  Input:
14:31:47  lon0 = float(bounds.minx)
14:31:47  lon1 = float(bounds.maxx)
14:31:47  lat0 = float(bounds.miny)
14:31:47  lat1 = float(bounds.maxy)
14:31:47  
14:31:47  result_tasmin = finch.subset_bbox(
14:31:47      resource=data,
14:31:47      variable="tasmin",
14:31:47      lon0=lon0,
14:31:47      lon1=lon1,
14:31:47      lat0=lat0,
14:31:47      lat1=lat1,
14:31:47      start_date="1958-01-01",
14:31:47      end_date="1958-12-31",
14:31:47  )
14:31:47  
14:31:47  Traceback:
14:31:47  
14:31:47  ---------------------------------------------------------------------------
14:31:47  ServiceException                          Traceback (most recent call last)
14:31:47  /tmp/ipykernel_626/857013960.py in <module>
14:31:47       12     lat1=lat1,
14:31:47       13     start_date="1958-01-01",
14:31:47  ---> 14     end_date="1958-12-31",
14:31:47       15 )
14:31:47  
14:31:47  </opt/conda/envs/birdy/lib/python3.7/site-packages/birdy/client/base.py-335> in subset_bbox(self, resource, lon0, lon1, lat0, lat1, start_date, end_date, variable)
14:31:47  
14:31:47  /opt/conda/envs/birdy/lib/python3.7/site-packages/birdy/client/base.py in _execute(self, pid, **kwargs)
14:31:47      371                 output=wps_outputs,
14:31:47      372                 mode=mode,
14:31:47  --> 373                 lineage=self._lineage,
14:31:47      374             )
14:31:47      375 
14:31:47  
14:31:47  /opt/conda/envs/birdy/lib/python3.7/site-packages/owslib/wps.py in execute(self, identifier, inputs, output, mode, lineage, request, response)
14:31:47      357         # submit the request to the live server
14:31:47      358         if response is None:
14:31:47  --> 359             response = execution.submitRequest(request)
14:31:47      360         else:
14:31:47      361             response = etree.fromstring(response)
14:31:47  
14:31:47  /opt/conda/envs/birdy/lib/python3.7/site-packages/owslib/wps.py in submitRequest(self, request)
14:31:47      920         reader = WPSExecuteReader(verbose=self.verbose, timeout=self.timeout, auth=self.auth)
14:31:47      921         response = reader.readFromUrl(
14:31:47  --> 922             self.url, request, method='Post', headers=self.headers)
14:31:47      923         self.response = response
14:31:47      924         return response
14:31:47  
14:31:47  /opt/conda/envs/birdy/lib/python3.7/site-packages/owslib/wps.py in readFromUrl(self, url, data, method, username, password, headers, verify, cert)
14:31:47      601 
14:31:47      602         return self._readFromUrl(url, data, self.timeout, method, username=username, password=password,
14:31:47  --> 603                                  headers=headers, verify=verify, cert=cert)
14:31:47      604 
14:31:47      605 
14:31:47  
14:31:47  /opt/conda/envs/birdy/lib/python3.7/site-packages/owslib/wps.py in _readFromUrl(self, url, data, timeout, method, username, password, headers, verify, cert)
14:31:47      513             u = openURL(url, data, method='Post',
14:31:47      514                         username=self.auth.username, password=self.auth.password,
14:31:47  --> 515                         headers=headers, verify=self.auth.verify, cert=self.auth.cert, timeout=timeout)
14:31:47      516             return etree.fromstring(u.read())
14:31:47      517 
14:31:47  
14:31:47  /opt/conda/envs/birdy/lib/python3.7/site-packages/owslib/util.py in openURL(url_base, data, method, cookies, username, password, timeout, headers, verify, cert, auth)
14:31:47      209 
14:31:47      210     if req.status_code in [400, 401]:
14:31:47  --> 211         raise ServiceException(req.text)
14:31:47      212 
14:31:47      213     if req.status_code in [404, 500, 502, 503, 504]:    # add more if needed
14:31:47  
14:31:47  ServiceException: <?xml version="1.0" encoding="utf-8"?>
14:31:47  <ExceptionReport version="1.0.0"
14:31:47      xmlns="http://www.opengis.net/ows/1.1"
14:31:47      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
14:31:47      xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">
14:31:47      <Exception exceptionCode="NoApplicableCode" locator="NotAcceptable">
14:31:47          <ExceptionText>Request failed: (&#x27;Connection aborted.&#x27;, RemoteDisconnected(&#x27;Remote end closed connection without response&#x27;))</ExceptionText>
14:31:47      </Exception>
14:31:47  </ExceptionReport>
  1. Tried many ways to enhanced reproduciblity without any giving better results
    • Tried to run the test right after Twitcher/Magpie restart
    • Tried to have 2 concurrents runs at once
    • Tried to have back to back runs
    • Tried to wait 20 - 30 mins between runs
fmigneault commented 3 years ago

Test with Magpie Auth targetting Thredds https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/master/notebooks-auth/test_thredds.ipynb

Stress test to loop requests: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/master/notebooks/stress-tests.ipynb

The Magpie authentication/session creation step from test_thredds would need to be added in order for requests within stress-test to employ the same resolution and resources (THREDDS dirs/file).

tlvu commented 3 years ago

Another stacktrace when reverting to Waitress with newer Magpie + debug https://github.com/bird-house/birdhouse-deploy/pull/197

Traceback (most recent call last):
  File "/opt/birdhouse/src/twitcher/twitcher/tweens.py", line 27, in ows_security_tween
    security.check_request(request)
  File "/opt/local/src/magpie/magpie/adapter/magpieowssecurity.py", line 171, in check_request
    has_permission = authz_policy.permits(service_impl, principals, permission_requested)
  File "/usr/local/lib/python3.7/site-packages/pyramid/authorization.py", line 74, in permits
    acl = location.__acl__
  File "/opt/local/src/magpie/magpie/services.py", line 183, in __acl__
    acl = self._get_acl_cached(*cache_keys)
  File "/usr/local/lib/python3.7/site-packages/beaker/cache.py", line 601, in cached
    return cache[0].get_value(cache_key, createfunc=go)
  File "/usr/local/lib/python3.7/site-packages/beaker/cache.py", line 322, in get
    return self._get_value(key, **kw).get_value()
  File "/usr/local/lib/python3.7/site-packages/beaker/container.py", line 380, in get_value
    v = self.createfunc()
  File "/usr/local/lib/python3.7/site-packages/beaker/cache.py", line 597, in go
    return func(*args, **kwargs)
  File "/opt/local/src/magpie/magpie/services.py", line 220, in _get_acl_cached
    user = self.user_requested()
  File "/opt/local/src/magpie/magpie/services.py", line 157, in user_requested
    user = UserService.by_user_name(anonymous, db_session=self.request.db)
  File "/usr/local/lib/python3.7/site-packages/ziggurat_foundations/models/services/user.py", line 330, in by_user_name
    return query.first()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3557, in _execute_and_instances
    querycontext, self._connection_from_session, close_with_result=True
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3572, in _get_bind_args
    mapper=self._bind_mapper(), clause=querycontext.statement, **kw
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3550, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1145, in connection
    execution_options=execution_options,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1151, in _connection_for_bind
    engine, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 458, in _connection_for_bind
    self.session.dispatch.after_begin(self.session, self, conn)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
    fn(*args, **kw)
  File "/usr/local/lib/python3.7/site-packages/zope/sqlalchemy/datamanager.py", line 269, in after_begin
    session, self.initial_state, self.transaction_manager, self.keep_session
  File "/usr/local/lib/python3.7/site-packages/zope/sqlalchemy/datamanager.py", line 234, in join_transaction
    session, initial_state, transaction_manager, keep_session=keep_session
  File "/usr/local/lib/python3.7/site-packages/zope/sqlalchemy/datamanager.py", line 89, in __init__
    transaction_manager.get().join(self)
  File "/usr/local/lib/python3.7/site-packages/transaction/_manager.py", line 90, in get
    raise NoTransaction()
transaction.interfaces.NoTransaction

Full logs twitcher-cache-error-with-waitress.txt

tlvu commented 3 years ago

Same stacktrace above with gunicorn as well, matching this notebook error:

12:55:07  _____ pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb::Cell 4 _____
12:55:07  Notebook cell execution failed
12:55:07  Cell 4: Cell execution caused an exception
12:55:07  
12:55:07  Input:
12:55:07  # NBVAL_IGNORE_OUTPUT
12:55:07  
12:55:07  AUTH_USR = os.getenv("TEST_MAGPIE_AUTHTEST_USERNAME", "authtest")
12:55:07  AUTH_PWD = os.getenv("TEST_MAGPIE_AUTHTEST_PASSWORD", "authtest1234")
12:55:07  
12:55:07  # Open session
12:55:07  with requests.Session() as session:
12:55:07      session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", AUTH_USR, AUTH_PWD)
12:55:07      # Open a PyDAP data store and pass it to xarray
12:55:07      store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
12:55:07      ds = xr.open_dataset(store, decode_cf=False)  # Attributes are problematic with this file.
12:55:07  ds
12:55:07  
12:55:07  Traceback:
12:55:07  
12:55:07  ---------------------------------------------------------------------------
12:55:07  HTTPError                                 Traceback (most recent call last)
12:55:07  /tmp/ipykernel_429/1184836852.py in <module>
12:55:07        8     session.auth = MagpieAuth(f"https://{PAVICS_HOST}/magpie", AUTH_USR, AUTH_PWD)
12:55:07        9     # Open a PyDAP data store and pass it to xarray
12:55:07  ---> 10     store = xr.backends.PydapDataStore.open(SECURED_URL, session=session)
12:55:07       11     ds = xr.open_dataset(store, decode_cf=False)  # Attributes are problematic with this file.
12:55:07       12 ds
12:55:07  
12:55:07  /opt/conda/envs/birdy/lib/python3.7/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
12:55:07       91     def open(cls, url, session=None):
12:55:07       92 
12:55:07  ---> 93         ds = pydap.client.open_url(url, session=session)
12:55:07       94         return cls(ds)
12:55:07       95 
12:55:07  
12:55:07  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/client.py in open_url(url, application, session, output_grid, timeout)
12:55:07       65     """
12:55:07       66     dataset = DAPHandler(url, application, session, output_grid,
12:55:07  ---> 67                          timeout).dataset
12:55:07       68 
12:55:07       69     # attach server-side functions
12:55:07  
12:55:07  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid, timeout)
12:55:07       52         ddsurl = urlunsplit((scheme, netloc, path + '.dds', query, fragment))
12:55:07       53         r = GET(ddsurl, application, session, timeout=timeout)
12:55:07  ---> 54         raise_for_status(r)
12:55:07       55         if not r.charset:
12:55:07       56             r.charset = 'ascii'
12:55:07  
12:55:07  /opt/conda/envs/birdy/lib/python3.7/site-packages/pydap/net.py in raise_for_status(response)
12:55:07       37             detail=response.status+'\n'+response.text,
12:55:07       38             headers=response.headers,
12:55:07  ---> 39             comment=response.body
12:55:07       40         )
12:55:07       41 
12:55:07  
12:55:07  HTTPError: 500 Internal Server Error
12:55:07  <?xml version="1.0" encoding="utf-8"?>
12:55:07  <ExceptionReport version="1.0.0"
12:55:07      xmlns="http://www.opengis.net/ows/1.1"
12:55:07      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
12:55:07      xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">
12:55:07      <Exception exceptionCode="NoApplicableCode" locator="NoApplicableCode">
12:55:07          <ExceptionText>Unknown Error</ExceptionText>
12:55:07      </Exception>
12:55:07  </ExceptionReport>

Plus this run has a new bunch of 408 in stress-test.ipynb and I even got them on CRIM's Jenkins https://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/master/606/console

Full log twitcher-cache-error-with-gunicorn.txt.gz

@fmigneault I hope you have all the debug infos you need in the 2 logs I posted. We still have problem with both waitress and gunicorn.

Maybe this only happen in a full PAVICS env. You should try to reproduce on your side in a full PAVICs env.