datopian / metastore-lib

🗄️ Library for storing dataset metadata, with versioning support and pluggable backends including GitHub.
https://tech.datopian.com/versioning/
MIT License

Metastore lib fails when trying to create new harvester #19

Open zelima opened 4 years ago

zelima commented 4 years ago

I'm trying to create a harvester on the POC, which is running on K8s with an LFS server, but I get the following error after clicking the save button:

2020-07-23 06:52:39,062 INFO  [ckanext.harvest.plugin] [DummyThread-107] Creating harvest source: {'name': u'test-harvest', 'title': u'test-harvest', 'url': u'https://master.ckan.org', 'type': u'harvest', 'notes': u'sdsfs', 'owner_org': u'0965800a-d2bf-4ed2-8e15-9b6e4dddc09f', 'source_type': u'ckan', 'frequency': u'MANUAL', 'creator_user_id': u'71bb0387-6dae-4f79-a4d1-ed50140190a3', 'config': u'', 'id': '308927f1-ac0f-424d-bdaa-b85eebfea4c3', 'extras': [{'value': u'', 'key': 'config'}, {'value': u'MANUAL', 'key': 'frequency'}, {'value': u'ckan', 'key': 'source_type'}]}
2020-07-23 06:52:39,063 INFO  [ckanext.harvest.plugin] [DummyThread-107] Harvest source created: 308927f1-ac0f-424d-bdaa-b85eebfea4c3
2020-07-23 06:52:39,465 INFO  [ckan.lib.base] [DummyThread-107]  /harvest/new render time 0.516 seconds
[2020-07-23 06:52:39 +0000] [124] [DEBUG] GET /harvest/test-harvest
Error - <class 'metastore.backend.exc.NotFound'>: ('Could not find package {}', u'test-harvest')
URL: http://dataexchange-poc.gatesfoundation.org/harvest/test-harvest
File '/usr/local/lib/python2.7/dist-packages/weberror/errormiddleware.py', line 171 in __call__
  app_iter = self.application(environ, sr_checker)
File '/usr/local/lib/python2.7/dist-packages/webob/dec.py', line 147 in __call__
  resp = self.call_func(req, *args, **self.kwargs)
File '/usr/local/lib/python2.7/dist-packages/webob/dec.py', line 208 in call_func
  return self.func(req, *args, **kwargs)
File '/usr/local/lib/python2.7/dist-packages/fanstatic/publisher.py', line 234 in __call__
  return request.get_response(self.app)
File '/usr/local/lib/python2.7/dist-packages/webob/request.py', line 1053 in get_response
  application, catch_exc_info=False)
File '/usr/local/lib/python2.7/dist-packages/webob/request.py', line 1022 in call_application
  app_iter = application(self.environ, start_response)
File '/usr/local/lib/python2.7/dist-packages/webob/dec.py', line 147 in __call__
  resp = self.call_func(req, *args, **self.kwargs)
File '/usr/local/lib/python2.7/dist-packages/webob/dec.py', line 208 in call_func
  return self.func(req, *args, **kwargs)
File '/usr/local/lib/python2.7/dist-packages/fanstatic/injector.py', line 54 in __call__
  response = request.get_response(self.app)
File '/usr/local/lib/python2.7/dist-packages/webob/request.py', line 1053 in get_response
  application, catch_exc_info=False)
File '/usr/local/lib/python2.7/dist-packages/webob/request.py', line 1022 in call_application
  app_iter = application(self.environ, start_response)
File '/usr/lib/ckan/src/ckan/ckan/config/middleware/pylons_app.py', line 262 in inner
  result = application(environ, start_response)
File '/usr/local/lib/python2.7/dist-packages/beaker/middleware.py', line 73 in __call__
  return self.app(environ, start_response)
File '/usr/local/lib/python2.7/dist-packages/beaker/middleware.py', line 156 in __call__
  return self.wrap_app(environ, session_start_response)
File '/usr/local/lib/python2.7/dist-packages/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/usr/lib/ckan/src/ckan/ckan/config/middleware/common_middleware.py', line 30 in __call__
  return self.app(environ, start_response)
File '/usr/lib/ckan/src/ckan/ckan/config/middleware/common_middleware.py', line 56 in __call__
  return self.app(environ, start_response)
File '/usr/local/lib/python2.7/dist-packages/pylons/wsgiapp.py', line 125 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/usr/local/lib/python2.7/dist-packages/pylons/wsgiapp.py', line 324 in dispatch
  return controller(environ, start_response)
File '/usr/lib/ckan/src/ckan/ckan/lib/base.py', line 240 in __call__
  res = WSGIController.__call__(self, environ, start_response)
File '/usr/local/lib/python2.7/dist-packages/pylons/controllers/core.py', line 221 in __call__
  response = self._dispatch_call()
File '/usr/local/lib/python2.7/dist-packages/pylons/controllers/core.py', line 172 in _dispatch_call
  response = self._inspect_call(func)
File '/usr/local/lib/python2.7/dist-packages/pylons/controllers/core.py', line 107 in _inspect_call
  result = self._perform_call(func, args)
File '/usr/local/lib/python2.7/dist-packages/pylons/controllers/core.py', line 60 in _perform_call
  return func(**args)
File '/usr/lib/ckan/src/ckan/ckan/controllers/package.py', line 386 in read
  c.pkg_dict = get_action('package_show')(context, data_dict)
File '/usr/lib/ckan/src/ckan/ckan/logic/__init__.py', line 466 in wrapped
  result = _action(context, data_dict, **kw)
File '/usr/local/lib/python2.7/dist-packages/ckanext/versioning/logic/action.py', line 222 in package_show_revision
  result = core_package_show(context, data_dict)
File '/usr/lib/ckan/src/ckan/ckan/logic/action/get.py', line 1032 in package_show
  package_dict = item.before_view(package_dict)
File '/usr/local/lib/python2.7/dist-packages/ckanext/versioning/plugin.py', line 78 in before_view
  {"dataset": pkg_dict['id']}
File '/usr/local/lib/python2.7/dist-packages/ckanext/versioning/logic/action.py', line 158 in dataset_tag_list
  tag_list = backend.tag_list(dataset.name)
File '/usr/local/lib/python2.7/dist-packages/metastore/backend/github/storage.py', line 143 in tag_list
  repo = self._get_repo(package_id)
File '/usr/local/lib/python2.7/dist-packages/metastore/backend/github/storage.py', line 219 in _get_repo
  raise exc.NotFound('Could not find package {}', package_id)
NotFound: ('Could not find package {}', u'test-harvest')

CGI Variables
-------------
  AUTH_TYPE: 'cookie'
  CKAN_CURRENT_URL: '/harvest/test-harvest'
  CKAN_LANG: 'en'
  CKAN_LANG_IS_DEFAULT: True
  HTTP_ACCEPT: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
  HTTP_ACCEPT_ENCODING: 'gzip, deflate, br'
  HTTP_ACCEPT_LANGUAGE: 'en-GB,en-US;q=0.9,en;q=0.8'
  HTTP_CACHE_CONTROL: 'max-age=0'
  HTTP_COOKIE: 'sfbShellAppLaunched=yes; clientchoice=DesktopClient; _ga=GA1.2.1350896404.1580983419; _gid=GA1.2.1137990824.1595321762; auth_tkt="87ba178c634a28db8a2931872a14329c5f1925372%3Durn%253Aoasis%253Anames%253Atc%253ASAML%253A1.1%253Anameid-format%253AemailAddress%2C4%3DIrakli.Mchedlishvili%2540gatesfoundation.org!"; auth_tkt="87ba178c634a28db8a2931872a14329c5f1925372%3Durn%253Aoasis%253Anames%253Atc%253ASAML%253A1.1%253Anameid-format%253AemailAddress%2C4%3DIrakli.Mchedlishvili%2540gatesfoundation.org!"; ckan=18a7547cda72669eceb5c2f725dbd12c62e00e3e13a909095e414e32914bc8d9c461e708'
  HTTP_HOST: 'dataexchange-poc.gatesfoundation.org'
  HTTP_REFERER: 'https://dataexchange-poc.gatesfoundation.org/harvest/new'
  HTTP_SEC_FETCH_DEST: 'document'
  HTTP_SEC_FETCH_MODE: 'navigate'
  HTTP_SEC_FETCH_SITE: 'same-origin'
  HTTP_SEC_FETCH_USER: '?1'
  HTTP_UPGRADE_INSECURE_REQUESTS: '1'
  HTTP_USER_AGENT: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
  HTTP_X_FORWARDED_FOR: '10.0.0.35'
  HTTP_X_FORWARDED_HOST: 'dataexchange-poc.gatesfoundation.org'
  HTTP_X_FORWARDED_PORT: '443'
  HTTP_X_FORWARDED_PREFIX: '/giftless'
  HTTP_X_FORWARDED_PROTO: 'https'
  HTTP_X_REAL_IP: '10.0.0.35'
  HTTP_X_REQUEST_ID: '329e97cc40d3ab0663f20d5cd3cea541'
  HTTP_X_SCHEME: 'https'
  PATH_INFO: '/harvest/test-harvest'
  RAW_URI: '/harvest/test-harvest'
  REMOTE_ADDR: '10.0.0.101'
  REMOTE_PORT: '35596'
  REMOTE_USER: '2=urn%3Aoasis%3Anames%3Atc%3ASAML%3A1.1%3Anameid-format%3AemailAddress,4=Irakli.Mchedlishvili%40gatesfoundation.org'
  REMOTE_USER_TOKENS: ['']
  REQUEST_METHOD: 'GET'
  SERVER_NAME: '0.0.0.0'
  SERVER_PORT: '5000'
  SERVER_PROTOCOL: 'HTTP/1.1'
  SERVER_SOFTWARE: 'gunicorn/19.9.0'

WSGI Variables
--------------
  application: <fanstatic.publisher.Delegator object at 0x7ff193fe5290>
  beaker.cache: <beaker.cache.CacheManager object at 0x7ff193fe51d0>
  beaker.get_session: <bound method SessionMiddleware._get_session of <beaker.middleware.SessionMiddleware object at 0x7ff193fe5150>>
  beaker.session: {'_accessed_time': 1595487160.944992, '_creation_time': 1595486616.879004}
  ckan.app: 'pylons_app'
  fanstatic.needed: <fanstatic.core.NeededResources object at 0x7ff18ff6c1d0>
  gunicorn.socket: <socket at 0x7ff18ffcfc00 fileno=12 sock=10.0.0.175:5000 peer=10.0.0.101:35596>
  paste.cookies: (<SimpleCookie: _ga='GA1.2.1350896404.1580983419' _gid='GA1.2.1137990824.1595321762' auth_tkt='87ba178c634a28db8a2931872a14329c5f1925372%3Durn%253Aoasis%253Anames%253Atc%253ASAML%253A1.1%253Anameid-format%253AemailAddress%2C4%3DIrakli.Mchedlishvili%2540gatesfoundation.org!' ckan='18a7547cda72669eceb5c2f725dbd12c62e00e3e13a909095e414e32914bc8d9c461e708' clientchoice='DesktopClient' sfbShellAppLaunched='yes'>, 'sfbShellAppLaunched=yes; clientchoice=DesktopClient; _ga=GA1.2.1350896404.1580983419; _gid=GA1.2.1137990824.1595321762; auth_tkt="87ba178c634a28db8a2931872a14329c5f1925372%3Durn%253Aoasis%253Anames%253Atc%253ASAML%253A1.1%253Anameid-format%253AemailAddress%2C4%3DIrakli.Mchedlishvili%2540gatesfoundation.org!"; auth_tkt="87ba178c634a28db8a2931872a14329c5f1925372%3Durn%253Aoasis%253Anames%253Atc%253ASAML%253A1.1%253Anameid-format%253AemailAddress%2C4%3DIrakli.Mchedlishvili%2540gatesfoundation.org!"; ckan=18a7547cda72669eceb5c2f725dbd12c62e00e3e13a909095e414e32914bc8d9c461e708')
  paste.registry: <paste.registry.Registry object at 0x7ff18ff6c5d0>
  paste.throw_errors: True
  pylons.action_method: <bound method PackageController.read of <ckan.controllers.package.PackageController object at 0x7ff18ff6c910>>
  pylons.controller: <ckan.controllers.package.PackageController object at 0x7ff18ff6c910>
  pylons.environ_config: {'session': 'beaker.session', 'cache': 'beaker.cache'}
  pylons.pylons: <pylons.util.PylonsContext object at 0x7ff18ff6c3d0>
  pylons.routes_dict: {'action': u'read', 'controller': u'package', 'id': u'test-harvest'}
  repoze.who.api: <repoze.who.api.API object at 0x7ff18ff6c990>
  repoze.who.identity: <repoze.who identity (hidden, dict-like) at 140675476193952>
  repoze.who.logger: <logging.Logger object at 0x7ff1a50e79d0>
  repoze.who.plugins: {'saml2auth': <ckanext.saml2.s2repoze.plugins.sp.SAML2Plugin object at 0x7ff19358d490>, 'ckan.lib.authenticator:UsernamePasswordAuthenticator': <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7ff19358db10>, 'friendlyform': <FriendlyFormPlugin 140675546765648>, 'auth_tkt': <AuthTktCookiePlugin 140675546765584>}
  routes.route: <routes.route.Route object at 0x7ff1944573d0>
  routes.url: <routes.util.URLGenerator object at 0x7ff18ff6cc90>
  webob._parsed_query_vars: (GET([]), '')
  webob.adhoc_attrs: {'response': <Response at 0x7ff18ff6c810 200 OK>, 'language': 'en-us'}
  wsgi process: 'Multi process AND threads (?)'
  wsgi.file_wrapper: <class 'gunicorn.http.wsgi.FileWrapper'>
  wsgiorg.routing_args: (<routes.util.URLGenerator object at 0x7ff18ff6cc90>, {'action': u'read', 'controller': u'package', 'id': u'test-harvest'})
------------------------------------------------------------
shevron commented 4 years ago

I'm not sure I know enough about CKAN harvesters to analyze this - and I'm not sure this issue belongs in metastore-lib and not somewhere else (ckanext-versioning?).

Can you tell me a bit about test-harvest - is it a regular CKAN dataset? Do we know it has been created in the CKAN DB? It seems that if it was, it was not created in metastore. Then, something tries to fetch it but it doesn't exist, and this is why we get this error.

I'm going out on a limb here, but maybe the code creating the dataset is not going through the normal CKAN flow, and therefore metastore's create is never called.
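To make that hypothesis concrete, here is a minimal, self-contained sketch of the failure mode. It does not use the real metastore-lib classes: `NotFound`, `InMemoryBackend` and `safe_tag_list` are hypothetical stand-ins written only for illustration. Only the `tag_list` call and the shape of the `NotFound` arguments are taken from the traceback above. It shows a package that was never created in the backend raising `NotFound` on `tag_list`, and one possible way a caller could tolerate that.

```python
class NotFound(Exception):
    """Hypothetical stand-in for metastore.backend.exc.NotFound."""


class InMemoryBackend(object):
    """Hypothetical stand-in for a metastore storage backend."""

    def __init__(self):
        self._tags = {}  # package name -> list of tag names

    def create(self, package_id):
        # Register the package so later lookups succeed.
        self._tags[package_id] = []

    def tag_list(self, package_id):
        if package_id not in self._tags:
            # Same argument shape as in the traceback: the '{}' placeholder
            # is never formatted, so the exception carries a 2-tuple.
            raise NotFound('Could not find package {}', package_id)
        return list(self._tags[package_id])


def safe_tag_list(backend, package_id):
    """Return the package's tags, or [] if the backend has no such package."""
    try:
        return backend.tag_list(package_id)
    except NotFound:
        return []


backend = InMemoryBackend()
backend.create('regular-dataset')  # went through the normal create flow

known = safe_tag_list(backend, 'regular-dataset')
missing = safe_tag_list(backend, 'test-harvest')  # never created in the backend
```

Whether swallowing the error like this is the right behaviour is a design question: it would hide the fact that the package is missing from metastore, so it may be better treated as a stopgap than as a fix.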

zelima commented 4 years ago

@shevron I'm not sure why it is complaining about test-harvest - that is actually the name of the harvester. The harvested source is https://master.ckan.org/ and there are regular datasets ...

> Do we know it has been created in the CKAN DB? It seems that if it was, it was not created in metastore

Not sure about either of those...

rufuspollock commented 4 years ago

@zelima it looks like you have datasets getting created in CKAN core but not in metastore.
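One way to check this is to compare the package names CKAN knows about with the names that exist in metastore. The sketch below is only illustrative: `missing_from_metastore` is a hypothetical helper, and both input lists are hard-coded sample data rather than real API output. The `'type': 'harvest'` value mirrors the harvest source dict in the log above.

```python
def missing_from_metastore(ckan_packages, metastore_names):
    """Return the CKAN package dicts whose name has no metastore counterpart."""
    known = set(metastore_names)
    return [pkg for pkg in ckan_packages if pkg['name'] not in known]


# Sample data only; in practice these would come from CKAN and the backend.
ckan_packages = [
    {'name': 'regular-dataset', 'type': 'dataset'},
    {'name': 'test-harvest', 'type': 'harvest'},
]
metastore_names = ['regular-dataset']

orphans = missing_from_metastore(ckan_packages, metastore_names)
```

If every orphan turns out to have a non-default `type`, that would suggest the create hook is being skipped for those package types specifically.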