cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

Kubernetes Quick Start Fails if database pod not ready in time #1242

Closed bweissler-sf closed 3 years ago

bweissler-sf commented 4 years ago

Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com?

Describe the bug: Kubernetes quick start fails with hue pod crashing.

Steps to reproduce it?

>_ helm repo add gethue https://helm.gethue.com
"gethue" has been added to your repositories

>_ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "gethue" chart repository
Update Complete. ⎈ Happy Helming!⎈ 

>_ helm install gethue/hue
Error: must either provide a name or specify --generate-name

>_ helm install gethue/hue --generate-name
NAME: hue-1597190934
LAST DEPLOYED: Tue Aug 11 20:08:55 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Congratulations, you've launched the Hue SQL Editor for Data Warehouses!

To check the status of your installation run:

  helm list hue-1597190934

You should be able to execute queries by typing:

  kubectl port-forward svc/hue 8888:8888 --address 0.0.0.0 &

Then opening-up:

  http://localhost:8888

Or directly running below to get the URL:

  export WEB_HOST=$(kubectl get node -o jsonpath="{.items[0].metadata.name}")
  export WEB_PORT=$(kubectl get service hue -o jsonpath="{.spec.ports[*].nodePort}" --namespace default)

  echo http://$WEB_HOST:$WEB_PORT

Happy Querying!

>_ k get svc
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hue            NodePort    10.97.167.128   <none>        8888:30962/TCP   46s
hue-postgres   NodePort    10.108.94.132   <none>        5432:30980/TCP   46s
kubernetes     ClusterIP   10.96.0.1       <none>        443/TCP          76d

>_ k get pod
NAME                 READY   STATUS             RESTARTS   AGE
hue-4gsm9            0/1     CrashLoopBackOff   4          2m57s
hue-postgres-ssqh8   1/1     Running            0          2m57s

>_ k logs hue-4gsm9 | grep Error
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.

Hue version or source? (e.g. open source 4.5, CDH 5.16, CDP 1.0...). System info (e.g. OS, Browser...). Pulled on Aug 11, 2020 gethue/hue:latest.

>_ k describe rc hue
Name:         hue
Namespace:    default
Selector:     app=hue
Labels:       app=hue
Annotations:  <none>
Replicas:     1 current / 1 desired
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=hue
  Containers:
   hue:
    Image:        gethue/hue:latest
    Port:         8888/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /usr/share/hue/desktop/conf/z-hue.ini from config-volume (rw,path="hue-ini")
      /usr/share/hue/desktop/conf/zz-hue.ini from config-volume-extra (rw,path="hue-ini")
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hue-config
    Optional:  false
   config-volume-extra:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hue-config-extra
    Optional:  false
Events:
  Type    Reason            Age   From                    Message
  ----    ------            ----  ----                    -------
  Normal  SuccessfulCreate  17m   replication-controller  Created pod: hue-4gsm9

Kubernetes via Docker Desktop

>_ k version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "16+",
    "gitVersion": "v1.16.6-beta.0",
    "gitCommit": "e7f962ba86f4ce7033828210ca3556393c377bcc",
    "gitTreeState": "clean",
    "buildDate": "2020-01-15T08:26:26Z",
    "goVersion": "go1.13.5",
    "compiler": "gc",
    "platform": "darwin/amd64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "16+",
    "gitVersion": "v1.16.6-beta.0",
    "gitCommit": "e7f962ba86f4ce7033828210ca3556393c377bcc",
    "gitTreeState": "clean",
    "buildDate": "2020-01-15T08:18:29Z",
    "goVersion": "go1.13.5",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

romainr commented 4 years ago

I wonder if the latest image was just bad. Still having an issue now?

How about with this image?

docker pull gethue/hue:20200811-135001

https://hub.docker.com/repository/docker/gethue/hue/tags?page=1

bweissler-sf commented 4 years ago

Thanks for the reply @romainr. Unfortunately I'm seeing the same behavior using that image (helm install gethue/hue --generate-name --set image.tag=20200811-135001).

I also tried gethue/hue:20200813-131731. Same errors again.

romainr commented 4 years ago

What is the full error trace that you should see at the bottom of kubectl logs hue-8d5664d77-p8zv2 hue?

romainr commented 4 years ago

If the trace is:

  File "/usr/share/hue/desktop/core/src/desktop/urls.py", line 50, in <module>
    from desktop.configuration import api as desktop_configuration_api
  File "/usr/share/hue/desktop/core/src/desktop/configuration/api.py", line 32, in <module>
    from notebook.connectors.hiveserver2 import HiveConfiguration, ImpalaConfiguration
  File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 79, in <module>
    from jobbrowser.views import get_job
  File "/usr/share/hue/apps/jobbrowser/src/jobbrowser/views.py", line 64, in <module>
    from jobbrowser.api import get_api, ApplicationNotRunning, JobExpired
  File "/usr/share/hue/apps/jobbrowser/src/jobbrowser/api.py", line 38, in <module>
    from jobbrowser.yarn_models import Application, YarnV2Job, Job as YarnJob, KilledJob as KilledYarnJob, Container, SparkJob
  File "/usr/share/hue/apps/jobbrowser/src/jobbrowser/yarn_models.py", line 44, in <module>
    from jobbrowser.models import format_unixtime_ms
  File "/usr/share/hue/apps/jobbrowser/src/jobbrowser/models.py", line 85, in <module>
    class HiveQuery(models.Model):
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/base.py", line 118, in __new__
    "INSTALLED_APPS." % (module, name)
RuntimeError: Model class jobbrowser.models.HiveQuery doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.

then it is because jobbrowser is in the app blacklist. This would need either a code fix or an update to the default hue.ini:

app_blacklist=spark,zookeeper,hbase,sqoop,security,jobsub,...jobbrowser
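For illustration only, here is a minimal Python 3 sketch of the hue.ini route, assuming the change is simply to drop jobbrowser from the comma-separated app_blacklist value. The helper name remove_from_blacklist is hypothetical and is not part of Hue; the blacklist string in the usage note is a shortened stand-in, not the full default value.

```python
def remove_from_blacklist(value, app):
    """Return the comma-separated blacklist `value` without `app`.

    Hypothetical helper: it only edits the string, it does not read or
    write hue.ini itself.
    """
    # Split on commas, trim whitespace, and drop empty entries.
    entries = [item.strip() for item in value.split(",")]
    return ",".join(item for item in entries if item and item != app)
```

Calling remove_from_blacklist("spark,zookeeper,jobbrowser", "jobbrowser") yields a value that no longer blacklists jobbrowser; the result would still have to be written back to the app_blacklist line by hand or by other tooling.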

romainr commented 4 years ago

This should fix it: https://issues.cloudera.org/browse/HUE-9450

bweissler-sf commented 4 years ago

OK @romainr, latest works. Thanks again. One snag is that the Hue pod starts but encounters an error while trying to connect to postgres. Clearly a timing issue.

The logs reveal this error.

File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: Connection refused
    Is the server running on host "hue-postgres" (10.103.8.77) and accepting
    TCP/IP connections on port 5432?

This causes attempts to hit localhost:8888 to spew the following.

Traceback (most recent call last):
  File "/usr/share/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 1228, in communicate
    req.respond()
  File "/usr/share/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 589, in respond
    self._respond()
  File "/usr/share/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 601, in _respond
    response = self.wsgi_app(self.environ, self.start_response)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/core/handlers/wsgi.py", line 157, in __call__
    response = self.get_response(request)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/core/handlers/base.py", line 124, in get_response
    response = self._middleware_chain(request)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/core/handlers/exception.py", line 43, in inner
    response = response_for_exception(request, exc)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/core/handlers/exception.py", line 93, in response_for_exception
    response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/core/handlers/exception.py", line 143, in handle_uncaught_exception
    return callback(request, **param_dict)
  File "/usr/share/hue/desktop/core/src/desktop/views.py", line 447, in serve_500_error
    return render("500.mako", request, {'traceback': traceback.extract_tb(exc_info[2])})
  File "/usr/share/hue/desktop/core/src/desktop/lib/django_util.py", line 241, in render
    **kwargs
  File "/usr/share/hue/desktop/core/src/desktop/lib/django_util.py", line 154, in _render_to_response
    return django_mako.render_to_response(template, *args, **kwargs)
  File "/usr/share/hue/desktop/core/src/desktop/lib/django_mako.py", line 127, in render_to_response
    return HttpResponse(render_to_string(template_name, data_dictionary), **kwargs)
  File "/usr/share/hue/desktop/core/src/desktop/lib/django_mako.py", line 116, in render_to_string_normal
    result = template.render(**data_dict)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Mako-1.0.7-py2.7.egg/mako/template.py", line 462, in render
    return runtime._render(self, self.callable_, args, data)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Mako-1.0.7-py2.7.egg/mako/runtime.py", line 838, in _render
    **_kwargs_for_callable(callable_, data))
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Mako-1.0.7-py2.7.egg/mako/runtime.py", line 873, in _render_context
    _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Mako-1.0.7-py2.7.egg/mako/runtime.py", line 899, in _exec_template
    callable_(context, *args, **kwargs)
  File "/tmp/tmp7RYyCo/desktop/500.mako.py", line 40, in render_body
    __M_writer(unicode( commonheader(_('500 - Server error'), "", user, request) ))
  File "/usr/share/hue/desktop/core/src/desktop/views.py", line 501, in commonheader
    current_app, other_apps, apps_list = _get_apps(user, section)
  File "/usr/share/hue/desktop/core/src/desktop/models.py", line 2142, in _get_apps
    if user.is_authenticated():
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/utils/functional.py", line 238, in inner
    self._setup()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/utils/functional.py", line 386, in _setup
    self._wrapped = self._setupfunc()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/auth/middleware.py", line 24, in <lambda>
    request.user = SimpleLazyObject(lambda: get_user(request))
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/auth/middleware.py", line 12, in get_user
    request._cached_user = auth.get_user(request)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/auth/__init__.py", line 211, in get_user
    user_id = _get_user_session_key(request)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/auth/__init__.py", line 61, in _get_user_session_key
    return get_user_model()._meta.pk.to_python(request.session[SESSION_KEY])
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/sessions/backends/base.py", line 57, in __getitem__
    return self._session[key]
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/sessions/backends/base.py", line 207, in _get_session
    self._session_cache = self.load()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/contrib/sessions/backends/db.py", line 35, in load
    expire_date__gt=timezone.now()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/query.py", line 374, in get
    num = len(clone)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/query.py", line 232, in __len__
    self._fetch_all()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/query.py", line 1121, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/query.py", line 53, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.29-py2.7.egg/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
ProgrammingError: relation "django_session" does not exist
LINE 1: ...ession_data", "django_session"."expire_date" FROM "django_se...
                                                             ^

Forcing the pod to restart by deleting it results in success. Maybe some sleep or retry in the entrypoint script would be a good idea. Thanks again for your help. As this is a slightly different issue, feel free to close this one.
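To make the retry idea concrete, here is a minimal standalone Python 3 sketch of a wait step that an entrypoint could run before starting Hue. It is not Hue's actual entrypoint; the function name wait_for_port and its default attempts, delay, and timeout values are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host, port, attempts=30, delay=2.0, timeout=3.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    Tries up to `attempts` times and sleeps `delay` seconds between tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            if attempt < attempts:
                time.sleep(delay)
    return False
```

With the service name and port from the error above, the call would be wait_for_port("hue-postgres", 5432). Note that an open port only shows that Postgres accepts TCP connections; it does not show that the tables have been created, so the migrate step would still need to run afterwards.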

romainr commented 4 years ago

I think this is a valid point, as the Hue pod could indeed use initContainers to make sure that the DB pod/connectivity is there.

However, the automatic restart of the pod should handle this (as long as it runs the migrate command to create the tables). So if waiting a bit does not solve it, we could indeed add some logic to see why the migrate does not run and auto-fix the issue.

bweissler-sf commented 4 years ago

So, the problem I had is that the error didn't cause the pod to crash; rather, it continued to run but did not serve requests properly. I had to force the pod to restart by deleting it.

A pattern I've seen is to retry init functions a few times before giving up and forcing the pod to crash and restart.
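A rough Python 3 sketch of that pattern, under the assumption that the init step is a callable that raises on failure. The name run_with_retries and its defaults are hypothetical and do not come from Hue.

```python
import sys
import time


def run_with_retries(init, attempts=5, delay=3.0):
    """Call init() until it succeeds; re-raise the last error after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return init()
        except Exception as exc:
            print("init attempt %d/%d failed: %s" % (attempt, attempts, exc),
                  file=sys.stderr)
            if attempt == attempts:
                # Give up: let the error propagate so the process can exit non-zero.
                raise
            time.sleep(delay)
```

If the final exception is left uncaught in the entrypoint process, Python exits with a non-zero status, the container terminates, and the pod's restart policy can start it again instead of leaving a half-initialized server running.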

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.