getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
http://redash.io/
BSD 2-Clause "Simplified" License

Schema Refresh Failed #5410

Open TaylorHere opened 3 years ago

TaylorHere commented 3 years ago

Issue Summary

In the Query UI, the schema refresh fails every time, and redash-server produces logs like the ones below:

[2021-02-26 03:54:15,714][PID:7152][ERROR][redash.app] Exception on /api/jobs/4758dffa-e5e4-495d-bc34-52c4c2a23c3c [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/flask_restful/__init__.py", line 458, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_login/utils.py", line 261, in decorated_view
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "/app/redash/handlers/base.py", line 33, in dispatch_request
    return super(BaseResource, self).dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restful/__init__.py", line 573, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/app/redash/handlers/query_results.py", line 449, in get
    job = Job.fetch(job_id)
  File "/usr/local/lib/python3.7/site-packages/rq/job.py", line 299, in fetch
    job.refresh()
  File "/usr/local/lib/python3.7/site-packages/rq/job.py", line 518, in refresh
    raise NoSuchJobError('No such job: {0}'.format(self.key))
rq.exceptions.NoSuchJobError: No such job: b'rq:job:4758dffa-e5e4-495d-bc34-52c4c2a23c3c'

In my case, the data sources Hive, Impala, and Presto all hit this error.
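
For context, the 500 comes from the job-status poll: the handler calls rq's Job.fetch, which looks up the job's metadata hash in Redis and raises NoSuchJobError when that key does not exist. A minimal sketch of the failing lookup (an illustration only, assuming direct access to the Redis instance Redash uses; the job id is just the one from the log above):

from redis import Redis
from rq.exceptions import NoSuchJobError
from rq.job import Job

job_id = "4758dffa-e5e4-495d-bc34-52c4c2a23c3c"
try:
    job = Job.fetch(job_id, connection=Redis())
except NoSuchJobError:
    # The "rq:job:<id>" hash is missing from Redis: either no worker is
    # listening on the queue the job was enqueued to, or the job metadata
    # already expired -- so the poll endpoint answers with a 500.
    print("job metadata not found in Redis")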

Steps to Reproduce

  1. Open the Query UI.
  2. Wait for the schema refresh to run.
  3. The schema refresh fails.

TaylorHere commented 3 years ago

Looks like my Helm chart didn't set up a schema worker; changing it to the below works:

## Configuration for Redash ad-hoc workers
adhocWorker:
  # adhocWorker.env -- Redash ad-hoc worker specific environment variables.
  env:
    QUEUES: "queries,celery,schemas,default,periodic"
    WORKERS_COUNT: 4
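
The fix works because the schema refresh job is enqueued on the schemas queue, which no worker was listening to until that change. One way to confirm the new setting took effect, sketched against rq directly (an assumption: that you can reach the Redis instance Redash uses):

from redis import Redis
from rq import Worker

# List every queue currently covered by a live rq worker. If "schemas" is
# absent, schema refresh jobs get enqueued but are never picked up, and the
# status poll eventually ends in NoSuchJobError.
covered = {q.name for w in Worker.all(connection=Redis()) for q in w.queues}
print(sorted(covered))
print("schemas covered:", "schemas" in covered)
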
TaylorHere commented 3 years ago

The Impala and Hive data sources refresh all table schemas at once, which takes a long time on a big database; the job timeout then kills the process and the refresh fails. Could we retrieve the schema by layers, like Hue does: refresh the list of tables first, then refresh a table's columns only when the user clicks into it?
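
A rough sketch of that two-layer idea (not Redash's actual query-runner API; run_sql here is a hypothetical helper that executes one statement on a Hive/Impala cursor and returns the rows):

def list_tables(run_sql):
    # Layer 1: a single cheap round trip that only names the tables.
    return [row[0] for row in run_sql("SHOW TABLES")]

def list_columns(run_sql, table):
    # Layer 2: fetched on demand when the user expands a table, so one big
    # schema no longer has to be walked inside a single job timeout.
    return [row[0] for row in run_sql(f"SHOW COLUMNS IN {table}")]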

TaylorHere commented 3 years ago

Also, not all 'tables' support 'SHOW COLUMN STATS'; there may be views, which should be ignored.
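
Sketching that filter too (an assumption: Hive 2.2+, where SHOW VIEWS exists; older versions would need DESCRIBE FORMATTED and a check of the Table Type field), reusing list_tables from the sketch above:

def stats_eligible_tables(run_sql):
    # Skip views, since SHOW COLUMN STATS is only valid for real tables.
    views = {row[0] for row in run_sql("SHOW VIEWS")}
    return [t for t in list_tables(run_sql) if t not in views]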

murphycrosby commented 3 years ago

Yeah, I'm having this same problem. My data source is a Databricks cluster and my Hive metastore is on the AWS Glue Data Catalog. I am tailing the logs with sudo docker logs -f <container-id> --tail 50 and watching it start off; each table takes maybe a second, and it hums along right up until:

...
[2021-09-03 14:58:16,215][PID:5878][INFO][pyhive.hive] USE 'default'
[2021-09-03 14:58:16,501][PID:5878][INFO][pyhive.hive] show columns in bronze.table_x
[2021-09-03 14:58:17,386][PID:5878][INFO][pyhive.hive] USE 'default'
[2021-09-03 14:58:17,695][PID:5878][INFO][pyhive.hive] show columns in bronze.table_y
[2021-09-03 14:58:18 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:5878)
[2021-09-03 14:58:18,277][PID:5878][INFO][metrics] method=GET path=/api/data_sources/8/schema endpoint=datasourceschemaresource status=500 content_type=? content_length=-1 duration=30238.72 query_count=6 query_duration=23.62
[2021-09-03 14:58:18 +0000] [5878] [INFO] Worker exiting (pid: 5878)
[2021-09-03 14:58:18 +0000] [5893] [INFO] Booting worker with pid: 5893

I reduced the number of tables I had in the metastore and it worked fine, so I think everything is grand other than the worker timing out. Therefore I need to increase the time the worker has to complete the schema refresh. Ideas?
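
The [CRITICAL] WORKER TIMEOUT line is gunicorn killing the web worker that was serving /api/data_sources/8/schema, not an rq job timeout; gunicorn's default timeout is 30 seconds, which matches the ~30238 ms duration in the log. gunicorn config files are plain Python, so one way to raise it (assuming you can mount such a file and pass -c to the gunicorn command in the server container) would be:

# gunicorn.conf.py -- loaded with: gunicorn -c /path/to/gunicorn.conf.py ...
# Give the slow schema endpoint more headroom before the worker is killed.
timeout = 120  # seconds of worker silence gunicorn tolerates (default is 30)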

armandleopold commented 3 years ago

Looks like my Helm chart didn't set up a schema worker; changing it to the below works:

## Configuration for Redash ad-hoc workers
adhocWorker:
  # adhocWorker.env -- Redash ad-hoc worker specific environment variables.
  env:
    QUEUES: "queries,celery,schemas,default,periodic"
    WORKERS_COUNT: 4

This helped me so much; it solved my stuck query refresh. There is a bad default value in the Redash Helm chart!

https://github.com/getredash/contrib-helm-chart/blob/f2a41b74480a4749895b9e72dbf242a658389a02/values.yaml#L350

icoco commented 4 days ago

Facing the same issue: Test Connection succeeds, but the schema refresh fails while creating a New Query.

Server runs with:

 python3 manage.py runserver --debugger --reload -h 0.0.0.0 -p 5001

Worker runs with:

export QUEUES=queries,celery,schemas,default,periodic
watchmedo auto-restart --directory=./redash/ --pattern="*.py" --recursive -- ./manage.py rq worker $QUEUES

Worker error log below:

[WARNING][rq.worker] Moving job to FailedJobRegistry (Work-horse terminated unexpectedly; waitpid returned 11 (signal 11); )
[2024-11-03 23:22:31,430][PID:80585][DEBUG][rq.queue] Starting BLPOP operation for queues rq:queue:queries, rq:queue:celery, rq:queue:schemas, rq:queue:default, rq:queue:periodic with timeout of 405

Seems related to Redis, but there is too little log information to trace it easily; maybe I need to study how rq is used.
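
For what it's worth, "waitpid returned 11 (signal 11)" means the forked work horse segfaulted, typically inside a native driver library rather than in Python code. One low-effort way to get more than rq's one-line warning (a debugging sketch, not something Redash enables by default) is to turn on faulthandler in the worker process so a crash dumps the Python stack:

# Put this near the top of manage.py (or wherever the worker starts),
# before any query-runner imports.
import faulthandler
faulthandler.enable()  # writes the Python traceback to stderr on SIGSEGV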