TaylorHere opened this issue 3 years ago
Looks like my Helm chart didn't set the schema worker queues; changing to the below will work:
```yaml
## Configuration for Redash ad-hoc workers
adhocWorker:
  # adhocWorker.env -- Redash ad-hoc worker specific environment variables.
  env:
    QUEUES: "queries,celery,schemas,default,periodic"
    WORKERS_COUNT: 4
```
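If it helps anyone else, applying the change looks roughly like this. A minimal sketch, assuming the community contrib chart with both the repo alias and the release named `redash` (adjust names to your install):

```sh
# Register the community chart repo (alias is an assumption) and roll out
# the updated values so the ad-hoc worker picks up the new QUEUES list.
helm repo add redash https://getredash.github.io/contrib-helm-chart/
helm upgrade redash redash/redash -f values.yaml
```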
Impala or Hive data sources refresh all table schemas at once, which can take a very long time on a big database; the job timeout then kills the process and the refresh fails. Could we retrieve the schema by layers, the way Hue does: refresh the table list first, and only refresh a table's columns when the user clicks into that table (see the sketch below)?
Also, not every 'table' supports 'show column stats'; some entries may be views, which should be ignored.
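To make the layered idea concrete with plain Hive statements (a sketch only; `$HIVE_JDBC_URL` and `bronze.table_x` are placeholders, not anything Redash does today):

```sh
# Layer 1: one cheap call for the table list, done eagerly.
beeline -u "$HIVE_JDBC_URL" -e "SHOW TABLES IN bronze"

# Layer 2: per-table column lookup, run lazily only when a user
# opens that table in the schema browser.
beeline -u "$HIVE_JDBC_URL" -e "SHOW COLUMNS IN bronze.table_x"
```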
Yeah, I'm having this same problem. My data source is a Databricks cluster and my Hive metastore is in the AWS Glue Data Catalog. I am tailing the logs with `sudo docker logs -f <container-id> --tail 50` and watching it start off; each table takes maybe a second, and it hums along right up until:
```
...
[2021-09-03 14:58:16,215][PID:5878][INFO][pyhive.hive] USE 'default'
[2021-09-03 14:58:16,501][PID:5878][INFO][pyhive.hive] show columns in bronze.table_x
[2021-09-03 14:58:17,386][PID:5878][INFO][pyhive.hive] USE 'default'
[2021-09-03 14:58:17,695][PID:5878][INFO][pyhive.hive] show columns in bronze.table_y
[2021-09-03 14:58:18 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:5878)
[2021-09-03 14:58:18,277][PID:5878][INFO][metrics] method=GET path=/api/data_sources/8/schema endpoint=datasourceschemaresource status=500 content_type=? content_length=-1 duration=30238.72 query_count=6 query_duration=23.62
[2021-09-03 14:58:18 +0000] [5878] [INFO] Worker exiting (pid: 5878)
[2021-09-03 14:58:18 +0000] [5893] [INFO] Booting worker with pid: 5893
```
I reduced the number of tables in the metastore and it worked fine, so I think everything is grand other than the worker timing out. Therefore I need to increase the time the worker has to complete the schema refresh. Ideas?
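For what it's worth, gunicorn's default worker timeout is 30 seconds, which matches the ~30 s request duration in the log above, so it is the web worker being killed rather than the data source. A minimal sketch of one way to raise it, assuming the server container runs gunicorn ≥ 19.7 (which reads `GUNICORN_CMD_ARGS` from the environment); the 300-second value is arbitrary:

```sh
# Raise the gunicorn worker timeout for the Redash server process.
# Set this wherever the container's environment is defined
# (docker-compose, Kubernetes env, etc.), then restart the server.
export GUNICORN_CMD_ARGS="--timeout 300"
```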
> Looks like my Helm chart didn't set the schema worker queues; changing to the below will work:
>
> ```yaml
> ## Configuration for Redash ad-hoc workers
> adhocWorker:
>   # adhocWorker.env -- Redash ad-hoc worker specific environment variables.
>   env:
>     QUEUES: "queries,celery,schemas,default,periodic"
>     WORKERS_COUNT: 4
> ```
This helped me so much in fixing my stuck query refreshes. There is a bad default value in the Redash Helm chart!
Facing the same issue: Test Connection is OK, but Schema refresh fails when creating a New Query.
Server run with:

```sh
python3 manage.py runserver --debugger --reload -h 0.0.0.0 -p 5001
```
Worker run with:

```sh
export QUEUES=queries,celery,schemas,default,periodic
watchmedo auto-restart --directory=./redash/ --pattern="*.py" --recursive -- ./manage.py rq worker $QUEUES
```
Worker error log below:
```
[WARNING][rq.worker] Moving job to FailedJobRegistry (Work-horse terminated unexpectedly; waitpid returned 11 (signal 11); )
[2024-11-03 23:22:31,430][PID:80585][DEBUG][rq.queue] Starting BLPOP operation for queues rq:queue:queries, rq:queue:celery, rq:queue:schemas, rq:queue:default, rq:queue:periodic with timeout of 405
```
This seems related to Redis, but with so little log information it is not easy to trace; maybe I need to study how rq is used here. (Signal 11 is SIGSEGV, so the work-horse actually crashed with a segmentation fault; see the sketch below for one way to surface the crash site.)
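Since the work-horse is segfaulting, Python's built-in faulthandler can print the Python-level traceback at the moment of the crash. A minimal sketch, reusing the worker command above (`PYTHONFAULTHANDLER` is standard CPython; everything else mirrors the setup already shown):

```sh
# Enable the stdlib faulthandler so a SIGSEGV dumps a Python traceback,
# then run the worker exactly as before.
export PYTHONFAULTHANDLER=1
export QUEUES=queries,celery,schemas,default,periodic
./manage.py rq worker $QUEUES
```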
Issue Summary
In the Query UI, the Schema Refresh fails every time, and redash-server produces logs like those shown above.
In my case, the Hive, Impala, and Presto data sources all hit this error.