apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

'chromedriver' executable needs to be in PATH #22099

Closed rohitpawar2811 closed 1 year ago

rohitpawar2811 commented 1 year ago

I am facing a problem sending reports by email:

[2022-11-11 11:53:00,175: ERROR/ForkPoolWorker-8] A downstream exception occurred while generating a report: 352d7d4c-54aa-45ac-8fbe-010079f2036b
superset_worker          | Traceback (most recent call last):
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 72, in start
superset_worker          |     self.process = subprocess.Popen(cmd, env=self.env,
superset_worker          |   File "/usr/local/lib/python3.8/subprocess.py", line 858, in __init__
superset_worker          |     self._execute_child(args, executable, preexec_fn, close_fds,
superset_worker          |   File "/usr/local/lib/python3.8/subprocess.py", line 1704, in _execute_child
superset_worker          |     raise child_exception_type(errno_num, err_msg, err_filename)
superset_worker          | FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver'
superset_worker          | 
superset_worker          | During handling of the above exception, another exception occurred:
superset_worker          | 
superset_worker          | Traceback (most recent call last):
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 217, in _get_screenshots
superset_worker          |     image = screenshot.get_screenshot(user=user)
superset_worker          |   File "/app/superset/utils/screenshots.py", line 76, in get_screenshot
superset_worker          |     self.screenshot = driver.get_screenshot(self.url, self.element, user)
superset_worker          |   File "/app/superset/utils/webdriver.py", line 111, in get_screenshot
superset_worker          |     driver = self.auth(user)
superset_worker          |   File "/app/superset/utils/webdriver.py", line 89, in auth
superset_worker          |     driver = self.create()
superset_worker          |   File "/app/superset/utils/webdriver.py", line 86, in create
superset_worker          |     return driver_class(**kwargs)
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
superset_worker          |     self.service.start()
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 81, in start
superset_worker          |     raise WebDriverException(
superset_worker          | selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

What I did: I just ran the docker-compose file and added these configurations:

FEATURE_FLAGS = {
    "DYNAMIC_PLUGINS": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "VERSIONED_EXPORT": True,
    "ALERT_REPORTS": True
}

SCREENSHOT_LOCATE_WAIT = 1000
SCREENSHOT_LOAD_WAIT = 1600

# Email configuration
THUMBNAIL_SELENIUM_USER = 'admin@mydomain.com'

ENABLE_SCHEDULED_EMAIL_REPORTS = True
SMTP_HOST = "smtp.gmail.com"
SMTP_USER = "@gmail.com"
SMTP_PASSWORD = "XXXX"
SMTP_PORT = "465"
SMTP_SSL = False
SMTP_STARTTLS = False
SMTP_MAIL_FROM = "xxxx@gmail.com"

# WebDriver configuration
WEBDRIVER_TYPE = "chrome"
WEBDRIVER_OPTION_ARGS = [
    "--force-device-scale-factor=2.0",
    "--high-dpi-support=2.0",
    "--headless",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-extensions",
]

Do I have to download ChromeDriver inside the container? I think not, so how can I provide the path to the Chrome driver executable?
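
Selenium starts the driver by its bare name, so the binary has to be resolvable on the worker's PATH (the traceback above fails in subprocess.Popen because 'chromedriver' is not found). A minimal sketch for checking this from a Python shell inside the superset_worker container, using only the standard library; find_webdrivers is a hypothetical helper name, not part of Superset or Selenium:

```python
import shutil


def find_webdrivers(names=("chromedriver", "geckodriver")):
    """Map each driver name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per driver so a missing binary is easy to spot.
    for name, path in find_webdrivers().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

A value of None for chromedriver would be consistent with the FileNotFoundError shown above.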

sfirke commented 1 year ago

If you're using docker-compose, use 2.0.0-dev as the Superset image. Or another -dev image. It will come bundled with chromedriver etc. while a non-dev image doesn't.

And then just let it use Firefox for the report screenshots, don't add WEBDRIVER_TYPE or OPTION values to your config file.

rohitpawar2811 commented 1 year ago

Yes, initially I was using Firefox with the default config, but I was getting this:

superset_worker          | Traceback (most recent call last):
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 217, in _get_screenshots
superset_worker          |     image = screenshot.get_screenshot(user=user)
superset_worker          |   File "/app/superset/utils/screenshots.py", line 76, in get_screenshot
superset_worker          |     self.screenshot = driver.get_screenshot(self.url, self.element, user)
superset_worker          |   File "/app/superset/utils/webdriver.py", line 111, in get_screenshot
superset_worker          |     driver = self.auth(user)
superset_worker          |   File "/app/superset/utils/webdriver.py", line 89, in auth
superset_worker          |     driver = self.create()
superset_worker          |   File "/app/superset/utils/webdriver.py", line 86, in create
superset_worker          |     return driver_class(**kwargs)
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/firefox/webdriver.py", line 170, in __init__
superset_worker          |     RemoteWebDriver.__init__(
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
superset_worker          |     self.start_session(capabilities, browser_profile)
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
superset_worker          |     response = self.execute(Command.NEW_SESSION, parameters)
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
superset_worker          |     self.error_handler.check_response(response)
superset_worker          |   File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
superset_worker          |     raise exception_class(message, screen, stacktrace)
superset_worker          | selenium.common.exceptions.InvalidArgumentException: Message: Argument --marionette can't be set via capabilities
superset_worker          | 
superset_worker          | 
superset_worker          | The above exception was the direct cause of the following exception:
superset_worker          | 
superset_worker          | Traceback (most recent call last):
superset_worker          |   File "/app/superset/tasks/scheduler.py", line 85, in execute
superset_worker          |     AsyncExecuteReportScheduleCommand(
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 681, in run
superset_worker          |     raise ex
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 677, in run
superset_worker          |     ReportScheduleStateMachine(
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 639, in run
superset_worker          |     state_cls(
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 540, in next
superset_worker          |     raise first_ex
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 518, in next
superset_worker          |     self.send()
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 415, in send
superset_worker          |     notification_content = self._get_notification_content()
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 335, in _get_notification_content
superset_worker          |     screenshot_data = self._get_screenshots()
superset_worker          |   File "/app/superset/reports/commands/execute.py", line 222, in _get_screenshots
superset_worker          |     raise ReportScheduleScreenshotFailedError(
superset_worker          | superset.reports.commands.exceptions.ReportScheduleScreenshotFailedError: Failed taking a screenshot Message: Argument --marionette can't be set via capabilities

sfirke commented 1 year ago

What image are you using for Superset in your docker-compose file?

rohitpawar2811 commented 1 year ago

I am using apache/superset:${TAG:-latest-dev}, the developer image built from the master branch of apache/superset, with no changes.

I also checked that firefox and geckodriver are present in /usr/local/bin.

# Additional args to be passed as arguments to the config object
# Note: these options are Chrome-specific. For FF, these should only include the "--headless" arg

WEBDRIVER_OPTION_ARGS = ["--headless", "--marionette"]  # causes: Argument --marionette can't be set via capabilities
WEBDRIVER_OPTION_ARGS = ["--headless"]  # causes: Process unexpectedly closed with status 255

sfirke commented 1 year ago

Maybe switch that to apache/superset:2.0.0-dev? Trying to run the latest commit version can fail for all sorts of reasons; most people run a stable release in production.

rohitpawar2811 commented 1 year ago

Thanks, the error "Argument --marionette can't be set via capabilities" no longer occurs.

But now I get this:

[2022-11-14 09:39:00,065: INFO/ForkPoolWorker-7] Scheduling alert oms eta: 2022-11-14 09:39:00
[2022-11-14 09:39:00,068: INFO/MainProcess] Task reports.execute[891170b6-de47-48a9-bb78-5fd39eaa0617] received
[2022-11-14 09:39:00,068: INFO/ForkPoolWorker-7] Task reports.scheduler[967fc63d-0a3b-4a3a-ba74-37ccd9f93818] succeeded in 0.019032752003113274s: None
[2022-11-14 09:39:00,111: INFO/ForkPoolWorker-7] Init selenium driver
[2022-11-14 09:39:08,374: DEBUG/ForkPoolWorker-7] Sleeping for 3 seconds
[2022-11-14 09:39:11,378: DEBUG/ForkPoolWorker-7] Wait for the presence of grid-container
[2022-11-14 09:39:11,392: DEBUG/ForkPoolWorker-7] Wait for .loading to be done
[2022-11-14 09:39:11,406: DEBUG/ForkPoolWorker-7] Wait for chart to have content
[2022-11-14 09:39:11,538: DEBUG/ForkPoolWorker-7] Wait 5 seconds for chart animation
[2022-11-14 09:39:16,542: INFO/ForkPoolWorker-7] Taking a PNG screenshot of url http://superset:8088/superset/dashboard/10/?standalone=3&force=false
[2022-11-14 09:39:27,404: DEBUG/ForkPoolWorker-7] [stats_logger] (gauge) reports.email.send.error1
[2022-11-14 09:39:37,646: DEBUG/ForkPoolWorker-7] [stats_logger] (gauge) reports.email.send.error1
[2022-11-14 09:39:37,665: ERROR/ForkPoolWorker-7] A downstream exception occurred while generating a report: 891170b6-de47-48a9-bb78-5fd39eaa0617
Traceback (most recent call last):
  File "/app/superset/tasks/scheduler.py", line 79, in execute
    AsyncExecuteReportScheduleCommand(
  File "/app/superset/reports/commands/execute.py", line 659, in run
    raise ex
  File "/app/superset/reports/commands/execute.py", line 655, in run
    ReportScheduleStateMachine(
  File "/app/superset/reports/commands/execute.py", line 624, in run
    state_cls(
  File "/app/superset/reports/commands/execute.py", line 525, in next
    raise first_ex
  File "/app/superset/reports/commands/execute.py", line 503, in next
    self.send()
  File "/app/superset/reports/commands/execute.py", line 409, in send
    self._send(notification_content, self._report_schedule.recipients)
  File "/app/superset/reports/commands/execute.py", line 400, in _send
    raise ReportScheduleNotificationError(";".join(notification_errors))
superset.reports.commands.exceptions.ReportScheduleNotificationError: Connection unexpectedly closed
[2022-11-14 09:39:37,667: INFO/ForkPoolWorker-7] Task reports.execute[891170b6-de47-48a9-bb78-5fd39eaa0617] succeeded in 37.598017300999345s: None
[2022-11-14 09:40:00,058: INFO/MainProcess] Task reports.scheduler[b7d419b9-107b-49bb-aa2b-9b0938763e4c] received
[2022-11-14 09:40:00,072: INFO/ForkPoolWorker-7] Scheduling alert oms eta: 2022-11-14 09:40:00
[2022-11-14 09:40:00,075: INFO/MainProcess] Task reports.execute[ea2634f9-bed6-4384-93ed-bb8b12fdbe0b] received
[2022-11-14 09:40:00,075: INFO/ForkPoolWorker-7] Task reports.scheduler[b7d419b9-107b-49bb-aa2b-9b0938763e4c] succeeded in 0.016144393001013668s: None

But when I set it to True:

ALERT_REPORTS_NOTIFICATION_DRY_RUN = True

I didn't get the email. That's my problem.

[2022-11-14 10:22:00,063: INFO/ForkPoolWorker-7] Scheduling alert oms eta: 2022-11-14 10:22:00
[2022-11-14 10:22:00,066: INFO/MainProcess] Task reports.execute[7ac43222-448d-4f7f-9985-a397390b3a48] received
[2022-11-14 10:22:00,067: INFO/ForkPoolWorker-7] Task reports.scheduler[5e7fd309-383f-4b6e-b374-b191c5de3f9c] succeeded in 0.017026528999849688s: None
[2022-11-14 10:22:00,090: INFO/ForkPoolWorker-7] Init selenium driver
[2022-11-14 10:22:08,682: DEBUG/ForkPoolWorker-7] Sleeping for 3 seconds
[2022-11-14 10:22:11,686: DEBUG/ForkPoolWorker-7] Wait for the presence of grid-container
[2022-11-14 10:22:11,697: DEBUG/ForkPoolWorker-7] Wait for .loading to be done
[2022-11-14 10:22:11,708: DEBUG/ForkPoolWorker-7] Wait for chart to have content
[2022-11-14 10:22:11,842: DEBUG/ForkPoolWorker-7] Wait 5 seconds for chart animation
[2022-11-14 10:22:16,847: INFO/ForkPoolWorker-7] Taking a PNG screenshot of url http://superset:8088/superset/dashboard/10/?standalone=3&force=false
[2022-11-14 10:22:17,485: INFO/ForkPoolWorker-7] Would send notification for alert oms, to {"target": "rohitpawar28112000@gmail.com;"}
[2022-11-14 10:22:17,520: INFO/ForkPoolWorker-7] Task reports.execute[7ac43222-448d-4f7f-9985-a397390b3a48] succeeded in 17.45200149700031s: None
ftdtl commented 1 year ago

Maybe switch that to apache/superset:2.0.0-dev ? Trying to run the latest commit version can fail for all sorts of reasons, most people run a stable release in production.

It worked after I changed the image to 2.0.0-dev, but I would like to use Superset in production. I have tested several versions with Chrome or Firefox, but I always get a 404 error. Is there any other stable version I can use? Thanks.

sfirke commented 1 year ago

In my docker-compose file I run 2.0.0-dev image for the worker and worker_beat containers and then 2.0.0 image for the superset application container. That way I'm running the stable release image for the application itself but the matching dev image for the workers so that they have the headless browser and drivers needed for reports.

rohitpawar2811 commented 1 year ago

Here is a working configuration. You can put it directly in superset_config.py, replacing the corresponding code that is already there; only this much is needed:

class CeleryConfig:
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    CELERY_IMPORTS = ("superset.sql_lab",)
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
    CELERYD_LOG_LEVEL = "DEBUG"
    # CELERYD_PREFETCH_MULTIPLIER = 1
    # CELERY_ACKS_LATE = False
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 600,
            'soft_time_limit': 600,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        "email_reports.schedule_hourly": {
            "task": "email_reports.schedule_hourly",
            "schedule": crontab(minute=1, hour="*"),
        },
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=10, hour=0),
        },
    }

CELERY_CONFIG = CeleryConfig
#---------------------------------------MYEDITS-------------------------------------

FEATURE_FLAGS = {
    "DYNAMIC_PLUGINS": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "VERSIONED_EXPORT": True,
    "ALERT_REPORTS": True
}
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
SCREENSHOT_LOCATE_WAIT = 1000
SCREENSHOT_LOAD_WAIT = 1600
ENABLE_ALERTS = True
# # Slack configuration
# SLACK_API_TOKEN = "xoxb-"

# Email configuration
#THUMBNAIL_SELENIUM_USER = 'admin@mydomain.com'
EMAIL_NOTIFICATIONS = True
ENABLE_SCHEDULED_EMAIL_REPORTS = True
SMTP_HOST = "smtp.gmail.com"
SMTP_USER = "@gmail.com"
SMTP_PASSWORD = "XXX"
SMTP_STARTTLS = False
SMTP_SSL = True
SMTP_PORT = 465
# SMTP_PORT = 587
SMTP_MAIL_FROM = "rohitpawar28112000@gmail.com"

WEBDRIVER_BASEURL = "http://superset:8088/"
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088"
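
The earlier config used SMTP_PORT = "465" with SMTP_SSL = False and failed with "Connection unexpectedly closed", while this one sets SMTP_SSL = True with an integer port. Port 465 conventionally expects TLS from the first byte, and port 587 expects STARTTLS. A rough sanity check along those lines, as a sketch only; check_smtp_settings is a hypothetical helper, and the rules encode common conventions rather than anything Superset itself enforces:

```python
def check_smtp_settings(port, ssl, starttls):
    """Return a list of warnings for suspicious SMTP port/TLS combinations."""
    warnings = []
    if not isinstance(port, int):
        warnings.append("SMTP_PORT should be an int, not a string.")
    port = int(port)
    if ssl and starttls:
        warnings.append("SMTP_SSL and SMTP_STARTTLS are both True; usually only one is needed.")
    if port == 465 and not ssl:
        warnings.append("Port 465 expects implicit TLS; consider SMTP_SSL = True.")
    if port == 587 and not starttls:
        warnings.append("Port 587 expects STARTTLS; consider SMTP_STARTTLS = True.")
    return warnings


if __name__ == "__main__":
    # The settings from the first post in this thread.
    for message in check_smtp_settings("465", ssl=False, starttls=False):
        print(message)
```

An empty list only means nothing looks obviously inconsistent; it does not prove the mail server will accept the login.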
rusackas commented 1 year ago

Not sure if this relates to #20843

havannavar commented 1 year ago

@rusackas I took the latest version and added all the changes mentioned in the documentation:

https://superset.apache.org/docs/installation/alerts-reports/#detailed-config

The email trigger is still not working. Are any further changes needed that are not mentioned in the docs?

eschutho commented 1 year ago

@havannavar is this the error that you're seeing: superset.reports.commands.exceptions.ReportScheduleNotificationError? If so, it is fixed in 2.1.

Choiyoungmi0122 commented 1 year ago

(bigdata) ✘ youngmi 🐸  ~/anaconda3/envs/bigdata  /Users/choiyoungmi/anaconda3/envs/bigdata/bin/python /Users/choiyoungmi/anaconda3/envs/bigdata/teamproject/0518.py
Traceback (most recent call last):
  File "/Users/choiyoungmi/anaconda3/envs/bigdata/lib/python3.10/site-packages/selenium/webdriver/common/service.py", line 72, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "/Users/choiyoungmi/anaconda3/envs/bigdata/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/choiyoungmi/anaconda3/envs/bigdata/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/Users/choiyoungmi/anaconda3/envs/bigdata/teamproject/chromedriver_mac_arm64'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/choiyoungmi/anaconda3/envs/bigdata/teamproject/0518.py", line 2, in <module>
    driver = webdriver.Chrome(executable_path = r'/Users/choiyoungmi/anaconda3/envs/bigdata/teamproject/chromedriver_mac_arm64')
  File "/Users/choiyoungmi/anaconda3/envs/bigdata/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/Users/choiyoungmi/anaconda3/envs/bigdata/lib/python3.10/site-packages/selenium/webdriver/common/service.py", line 86, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver_mac_arm64' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Running this code causes this problem. Please help me.

Choiyoungmi0122 commented 1 year ago

My computer is M1 Air

Choiyoungmi0122 commented 1 year ago

I installed chromedriver_mac_arm64.zip.
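
Errno 13 on the driver path usually means the file lacks the execute bit, or that the path points at the unzipped folder rather than the chromedriver binary inside it (the latter is only a guess based on the path name in the traceback). A minimal sketch that checks both cases, using only the standard library; ensure_executable is a hypothetical helper name:

```python
import os
import stat


def ensure_executable(path):
    """Make sure `path` is a regular file with the execute bit set.

    Returns True if the file is executable afterwards.
    """
    if os.path.isdir(path):
        # Executing a directory also fails with "Permission denied".
        raise IsADirectoryError(
            f"{path} is a directory; point to the chromedriver binary inside it"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    if not os.access(path, os.X_OK):
        mode = os.stat(path).st_mode
        # Add execute permission for user, group, and others.
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)
```

If this raises IsADirectoryError, passing the path of the binary inside that folder to webdriver.Chrome would be the next thing to try.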

mujohiddin commented 5 months ago

Here is a working configuration. You can put it directly in superset_config.py, replacing the corresponding code that is already there; only this much is needed:

class CeleryConfig:
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    CELERY_IMPORTS = ("superset.sql_lab",)
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
    CELERYD_LOG_LEVEL = "DEBUG"
    # CELERYD_PREFETCH_MULTIPLIER = 1
    # CELERY_ACKS_LATE = False
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 600,
            'soft_time_limit': 600,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        "email_reports.schedule_hourly": {
            "task": "email_reports.schedule_hourly",
            "schedule": crontab(minute=1, hour="*"),
        },
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=10, hour=0),
        },
    }

CELERY_CONFIG = CeleryConfig
#---------------------------------------MYEDITS-------------------------------------

FEATURE_FLAGS = {
    "DYNAMIC_PLUGINS": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "VERSIONED_EXPORT": True,
    "ALERT_REPORTS": True
}
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
SCREENSHOT_LOCATE_WAIT = 1000
SCREENSHOT_LOAD_WAIT = 1600
ENABLE_ALERTS = True
# # Slack configuration
# SLACK_API_TOKEN = "xoxb-"

# Email configuration
#THUMBNAIL_SELENIUM_USER = 'admin@mydomain.com'
EMAIL_NOTIFICATIONS = True
ENABLE_SCHEDULED_EMAIL_REPORTS = True
SMTP_HOST = "smtp.gmail.com"
SMTP_USER = "@gmail.com"
SMTP_PASSWORD = "XXX"
SMTP_STARTTLS = False
SMTP_SSL = True
SMTP_PORT = 465
# SMTP_PORT = 587
SMTP_MAIL_FROM = "rohitpawar28112000@gmail.com"

WEBDRIVER_BASEURL = "http://superset:8088/"
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088"

this working

Will it work if I add this to config.py? @rohitpawar2811

rohitpawar2811 commented 5 months ago

@mujohiddin yes, as an initial configuration it was working at that time. You just have to place it after the Celery class configuration in superset_config.py.

rohitpawar2811 commented 5 months ago
import logging
import os
from datetime import timedelta
from typing import Optional

from cachelib.file import FileSystemCache
from celery.schedules import crontab
from email.mime.multipart import MIMEMultipart
from typing import (
    Any,
    Callable,
    Dict,
    List,
    Literal,
    Optional,
    Set,
    Type,
    TYPE_CHECKING,
    Union,
)
from superset import security_manager

logger = logging.getLogger()

DATABASE_DIALECT = os.getenv("DATABASE_DIALECT")
DATABASE_USER = os.getenv("DATABASE_USER")
DATABASE_PASSWORD = os.getenv("DATABASE_PASSWORD")
DATABASE_HOST = os.getenv("DATABASE_HOST")
DATABASE_PORT = os.getenv("DATABASE_PORT")
DATABASE_DB = os.getenv("DATABASE_DB")

EXAMPLES_USER = os.getenv("EXAMPLES_USER")
EXAMPLES_PASSWORD = os.getenv("EXAMPLES_PASSWORD")
EXAMPLES_HOST = os.getenv("EXAMPLES_HOST")
EXAMPLES_PORT = os.getenv("EXAMPLES_PORT")
EXAMPLES_DB = os.getenv("EXAMPLES_DB")

# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = (
    f"{DATABASE_DIALECT}://"
    f"{DATABASE_USER}:{DATABASE_PASSWORD}@"
    f"{DATABASE_HOST}:{DATABASE_PORT}/{DATABASE_DB}"
)

SQLALCHEMY_EXAMPLES_URI = (
    f"{DATABASE_DIALECT}://"
    f"{EXAMPLES_USER}:{EXAMPLES_PASSWORD}@"
    f"{EXAMPLES_HOST}:{EXAMPLES_PORT}/{EXAMPLES_DB}"
)

REDIS_HOST = os.getenv("REDIS_HOST", "redis")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_CELERY_DB = os.getenv("REDIS_CELERY_DB", "0")
REDIS_RESULTS_DB = os.getenv("REDIS_RESULTS_DB", "1")

# Not recommended for production
# RESULTS_BACKEND = FileSystemCache("/app/superset_home/sqllab")

from flask_caching.backends.rediscache import RedisCache
RESULTS_BACKEND = RedisCache(
    host=REDIS_HOST, port=REDIS_PORT, key_prefix='superset_results')

CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_HOST": REDIS_HOST,
    "CACHE_REDIS_PORT": REDIS_PORT,
    "CACHE_REDIS_DB": REDIS_RESULTS_DB,
}
DATA_CACHE_CONFIG = CACHE_CONFIG
#-------MyChanges---------------------------------------------------------------------------
class CeleryConfig:
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    CELERY_IMPORTS = ("superset.sql_lab","superset.tasks","superset.tasks.thumbnails")
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
    CELERYD_LOG_LEVEL = "DEBUG"
    # CELERYD_PREFETCH_MULTIPLIER = 1
    # CELERY_ACKS_LATE = False
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 1800,
            'soft_time_limit': 1800,
            'ignore_result': False,
        },
    }
    CELERYBEAT_SCHEDULE = {
        "email_reports.schedule_hourly": {
            "task": "email_reports.schedule_hourly",
            "schedule": crontab(minute=1, hour="*"),
        },
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=10, hour=0),
        },
        "alerts.schedule_check": {
            "task": "alerts.schedule_check",
            "schedule": crontab(minute="*", hour="*"),
        },
        #   "cache-warmup-hourly": {
        #           "task": "cache-warmup",
        #           "schedule": crontab(minute="*/30", hour="*"),
        #           "kwargs": {
        #               "strategy_name": "top_n_dashboards",
        #               "top_n": 10,
        #               "since": "7 days ago"
        #            }
        #        }
    }

CELERY_CONFIG = CeleryConfig
SQLLAB_CTAS_NO_LIMIT = True
# -------------------------------------------------------------------
# Avoid overflowing the SQLAlchemy connection pool
SQLALCHEMY_POOL_SIZE = 100

SQLALCHEMY_MAX_OVERFLOW = 80

SQLALCHEMY_POOL_TIMEOUT = 180

#------------------------------------------------------------------------
EMAIL_NOTIFICATIONS = True

SMTP_HOST = ""
SMTP_STARTTLS = False
SMTP_SSL = False
SMTP_USER = ""
SMTP_PORT = 25
SMTP_PASSWORD = ""
SMTP_MAIL_FROM = ""

FEATURE_FLAGS = {
    "DYNAMIC_PLUGINS": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "VERSIONED_EXPORT": True,
    "ALERT_REPORTS": True,
    "DASHBOARD_RBAC": False,
    "GLOBAL_ASYNC_QUERIES": False,
    "ALLOW_FULL_CSV_EXPORT" : False,
    "TAGGING_SYSTEM": True,
    "EMBEDDED_SUPERSET": True,
    "EMBEDDABLE_CHARTS": True
}

ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
SCREENSHOT_LOCATE_WAIT = 1000
SCREENSHOT_LOAD_WAIT = 1600
ENABLE_ALERTS = True

ENABLE_SCHEDULED_EMAIL_REPORTS = True

WEBDRIVER_BASEURL = "http://superset:8088/"
# The base URL for the email report hyperlinks.
WEBDRIVER_BASEURL_USER_FRIENDLY = WEBDRIVER_BASEURL

You can try this configuration instead.

fti-starawade commented 1 month ago

@rohitpawar2811 I am trying to use the superset_config above, but I am facing issues with my docker-compose.yaml, which defines the Superset app, Celery beat, and Celery worker services. Here is the docker-compose.yaml:

version: '3.7'

networks:
  sdk_network:
    external:
      name: sdk_bridge_network
  system_network:
    external:
      name: system_bridge_network

services:
  # Metabase service
  metabase:
    build:
      context: ./metabase
    container_name: metabase
    hostname: metabase
    volumes:
      - data.vol:/dev/metabase
    expose:
      - 3000
    ports:
      - "3000:3000"
    networks:
      - system_network
      - sdk_network
    restart: always
    tty: true

  # Superset app container
  superset:
    image: sdk_superset-x64_vaapi:1.0.0
    build:
      context: ./superset
      args:
        - USERNAME=$USERNAME
        - FIRSTNAME=$FIRSTNAME
        - LASTNAME=$LASTNAME
        - EMAIL=$EMAIL
        - PASSWORD=$PASSWORD
    environment:
      USERNAME: ${USERNAME}
      PASSWORD: ${PASSWORD}
      SUPERSET_WEBDRIVER_TYPE: "chrome"
    volumes:
      - data.vol:/data
      - /etc/localtime:/etc/localtime:ro
      - /tmp/.X11-unix:/tmp/.X11-unix
    expose:
      - "8088"
    ports:  
      - "9097:8088"
    networks:
      sdk_network:
        aliases:
          - superset
    restart: always
    tty: true

  # Celery worker for async tasks
  superset-worker:
    build:
      context: ./superset
    container_name: superset_worker
    command: ["celery", "--app=superset.tasks.celery_app:app", "worker", "--pool=prefork", "-O", "fair", "-c", "4"]
    env_file:
      - .env
    environment:
      CELERYD_CONCURRENCY: 2
    restart: unless-stopped
    depends_on:
      - superset
    volumes:
      - data.vol:/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    healthcheck:
      test: ["CMD-SHELL", "celery -A superset.tasks.celery_app:app inspect ping -d celery@$$HOSTNAME"]

  # Celery beat for scheduling tasks
  superset-worker-beat:
    build:
      context: ./superset
    container_name: superset_worker_beat
    command: ["celery", "--app=superset.tasks.celery_app:app", "beat"]
    env_file:
      - .env
    restart: unless-stopped
    depends_on:
      - superset
    volumes:
      - data.vol:/data
    healthcheck:
      disable: true

  # Redis message broker
  redis:
    image: redis:6.0-alpine
    container_name: superset_cache
    networks:
      - sdk_network
    restart: always

  # PostgreSQL database
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: superset
      POSTGRES_PASSWORD: superset
      POSTGRES_DB: superset
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - sdk_network
    restart: always

volumes:
  data.vol:
    external: true
  postgres_data:
    external: false
  celery_beat_data:
    external: false

And below is the Superset worker Dockerfile (dockerfile.worker):

FROM apache/superset:3.1.0

USER root

RUN apt-get update && \
    apt-get install -y wget zip libaio1

# Add Google Chrome repository and install Google Chrome
RUN wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - && \
    echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" | tee /etc/apt/sources.list.d/google-chrome.list && \
    apt-get update && apt-get install -y --no-install-recommends google-chrome-stable && \
    rm -rf /var/lib/apt/lists/*

# Install ChromeDriver
RUN CHROMEDRIVER_VERSION=$(wget -q -O - https://chromedriver.storage.googleapis.com/LATEST_RELEASE) && \
    wget -q "https://chromedriver.storage.googleapis.com/${CHROMEDRIVER_VERSION}/chromedriver_linux64.zip" && \
    unzip chromedriver_linux64.zip -d /usr/local/bin && \
    chmod +x /usr/local/bin/chromedriver && \
    rm chromedriver_linux64.zip

# Install necessary Python packages
RUN pip install --no-cache gevent psycopg2 redis

# Ensure the directory for Celery Beat exists and has the right permissions
RUN mkdir -p /usr/local/var/celerybeat && \
    chown -R superset:superset /usr/local/var/celerybeat

USER superset
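
This Dockerfile installs google-chrome-stable and ChromeDriver from two independent sources, so their versions may not line up, and ChromeDriver generally needs the same major version as the browser. A small sketch for comparing the two, assuming the usual `--version` output formats (for example "Google Chrome 114.0.5735.90" and "ChromeDriver 114.0.5735.90 (...)"); the helper names are hypothetical:

```python
import re


def major_version(version_output):
    """Extract the major version from `--version` output, or None if absent."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output, driver_output):
    """True only if both outputs contain a version and the majors are equal."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver
```

Inside the worker image, the two inputs would come from running `google-chrome --version` and `chromedriver --version`.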

I followed https://superset.apache.org/docs/configuration/alerts-reports/#browse-to-your-report-from-the-worker
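
The linked section is about confirming that the worker can open the report URL at all. A minimal standard-library sketch of that check, meant to be run from a Python shell inside the worker container; can_reach is a hypothetical helper, and the URL in the example assumes the WEBDRIVER_BASEURL used earlier in this thread plus Superset's health endpoint:

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5):
    """Return True if `url` yields any HTTP response, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the host is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return False


if __name__ == "__main__":
    # Assumed URL: the "superset" service name on port 8088.
    print(can_reach("http://superset:8088/health"))
```

A False result from the worker would point at Docker networking (service names, shared networks) rather than at the webdriver itself.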