TOSIT-IO / tdp-collection-extras

Ansible roles to deploy the extra components of TDP
Apache License 2.0

401 Authentication required when using pyspark in jupyter #190

Closed: PACordonnier closed this issue 2 months ago

PACordonnier commented 2 months ago

I just installed a basic JupyterHub and JupyterLab after the recent refactor.

I get the following error when using basic pyspark commands:

[screenshot: "401 Authentication required" error]

https://edge-01.tdp:8999/sessions is Livy Spark3. Here is the sparkmagic configuration file found on the worker:

{
    "authenticators": {
        "Basic_Access": "sparkmagic.auth.basic.Basic",
        "Kerberos": "sparkmagic.auth.kerberos.Kerberos",
        "None": "sparkmagic.auth.customauth.Authenticator"
    },
    "coerce_dataframe": true,
    "configurable_retry_policy_max_retries": 8,
    "custom_headers": {},
    "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",
    "heartbeat_refresh_seconds": 30,
    "heartbeat_retry_seconds": 10,
    "http_session_config": {
        "adapters": [
            {
                "adapter": "requests.adapters.HTTPAdapter",
                "prefix": "https://"
            }
        ]
    },
    "ignore_ssl_errors": false,
    "kernel_python_credentials": {
        "auth": "Kerberos",
        "password": "",
        "url": "https://edge-01.tdp:8999",
        "username": ""
    },
    "kernel_r_credentials": {
        "auth": "Kerberos",
        "password": "",
        "url": "https://edge-01.tdp:8999",
        "username": ""
    },
    "kernel_scala_credentials": {
        "auth": "Kerberos",
        "password": "",
        "url": "https://edge-01.tdp:8999",
        "username": ""
    },
    "livy_server_heartbeat_timeout_seconds": 0,
    "livy_session_startup_timeout_seconds": 240,
    "logging_config": {
        "formatters": {
            "magicsFormatter": {
                "datefmt": "",
                "format": "%(asctime)s\t%(levelname)s\t%(message)s"
            }
        },
        "handlers": {
            "magicsHandler": {
                "class": "hdijupyterutils.filehandler.MagicsFileHandler",
                "formatter": "magicsFormatter",
                "home_path": "~/.sparkmagic"
            }
        },
        "loggers": {
            "magicsLogger": {
                "handlers": [
                    "magicsHandler"
                ],
                "level": "DEBUG",
                "propagate": 0
            }
        },
        "version": 1
    },
    "max_results_sql": 2500,
    "pyspark_dataframe_encoding": "utf-8",
    "retry_policy": "configurable",
    "retry_seconds_to_sleep_list": [
        0.2,
        0.5,
        1,
        3,
        5
    ],
    "server_extension_default_kernel_name": "pysparkkernel",
    "session_configs": {
        "driverMemory": "1000M",
        "executorCores": 2
    },
    "session_configs_defaults": {
        "conf": {
            "spark.sql.catalog.spark_catalog.type": "hive"
        }
    },
    "use_auto_viz": true,
    "wait_for_idle_timeout_seconds": 120
}
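
A minimal diagnostic sketch, assuming requests and requests-kerberos are installed on the worker (sparkmagic's "Kerberos" authenticator is built on the same library): probing the Livy endpoint directly separates a Kerberos handshake failure from a sparkmagic misconfiguration.

import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED

# Livy endpoint taken from the report above
LIVY_URL = "https://edge-01.tdp:8999/sessions"

resp = requests.get(
    LIVY_URL,
    # Same SPNEGO auth sparkmagic uses for "auth": "Kerberos"
    auth=HTTPKerberosAuth(mutual_authentication=REQUIRED),
    verify=True,  # point at the TDP CA bundle if it is not in the system store
)

# 401 -> the Kerberos handshake failed (typically no valid ticket: run kinit)
# 200 -> Livy is reachable and authenticated; look elsewhere for the problem
print(resp.status_code, resp.text[:200])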
PACordonnier commented 2 months ago

I forgot to kinit in the terminal :facepalm:
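
A quick way to rule that out up front, as a sketch: klist -s exits non-zero when the credential cache holds no valid ticket, so the check can run before the notebook ever contacts Livy.

import subprocess

# `klist -s` is silent and reports a valid ticket cache via its exit status
if subprocess.run(["klist", "-s"]).returncode != 0:
    raise SystemExit("No valid Kerberos ticket: run `kinit` first.")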

rpignolet commented 2 months ago

At DGFiP we set this in tdp_vars:

jupyterhub_additional_yarnspawner_prologue: kinit -ki ; yarn application -updateLifetime 604000 --appId $SKEIN_APPLICATION_ID

The first command automates the kinit; the second kills the JupyterLab after roughly 7 days, because the HDFS delegation token expires and, according to our investigations, the JupyterLab YARN spawner does not support token renewal.
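
As a back-of-the-envelope check, assuming Hadoop's default dfs.namenode.delegation.token.max-lifetime of 7 days (604800 seconds): the 604000 seconds passed to -updateLifetime sits just under that, so the JupyterLab is stopped shortly before its token would expire.

# Margin between the app lifetime and the assumed 7-day token max lifetime
TOKEN_MAX_LIFETIME_S = 7 * 24 * 3600   # 604800 s, Hadoop default
APP_LIFETIME_S = 604000                # value passed to `yarn application -updateLifetime`
print(TOKEN_MAX_LIFETIME_S - APP_LIFETIME_S, "seconds of margin")  # 800 s, about 13 min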