gopikaops closed this issue 8 months ago
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
I would make the file changes as the root user and then, at the end, change the permissions back to the normal datahub user. @gopikaops
This issue was closed because it has been inactive for 30 days since being marked as stale.
Profiles are computed with PyDeequ, which relies on PySpark. Therefore, for computing profiles, we currently require Spark 3.0.3 with Hadoop 3.2 to be installed and the SPARK_HOME and SPARK_VERSION environment variables to be set. The Spark+Hadoop binary can be downloaded here.
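A quick way to confirm the environment-variable part of that requirement inside a custom image is a small pre-flight check. The following is only a sketch: the helper name `check_spark_env` is hypothetical and not part of DataHub or PyDeequ, and it only checks that the two variables are set and that `SPARK_HOME` points to an existing directory. It does not validate the Spark or Hadoop version.

```python
import os
from pathlib import Path

# Variables named in the profiling requirements quoted above.
REQUIRED_VARS = ("SPARK_HOME", "SPARK_VERSION")


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration only, not a DataHub or PyDeequ API.
    """
    env = os.environ if env is None else env
    problems = []

    # Each required variable must be present and non-empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # If SPARK_HOME is set, it should point at a directory that exists.
    spark_home = env.get("SPARK_HOME")
    if spark_home and not Path(spark_home).is_dir():
        problems.append(f"SPARK_HOME points to a missing directory: {spark_home}")

    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print(problem)
```

Running this as the same user that runs ingestion (rather than as root) may also help surface permission-related differences, since that user is the one that needs to see the variables and read the Spark directory.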
We work in an air-gapped environment, and we found that the code for downloading the binary is available in the DataHub git repo.
This code is in SparkBase.Dockerfile - https://github.com/datahub-project/datahub/blob/3e79a1325cf8eca29a8bb818a50762366bfd5d22/metadata-integration/java/spark-lineage/spark-smoke-test/docker/SparkBase.Dockerfile#L4
This is called by build_images.sh - https://github.com/datahub-project/datahub/blob/3e79a1325cf8eca29a8bb818a50762366bfd5d22/metadata-integration/java/spark-lineage/spark-smoke-test/docker/build_images.sh#L22
which is called in setup_spark_smoke_test.sh - https://github.com/datahub-project/datahub/blob/3e79a1325cf8eca29a8bb818a50762366bfd5d22/metadata-integration/java/spark-lineage/spark-smoke-test/setup_spark_smoke_test.sh#L25
which is called in smoke.sh - https://github.com/datahub-project/datahub/blob/3e79a1325cf8eca29a8bb818a50762366bfd5d22/metadata-integration/java/spark-lineage/spark-smoke-test/smoke.sh#L53
which is defined in build.gradle - https://github.com/datahub-project/datahub/blob/3e79a1325cf8eca29a8bb818a50762366bfd5d22/metadata-integration/java/spark-lineage/build.gradle#L150
We added this code to make our custom actions image -
However, it led to ingestion failing with the following error -
We spoke to the team during the PoC event and they suggested they could make a custom docker image with required binaries for air-gapped environments.
However, we are not sure why ingestion failed when we tried making our own custom image.