Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment
Feature: As a DevOps engineer, I want an aissemble-managed Helm chart for the Hive metastore service that uses a newer version of Hive, so I have access to the latest security fixes. #127
In order to improve usability and maintainability, we will be migrating to a v2 chart for the hive metastore service, keeping a similar usage pattern to that seen in #103. This ticket will encompass #116 as well, to update the underlying Hive metastore version.
Definition of Done
[x] Update hive-metastore-service docker image to use Hive 4.0.0
[x] Validate that the current v2 hive-metastore-service helm chart functions as expected
[x] If not, make necessary updates to ensure functionality.
[x] Refactor chart to live under extensions-helm-spark-infrastructure
[x] Update generated values/Chart file in downstream projects using the v2 profile with sensible defaults
Test Strategy/Script
Generate a new project using the following command:
Execute `mvn clean install -Dmaven.build.cache.skipCache=true` repeatedly, resolving all presented manual actions until none remain.
Within `test-project-deploy/pom.xml`, replace `aissemble-spark-infrastructure-deploy` with `aissemble-spark-infrastructure-deploy-v2`
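For reference, a hypothetical sketch of what that swap can look like — the exact element and surrounding plugin configuration depend on the generated project, so adjust to wherever the string actually occurs in your `pom.xml`:

```xml
<!-- Hypothetical fragment of test-project-deploy/pom.xml; the surrounding
     structure varies by generated project. Before: -->
<!-- <profile>aissemble-spark-infrastructure-deploy</profile> -->
<!-- After: -->
<profile>aissemble-spark-infrastructure-deploy-v2</profile>
```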
Delete the directory `test-project-deploy/src/main/resources/apps/spark-infrastructure`
Delete all references to `hive-metastore-service` from your Tiltfile
Within `test-project-pipelines/test-project-data-access/src/main/resources/application.properties`, set `quarkus.datasource.jdbc.url` to `jdbc:hive2://spark-infrastructure-sts-service:10001/default;transportMode=http;httpPath=cliservice`
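The resulting property line should read as follows (the key and value are taken verbatim from the step above):

```properties
# test-project-pipelines/test-project-data-access/src/main/resources/application.properties
quarkus.datasource.jdbc.url=jdbc:hive2://spark-infrastructure-sts-service:10001/default;transportMode=http;httpPath=cliservice
```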
Within `test-project-pipelines/pyspark-persist/src/pyspark_persist/step/persist_data.py`, define the implementation for `execute_step_impl` as follows:
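The original snippet did not survive in this copy of the ticket. As a minimal hypothetical sketch, the step needs to write two rows to a `my_new_table` Hive table so the GraphQL check later in this script returns two `CustomRecord` entries; the class name, base-step shape, and `self.spark` attribute are assumptions based on the generated pipeline pattern, not the ticket's exact code:

```python
class PersistData:  # stand-in for the generated pyspark-persist step base class
    """Hypothetical sketch; the real class extends the generated base step."""

    def execute_step_impl(self):
        # Import locally so the module still loads outside a Spark environment.
        from pyspark.sql import Row

        # Persist two records so the data-access GraphQL query for
        # my_new_table returns exactly two CustomRecord rows.
        rows = [Row(customField="a"), Row(customField="b")]
        df = self.spark.createDataFrame(rows)
        df.write.mode("overwrite").saveAsTable("my_new_table")
```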
Replace `test-project-pipelines/pyspark-persist/src/pyspark_persist/resources/apps/pyspark-persist-dev-values.yaml` with the following:
Execute `mvn clean install -Dmaven.build.cache.skipCache=true` once.
Use `kubectl apply -f` to apply the following yaml:
To avoid an unrelated bug, open your Tiltfile and remove the entry for `pipeline-invocation-service`.
Execute `tilt up`
Once all resources are ready, trigger the `pyspark-persist` pipeline
Use `kubectl get pods | grep data-access` to get the name of the data access pod.
Use `kubectl exec -it <DATA_ACCESS_POD_NAME> -- bash` to enter the data access pod.
Execute `curl -X POST localhost:8080/graphql -H "Content-Type: application/json" -d '{ "query": "{ CustomRecord(table: \"my_new_table\") { customField } }" }'` and ensure that a response containing two records is returned, i.e.: `{"data":{"CustomRecord":[{"customField":null},{"customField":null}]}}`
Note on step 19: If you don't get any values back, execute `kubectl get svc | grep sts` in a fresh prompt. It can take a minute or two to provision the service.