Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment
TASK: Cleanup the legacy namespace for lineage event #449
In v1.7.0 we released the OpenLineage Namespace Conventions to better follow OpenLineage's guidelines. Moving forward, namespaces should be defined in the data-lineage.properties file. We are cleaning up the data.lineage.namespace properties in a project's data-lineage.properties file, which were supported as a fallback but will no longer be supported in release 1.10.
DOD
Acceptance criteria required to complete the work
[x] Clean up the data.lineage.namespace support functions and tests
[x] Clean up the data.lineage.namespace properties in the data-lineage.properties file
[x] The PySpark, Spark, and model training pipeline lineage events should still work as expected.
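One way to spot leftover uses of the legacy property during the cleanup is to scan a project's data-lineage.properties content for the data.lineage.namespace key. The following is a minimal sketch, not part of aiSSEMBLE: the helper name find_legacy_namespace_keys and the sample property some.other.property are hypothetical, and the sketch matches the exact key only.

```python
import re
from typing import List, Tuple

# Legacy property named in this task.
LEGACY_KEY = "data.lineage.namespace"


def find_legacy_namespace_keys(properties_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, key) for each line that sets the legacy key.

    Hypothetical helper for illustration only; it handles simple
    'key=value', 'key: value', and 'key value' lines.
    """
    hits = []
    for number, raw in enumerate(properties_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and .properties comments ('#' or '!').
        if not line or line[0] in "#!":
            continue
        # The key ends at the first '=', ':' or whitespace character.
        key = re.split(r"[=:\s]", line, maxsplit=1)[0]
        if key == LEGACY_KEY:
            hits.append((number, key))
    return hits


if __name__ == "__main__":
    # Made-up sample content; 'some.other.property' is not a real setting.
    sample = "# lineage settings\nsome.other.property=value\ndata.lineage.namespace = legacy-ns\n"
    for line_number, key in find_legacy_namespace_keys(sample):
        print(f"line {line_number}: {key} should be removed")
```

An empty result for every data-lineage.properties file in a generated project would suggest the fallback property is no longer in use.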
Test Strategy/Script
How will this item be verified?
1. Create a new aissemble-based project using the latest archetype snapshot.
2. Set your Java version to 17 if it is not already.
3. Under -model/src/main/resources/pipelines, add the following pipeline models: SparkPipeline.json, PythonPipeline.json, and ClassificationTraining.json.
4. Fully generate the project by running mvn clean install and following the manual actions.
5. Build the project without the cache and follow the last manual action.
6. Deploy the project and wait for all services to be ready.
7. Manually trigger the python-pipeline pod and verify that there are no errors in the log.
8. Manually trigger the spark-pipeline pod and verify that there are no errors in the log.
9. Use Postman or any REST client to trigger the training pipeline and verify that the response is successful and returns a training pipeline ID.
10. Copy the returned service ID and run the command below to verify that the log contains no errors, e.g.:
kubectl logs job.batch/"model-training-logistic-tr-24cd1662-5b62-4e3c-946f-6b9081e30017"
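The "verify no errors in the log" check can also be scripted. The sketch below wraps the same kubectl logs command and filters the output for failure markers. The marker list (ERROR, Traceback, Exception) is an assumption about what a failing pipeline would log, and the helper names are hypothetical.

```python
import subprocess
from typing import List

# Assumed failure markers; adjust them to the log format your pipelines emit.
ERROR_MARKERS = ("ERROR", "Traceback", "Exception")


def find_error_lines(log_text: str) -> List[str]:
    """Return the log lines that contain any of the assumed failure markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in ERROR_MARKERS)
    ]


def fetch_job_logs(job_name: str) -> str:
    """Fetch a job's logs with the kubectl command used in the verification step."""
    result = subprocess.run(
        ["kubectl", "logs", f"job.batch/{job_name}"],
        capture_output=True,
        text=True,
        check=True,  # raise if kubectl itself fails
    )
    return result.stdout


if __name__ == "__main__":
    # Job name taken from the example command; replace it with the ID returned for your run.
    logs = fetch_job_logs("model-training-logistic-tr-24cd1662-5b62-4e3c-946f-6b9081e30017")
    problems = find_error_lines(logs)
    print("no errors found" if not problems else "\n".join(problems))
```

A substring filter like this can miss failures that are logged without these markers, so it complements reading the log rather than replacing it.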
References/Additional Context
As needed