boozallen / aissemble

Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment

Feature: Update data access tooling to better support distributed querying of big data #475

Open carter-cundiff opened 4 days ago

carter-cundiff commented 4 days ago

Description

Currently, data access uses a GraphQL Quarkus app to access data outside of your Spark pipeline. GraphQL is not optimized for querying large datasets stored in data lakes. For better performance when accessing data lake data, GraphQL should be replaced with a tool designed specifically for querying large data lakes (e.g., Trino).
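
For illustration only, here is a minimal sketch of what querying through Trino could look like from Java using plain JDBC. It is not the aiSSEMBLE implementation. It assumes the Trino JDBC driver (`io.trino:trino-jdbc`) is on the classpath, and the host, port, catalog (`hive`), schema (`default`), user (`test`), and table (`person`) are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class TrinoQuerySketch {

    // Builds a Trino JDBC URL of the form jdbc:trino://host:port/catalog/schema.
    static String jdbcUrl(String host, int port, String catalog, String schema) {
        return "jdbc:trino://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    // Counts the rows of a table. The table name is concatenated into the SQL,
    // so only pass trusted identifiers here.
    static long countRows(String url, String user, String table) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user); // the Trino driver requires a user name
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        // All connection values below are hypothetical placeholders.
        String url = jdbcUrl("localhost", 8080, "hive", "default");
        System.out.println("JDBC URL: " + url);
        if (args.length == 1) {
            // Only connects when a table name is supplied, e.g. "person".
            System.out.println(countRows(url, "test", args[0]));
        }
    }
}
```

Running the class without arguments only prints the URL. Passing a table name attempts a real connection, which needs a reachable Trino coordinator.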

DOD

Test Strategy/Script

...

@Override
protected void executeStepImpl() {

Query 20241122_143943_00000_c3nss, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
2.65 [1 rows, 14B] [0 rows/s, 5B/s]


- `tilt down`
- Remove the following from `test-475-pipeline-models/src/main/resources/records/Person.json` on lines 5-7:
"dataAccess": {
    "enabled": "false"
},

- Build the project once with `mvn clean install -Dmaven.build.cache.skipCache` and complete the manual actions
- Build the project once with `mvn clean install` and verify you see the following warnings about data-access deprecation:

/your/path/test-475/test-475-pipelines/test-475-data-access/pom.xml: Data Access using GraphQL is deprecated, please see the latest documentation for details on using Trino for Data Access: https://boozallen.github.io/aissemble/aissemble/current/data-access-details.html

/your/path/test-475/test-475-docker/test-475-data-access-docker/pom.xml: The profile 'aissemble-data-access-docker' is deprecated, please replace all references to it.

/your/path/devRepos/test-475/test-475-deploy/pom.xml: The profile 'data-access-deploy-v2' is deprecated, please replace all references to it.
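
The warning check above can also be done mechanically. The following is a hedged sketch, not part of the project: it assumes the Maven output was saved to a file (for example with `mvn clean install | tee build.log`, where `build.log` is a hypothetical name) and simply lists every line containing the phrase `is deprecated`, which all three expected warnings share.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeprecationWarningCheck {

    // Returns the log lines that report a deprecation.
    static List<String> deprecationLines(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("is deprecated"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> lines;
        if (args.length == 1) {
            // Path to a saved build log (hypothetical, e.g. build.log).
            lines = Files.readAllLines(Paths.get(args[0]));
        } else {
            // Fallback sample taken from the warnings quoted in this issue.
            lines = Arrays.asList(
                "Data Access using GraphQL is deprecated, please see the latest documentation",
                "The profile 'aissemble-data-access-docker' is deprecated, please replace all references to it.",
                "The profile 'data-access-deploy-v2' is deprecated, please replace all references to it.",
                "BUILD SUCCESS");
        }
        List<String> found = deprecationLines(lines);
        found.forEach(System.out::println);
        System.out.println(found.size() + " deprecation warning(s) found");
    }
}
```

With the fallback sample, three lines match. Against a real log the count may be higher if other modules emit their own deprecation messages, so the matched lines still need to be read.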


References/Additional Context

carter-cundiff commented 3 days ago

DOD with @ewilkins-csi, @csun-cpointe

carter-cundiff commented 1 day ago

OTS with @nartieri @ewilkins-csi