Feature: Update data access tooling to better support distributed querying of big data

## Description

Currently, data access uses a GraphQL Quarkus app to access data outside of your Spark pipeline. GraphQL is not optimized for queries against large datasets stored in data lakes. For better performance when accessing data lake data, GraphQL should be replaced with a tool designed specifically for querying large data lakes (e.g., Trino).
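For context on how a client talks to Trino instead of a GraphQL endpoint, the sketch below builds the first request of Trino's HTTP client protocol: the SQL text is POSTed to `/v1/statement`, and the client then follows the `nextUri` field of each JSON response until it is absent. It only builds the request, so it runs without a server. The class name, server URL, and `test` user are illustrative assumptions, not part of aiSSEMBLE.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class TrinoStatementRequest {

    /**
     * Builds the initial request of the Trino client protocol (illustrative sketch).
     * The SQL text is the request body; X-Trino-User is required, while
     * X-Trino-Catalog and X-Trino-Schema set the defaults for unqualified table names.
     */
    static HttpRequest build(String server, String user, String catalog,
                             String schema, String sql) {
        return HttpRequest.newBuilder()
                .uri(URI.create(server + "/v1/statement"))
                .header("X-Trino-User", user)
                .header("X-Trino-Catalog", catalog)
                .header("X-Trino-Schema", schema)
                .POST(HttpRequest.BodyPublishers.ofString(sql))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical values: a Trino server on localhost:8084 and the hive.default schema.
        HttpRequest request = build("http://localhost:8084", "test",
                "hive", "default", "SELECT * FROM people");
        System.out.println(request.method() + " " + request.uri());
        // Sending it with java.net.http.HttpClient returns JSON; a full client
        // would keep issuing GET requests to "nextUri" until it is no longer present.
    }
}
```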

## DOD

- [x] Implement Trino as a deploy profile option for data access
  - [x] Create a baseline Helm chart using the official Trino Helm chart as the parent
  - [x] Include defaults for configuring the chart to use the Hive connector
  - [x] Include Helm chart unit tests
  - [x] When enabled, generate a deploy resource dependent on the aiSSEMBLE Trino Helm chart
  - [x] Update the Antora docs to detail the new data access option
  - [x] Current data access page becomes a drop-down with GraphQL and Trino pages
- [x] Remove the GraphQL Antora docs and deprecate the Fermenter profiles

## Test Strategy/Script

OTS Only:

Within the aiSSEMBLE repo, run the following and verify it builds successfully:

```
mvn clean install -pl :foundation-mda,:aissemble-trino-chart -Dmaven.build.cache.skipCache
```

Create a downstream project:

```
mvn archetype:generate -U -DarchetypeGroupId=com.boozallen.aissemble \
    -DarchetypeArtifactId=foundation-archetype \
    -DarchetypeVersion=1.11.0-SNAPSHOT \
    -DgroupId=com.test \
    -DartifactId=test-475 \
    -DprojectGitUrl=test.url \
    -DprojectName=test-475 \
    && cd test-475
```

- Add the attached `SparkPipeline.json` to the `test-475-pipeline-models/src/main/resources/pipelines/` directory
- Add the attached `PersonDictionary.json` to the `test-475-pipeline-models/src/main/resources/dictionaries/` directory
- Add the attached `Person.json` to the `test-475-pipeline-models/src/main/resources/records/` directory
- Run `mvn clean install` until all the manual actions are complete

Add the following execution to `test-475-deploy/pom.xml`:

```
<execution>
    <id>trino</id>
    <phase>generate-sources</phase>
    <goals>
        <goal>generate-sources</goal>
    </goals>
    <configuration>
        <basePackage>com.test</basePackage>
        <profile>data-access-trino-deploy-v2</profile>
        <!-- The property variables below are passed to the Generation Context and utilized
             to customize the deployment artifacts. -->
        <propertyVariables>
            <appName>trino</appName>
        </propertyVariables>
    </configuration>
</execution>
```

Add the following imports to `test-475-pipelines/spark-pipeline/src/main/java/com/test/TestSyncStep.java` and update `executeStepImpl()` as shown:

```
import java.util.List;
import java.util.stream.Stream;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import simple.test.record.Person;
import simple.test.record.PersonSchema;

// ...

@Override
protected void executeStepImpl() {
    // TODO: Add your business logic here for this step!
    logger.error("Implement executeStepImpl(..) or remove this pipeline step!");

    logger.info("Saving Person to table People");
    // Build a single Person record to persist
    Person person = new Person();
    person.setName("John Smith");
    person.setAge(50);

    // Convert the record to a Spark Row and build a DataFrame with the generated schema
    PersonSchema personSchema = new PersonSchema();
    List<Row> rows = Stream.of(person).map(PersonSchema::asRow).toList();
    Dataset<Row> dataset = sparkSession.createDataFrame(rows, personSchema.getStructType());

    // Persist the DataFrame to the People table
    saveDataset(dataset, "People");
    logger.info("Completed saving to table People");
}
```

Run `mvn clean install -Dmaven.build.cache.skipCache` to surface any remaining manual actions.

OTS Only: The project will fail to build because the new Helm chart has not been published yet.
- In `test-475-deploy/src/main/resources/apps/trino/Chart.yaml`, change the `repository` of the `aissemble-trino-chart` dependency from `oci://ghcr.io/boozallen` to the local chart in your aiSSEMBLE checkout (this relative path assumes `test-475` and `aissemble` are sibling directories):
```
dependencies:
  - name: aissemble-trino-chart
    version: 1.11.0-SNAPSHOT
    repository: file://../../../../../../../aissemble/extensions/extensions-helm/aissemble-trino-chart
```
- Continue the build with `mvn clean install -Dmaven.build.cache.skipCache -rf :test-475-deploy`
Complete the manual actions and run `tilt up`.
Once all the resources are ready in the Tilt UI, start the `spark-pipeline` resource.
Verify you see the following log output in the pipeline:
```
INFO TestSyncStep: Completed saving to table People
```
Connect to Trino using the CLI: `./trino --server http://localhost:8084`
- See the Trino CLI documentation for installation details.
Run the following command to query the data:
```
select * from hive.default.people;
```

Verify you get output similar to the following (the query ID and timings will differ):

```
    name    | age
------------+-----
 John Smith |  50
(1 row)

Query 20241122_143943_00000_c3nss, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
2.65 [1 rows, 14B] [0 rows/s, 5B/s]
```
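The same query can also be run programmatically. Below is a minimal sketch using the Trino JDBC driver, assuming `io.trino:trino-jdbc` is on the classpath and Trino is reachable at `localhost:8084` as in the CLI step above; the class name and the `test` user are hypothetical. The catalog and schema are part of the JDBC URL (`jdbc:trino://host:port/catalog/schema`), so the table can be referenced simply as `people`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

final class TrinoJdbcQuery {

    /** Builds a Trino JDBC URL that sets the default catalog and schema. */
    static String jdbcUrl(String host, int port, String catalog, String schema) {
        return "jdbc:trino://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: the Trino JDBC driver is on the classpath (it self-registers via JDBC 4).
        String url = jdbcUrl("localhost", 8084, "hive", "default");
        Properties props = new Properties();
        props.setProperty("user", "test"); // hypothetical user name; Trino requires one

        // Equivalent to: select * from hive.default.people;
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT name, age FROM people")) {
            while (rs.next()) {
                System.out.println(rs.getString("name") + " | " + rs.getInt("age"));
            }
        }
    }
}
```

Note that JDBC statements are submitted without the trailing semicolon used in the CLI.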


- Run `tilt down`
- Remove the following from `test-475-pipeline-models/src/main/resources/records/Person.json` on lines 5-7:

```
"dataAccess": {
    "enabled": "false"
},
```


- Build the project once with `mvn clean install -Dmaven.build.cache.skipCache` and complete the manual actions
- Build the project again with `mvn clean install` and verify you see the following data access deprecation warnings:

```
/your/path/test-475/test-475-pipelines/test-475-data-access/pom.xml: Data Access using GraphQL is deprecated, please see the latest documentation for details on using Trino for Data Access: https://boozallen.github.io/aissemble/aissemble/current/data-access-details.html

/your/path/test-475/test-475-docker/test-475-data-access-docker/pom.xml: The profile 'aissemble-data-access-docker' is deprecated, please replace all references to it.

/your/path/test-475/test-475-deploy/pom.xml: The profile 'data-access-deploy-v2' is deprecated, please replace all references to it.
```


## References/Additional Context

- boozallen/aissemble #475: Feature: Update data access tooling to better support distributed querying of big data
