feat: BigQuery data plane

So far I've only looked at the BigQuerySinkServiceImpl where I found a significant amount of issues already, so I didn't see a point in continuing the review. Please find an overview below and inline in code.

I think the first step should be a written architectural design proposal document that outlines the key classes and objects as well as their functionality, etc. This could later serve as documentation, too. Once that is done and approved the implementation can begin.

General BigQuerySinkServiceImpl remarks:

don't use org.json.JSONArray, use existing EDC infrastructure, i.e. jakarta.JsonArray instead.

pls no second JSON implementation (GSON). I realize Google has its own, but we're using Jackson, so pls stick to that if possible.

general formatting issues: constructors go first, then come public methods, then protected, then private. Nested classes go last. Also, please don't cut off code at weird positions. Pls check out the EDC code style for IntelliJ

individual parts can be materialized in memory. If the structure of a JSON document is relevant, then (real) streaming is not possible anyway.

interfaces of injectable services should go in an spi package.

the Builder is not needed, since it's a service-type object, and has a small number of collaborator objects (i.e. CTor params). also, the Builder breaks our parameterless newInstance() paradigm.

do not use hardcoded numbers and magic strings, e.g. for timeouts, max repeats etc. Those should come from a configuration.

the pattern for reading config values is:

the extension class reads values using ServiceExtensionContext#getConfig()

if there are many config values, they should be bundled in a BigQueryClientConfiguration object

either the raw values, or the config object should be passed into the service class

there is virtually no error handling, especially error reporting to higher layers. The Sink only reports an error if an exception occurs, but transmission errors, or other SDK-originated errors are only logged to the monitor.

why is the BigQuerySinkService needed? it seems, it and the BigQueryDataSink do mostly the same thing. This is a hint at a flawed design. A better approach could be to have the sink delegate each part to the BigQuerySinkService and handle each resulting error individually

there is no unit test for the BigQuerySinkServiceImpl

I try to consolidate the source- and sink-related class to simplify the design. Re org.json.JSONArray , its use is required, because Google APIs are using it. While GSON can be avoided, I suggest to consider in future a different degree of freedom when using working in a tech repo, compared to the shared repos: it could be still reasonable to choose a different solution for a specific provider, when a tech-specific solution is available.

eclipse-edc / Technology-Gcp

feat: BigQuery data plane #178

What this PR changes/adds

Linked Issue(s)

General `BigQuerySinkServiceImpl` remarks:

eclipse-edc / Technology-Gcp

feat: BigQuery data plane #178

What this PR changes/adds

Linked Issue(s)

General BigQuerySinkServiceImpl remarks:

General `BigQuerySinkServiceImpl` remarks: