Closed man8pr closed 2 months ago
So far I've only looked at the
BigQuerySinkServiceImpl
where I found a significant amount of issues already, so I didn't see a point in continuing the review. Please find an overview below and inline in code.I think the first step should be a written architectural design proposal document that outlines the key classes and objects as well as their functionality, etc. This could later serve as documentation, too. Once that is done and approved the implementation can begin.
General
BigQuerySinkServiceImpl
remarks:
- don't use
org.json.JSONArray
, use existing EDC infrastructure, i.e.jakarta.JsonArray
instead.- pls no second JSON implementation (GSON). I realize Google has its own, but we're using Jackson, so pls stick to that if possible.
- general formatting issues: constructors go first, then come public methods, then protected, then private. Nested classes go last. Also, please don't cut off code at weird positions. Pls check out the EDC code style for IntelliJ
- individual
part
s can be materialized in memory. If the structure of a JSON document is relevant, then (real) streaming is not possible anyway.- interfaces of injectable services should go in an
spi
package.- the
Builder
is not needed, since it's a service-type object, and has a small number of collaborator objects (i.e. CTor params). also, theBuilder
breaks our parameterlessnewInstance()
paradigm.- do not use hardcoded numbers and magic strings, e.g. for timeouts, max repeats etc. Those should come from a configuration.
the pattern for reading config values is:
- the extension class reads values using
ServiceExtensionContext#getConfig()
- if there are many config values, they should be bundled in a
BigQueryClientConfiguration
object- either the raw values, or the config object should be passed into the service class
- there is virtually no error handling, especially error reporting to higher layers. The Sink only reports an error if an exception occurs, but transmission errors, or other SDK-originated errors are only logged to the monitor.
- why is the
BigQuerySinkService
needed? it seems, it and theBigQueryDataSink
do mostly the same thing. This is a hint at a flawed design. A better approach could be to have the sink delegate eachpart
to theBigQuerySinkService
and handle each resulting error individually- there is no unit test for the
BigQuerySinkServiceImpl
I try to consolidate the source- and sink-related class to simplify the design.
Re org.json.JSONArray
, its use is required, because Google APIs are using it. While GSON can be avoided, I suggest to consider in future a different degree of freedom when using working in a tech repo, compared to the shared repos: it could be still reasonable to choose a different solution for a specific provider, when a tech-specific solution is available.
What this PR changes/adds
With this PR, data exchange with BigQuery is possible, using the provider extension already available, and the present data plane extension.
Linked Issue(s)
Closes #160