Open seanmullane opened 1 month ago
It looks like the following method in DatasetFacetsDao.java currently
default void insertDatasetFacetsFor(
    @NonNull UUID datasetUuid,
    @NonNull UUID datasetVersionUuid,
    @Nullable UUID runUuid,
    @NonNull Instant lineageEventTime,
    @Nullable String lineageEventType,
    @NonNull LineageEvent.DatasetFacets datasetFacets) {
allows runUuid and lineageEventType to be null. The simplest solution would be to do the same for insertInputDatasetFacetsFor and insertOutputDatasetFacetsFor.
Emitting a JobEvent with input and/or output datasets causes an HTTP 500 error in the API, which results from a NullPointerException in Marquez.
Fixing this is important so that static lineage graphs can be generated without being associated with active runs. This is useful where an integration is not yet available to consume pipeline runs for a given system, or where a pipeline is not yet fleshed out but we want to enter the job in Marquez to see how it would relate to other jobs.
The attached code includes a pure JSON version, generated by the OpenLineage client, which can trigger the bug in Marquez. I also included the Python code the JSON derives from and the Marquez error log.
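For readers without the attachments, here is a minimal sketch of the kind of run-less JobEvent described above. It is not the attached reproduction: it uses only the Python standard library rather than the OpenLineage client, and the namespace, job and dataset names, the producer URL, the spec version in schemaURL, and the local Marquez address are all assumptions chosen for illustration.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Assumption: Marquez API reachable locally, as in the docker-compose example.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_job_event(namespace, job_name, inputs, outputs):
    """Build a run-less OpenLineage JobEvent as a plain dict.

    Unlike a RunEvent, a JobEvent carries no "run" and no "eventType" key.
    """
    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical producer URI, for illustration only.
        "producer": "https://example.com/static-lineage-sketch",
        # Assumption: spec version 2-0-2; adjust to match your client.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent",
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(name) for name in inputs],
        "outputs": [dataset(name) for name in outputs],
    }


def post_event(event, url=MARQUEZ_LINEAGE_URL):
    """POST the event and return the HTTP status code, including error codes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; surface the code instead.
        return err.code


# Hypothetical names for a job that has never run.
event = build_job_event("static_demo", "planned_job", ["raw.orders"], ["clean.orders"])
print(json.dumps(event, indent=2))
# status = post_event(event)  # uncomment with Marquez running
```

With Marquez 0.49.0 running, calling post_event(event) should, per this report, come back with status 500 whenever inputs or outputs is non-empty.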
Environment:
Marquez 0.49.0, running via docker-compose per the Marquez example with --seed
openlineage-python 1.22.0
Python 3.11.9
nullPointerException.txt reproduce_bug.zip
More detail on this from phix on Slack: