Open anayeaye opened 7 months ago
I found one example of duplicated logic so far between the two repos (and will update this comment as I find more). The `/dataset/publish` endpoint seems to check if a collection exists twice.
The `/dataset/publish` endpoint in `veda-data-airflow` has a `dataset` parameter of type `Union[schemas.ZarrDataset, schemas.COGDataset] = Body(..., discriminator="data_type")`, and the `Dataset` class has a validation that checks whether the collection exists because "we allow collection id to "break the rules" if an already-existing collection matches".
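To make the quoted rule concrete, here is a minimal stdlib-only sketch of what such a validation might look like. It is not the actual `veda-data-airflow` code: the `ID_PATTERN` naming rule, the `EXISTING_COLLECTIONS` set, and this `collection_exists` helper are all assumptions made up for illustration, and the real `Dataset` class is a schema model rather than a dataclass.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a lookup against the catalog.
EXISTING_COLLECTIONS = {"Legacy Collection"}

# Assumed naming rule, for illustration only; the real rule is not shown in this issue.
ID_PATTERN = re.compile(r"[a-z0-9-]+")


def collection_exists(collection_id: str) -> bool:
    """Return True if the collection is already known (hypothetical helper)."""
    return collection_id in EXISTING_COLLECTIONS


@dataclass
class Dataset:
    collection: str
    data_type: str  # the discriminator field, e.g. "cog" or "zarr"

    def __post_init__(self) -> None:
        # An id may "break the rules" only if an existing collection matches it.
        follows_rules = ID_PATTERN.fullmatch(self.collection) is not None
        if not follows_rules and not collection_exists(self.collection):
            raise ValueError(f"Invalid collection id: {self.collection!r}")
```

Under these assumptions, `Dataset("new-collection", "cog")` and `Dataset("Legacy Collection", "zarr")` are both accepted, while `Dataset("Not Valid", "cog")` raises `ValueError`. The point is that this validator has to look up the collection, which is the same lookup the ingest side performs later.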
After the discover workflow kicks off an ingest workflow by calling the ingestion endpoint defined in `veda-backend`, the `item` parameter of `veda-backend`'s `enqueue_ingestion` uses `schemas.AccessibleItem`. This class has `validators.collection_exists`.
Therefore, across `veda-data-airflow` and `veda-backend`, the check that a collection exists is called twice: first for `dataset`, whose type inherits from the `Dataset` class and its collection validator, and again when the workflow kicks off an ingest workflow, where similar logic is defined by `AccessibleItem` in `veda-backend`.
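The duplication can be sketched with stand-in functions. This is only an illustration of the call flow described above, not code from either repo: `publish_dataset`, the `KNOWN_COLLECTIONS` set, the `calls` counter, and the dict-shaped payloads are hypothetical, and the `enqueue_ingestion` below is a stub that borrows the name of the real endpoint function.

```python
from collections import Counter

calls = Counter()  # counts how often the existence check runs
KNOWN_COLLECTIONS = {"example-collection"}  # hypothetical catalog contents


def collection_exists(collection_id: str) -> bool:
    """Hypothetical shared check; raises if the collection is unknown."""
    calls["collection_exists"] += 1
    if collection_id not in KNOWN_COLLECTIONS:
        raise ValueError(f"Collection {collection_id!r} does not exist")
    return True


def enqueue_ingestion(item: dict) -> str:
    """Stub for the ingest API: the AccessibleItem-style validation runs here."""
    collection_exists(item["collection"])
    return "queued"


def publish_dataset(dataset: dict) -> str:
    """Stub for /dataset/publish: the Dataset-style validation runs here."""
    collection_exists(dataset["collection"])
    # The discover workflow then hands an item to the ingest endpoint.
    item = {"id": "item-1", "collection": dataset["collection"]}
    return enqueue_ingestion(item)
```

Calling `publish_dataset({"collection": "example-collection"})` once leaves `calls["collection_exists"]` at 2, one call per API. If both checks really are equivalent, dropping the one in the workflows stub would bring that to 1 without changing the outcome.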
cc @anayeaye @smohiudd
What
The workflows API duplicates validations that are executed by the ingest API. We need to make sure these have the same effect and/or are removed as not needed in workflows.
Moreover, the current workflows API schema base model does not include the `renders` or `providers` fields and will fail when run with those properties. Either these fields should be included in the workflows model, or downstream schema validation should be left to the ingestion API.
AC