conda-forge / kedro-datasets-feedstock

A conda-smithy repository for kedro-datasets.
BSD 3-Clause "New" or "Revised" License

What to do with the optional dependencies? #14

Open astrojuanlu opened 9 months ago

astrojuanlu commented 9 months ago


Just learned that there's a feedstock for this package already, thank you!

kedro-datasets is mostly useless without its optional dependencies, but to my knowledge conda packages have no equivalent of optional dependencies (extras). Is there any prior art on best practices for cases like this?

astrojuanlu commented 5 months ago

@rxm7706 do you have any thoughts?

rxm7706 commented 5 months ago

> @rxm7706 do you have any thoughts?
>
> kedro-datasets is mostly useless without the optional dependencies, but to my knowledge conda packages don't have such a thing. Any prior art on what the best practices are in cases like this?

@astrojuanlu - that is accurate, and so far I have managed by installing only the dependencies required by the plugins I need.

Option 1: We can create independent feedstocks, e.g. kedro-datasets-plotly (see https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/pyproject.toml#L25C1-L25C7).

But for something like kedro-datasets, which has many variations and will continue to grow, we would end up with a lot of disconnected feedstocks and a lot of sequential feedstock updates to maintain.

OpenLineage and OpenTelemetry are examples of this pattern:
https://github.com/conda-forge/openlineage-airflow-feedstock/blob/main/recipe/meta.yaml
https://github.com/conda-forge/opentelemetry-instrumentation-grpc-feedstock/blob/main/recipe/meta.yaml

Option 2: I would rather go the way of Airflow: one recipe, many outputs, e.g. https://github.com/conda-forge/airflow-feedstock/blob/main/recipe/meta.yaml#L200
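As a rough sketch of what a one-recipe, many-outputs layout could look like here (this is not copied from the Airflow recipe; the version, the output name kedro-datasets-plotly, and the dependency lists are placeholders for illustration and would need to be checked against pyproject.toml):

```yaml
# Hypothetical meta.yaml fragment: one feedstock, several outputs.
{% set version = "0.0.0" %}  # placeholder, not a real release

package:
  name: kedro-datasets-split   # top-level name, distinct from every output name
  version: {{ version }}

# source:, build: and about: sections omitted for brevity

outputs:
  # Base package: the core library without optional dependencies.
  - name: kedro-datasets
    build:
      noarch: python
    requirements:
      host:
        - python
        - pip
      run:
        - python
        - kedro            # assumed core dependency

  # One small output per group of optional dependencies.
  - name: kedro-datasets-plotly
    build:
      noarch: python
    requirements:
      run:
        # Pin to the exact base package built from this same recipe.
        - {{ pin_subpackage("kedro-datasets", exact=True) }}
        - plotly           # assumed contents of the "plotly" extra
```

With a layout like this, users would install kedro-datasets-plotly to get the base package plus that extra's dependencies, and all outputs are rebuilt together on each version bump instead of requiring sequential updates across separate feedstocks.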

I would recommend option 2. Feel free to raise a PR so we can build on this single feedstock. Let me know what you prefer and how I can help.

astrojuanlu commented 5 months ago

cc @merelcht @noklam FYI