conda-forge / polars-feedstock

A conda-smithy repository for polars.
BSD 3-Clause "New" or "Revised" License

Make optional dependencies available via conda-forge #244

Open corneliusroemer opened 1 month ago

corneliusroemer commented 1 month ago

Comment:

I noticed that some optional polars dependencies aren't available through conda-forge. It would be nice if all optional dependencies could be installed via conda - so that one doesn't need pip.

Missing (to the best of my knowledge):

Also, it would be good to add the version constraints from https://github.com/pola-rs/polars/blob/master/py-polars/pyproject.toml#L42-L59 as optional run_constrained entries in the polars conda-forge recipe.

transferred from https://github.com/pola-rs/polars/issues/7585

0xbe7a commented 1 month ago

xlsx2csv and deltalake are both available on conda-forge and are nearly up-to-date (https://github.com/conda-forge/xlsx2csv-feedstock/pull/1 @borchero 👀).

I am not sure how we can map pip's optional dependencies to conda until https://github.com/conda/ceps/pull/55 is accepted, without introducing run_constraints for the package in general.

borchero commented 1 month ago

I recently noticed that fastexcel is missing from conda-forge, though. I'll take care of it in the next couple of days.

borchero commented 1 month ago

I merged the xlsx2csv version update, thanks for pointing me to it @0xbe7a 😄

corneliusroemer commented 1 month ago

Added fastexcel to the todo-list :)

By the way deltalake was added around 11 months ago - my original issue is from 16 months ago 😄

corneliusroemer commented 1 month ago

If I understand run constraints correctly, they can be optional - that's the whole point? That they are not requirements but constraints in case the optional requirement is fulfilled?

@0xbe7a maybe I misunderstand, but to me that's what the docs say pretty explicitly.

See: https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#run-constrained

E.g.

Run_constrained

Packages that are optional at runtime but must obey the supplied additional constraint if they are installed.

Package names should follow the package match specifications.

requirements:
  run_constrained:
    - optional-subpackage =={{ version }}

and

If build and link dependencies need to impose constraints on the run environment but not necessarily pull in additional packages, then this can be done by altering the Run_constrained entries. In addition to weak/strong run_exports which add to the run requirements, weak_constrains and strong_constrains add to the run_constrained requirements. With these, e.g., minimum versions of compatible but not required packages (like optional plugins for the linked dependency, or certain system attributes) can be expressed:

There's no way to mark a dependency as optional, but at least one can ensure that if an optional dependency is installed, it is a compatible version.

Also, the run constraints would then serve as a self-documenting set of optional dependencies.

0xbe7a commented 1 month ago

However, a run_constrained entry is also enforced even if the user is not using the Polars features that depend on these packages. For example, we might have a large environment containing openpyxl 2.* because some other logic uses it, without ever touching the Polars "read excel" functionality. The run_constrained entry would still enforce openpyxl 3.*. Conda would fail to solve this case, while with pip, we could simply not enable the optional feature.
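To make the scenario above concrete, here is a minimal Python sketch of how a constraint of this kind behaves. It is a toy model, not conda's actual solver: the helper names `parse_version` and `violated_constraints` are hypothetical, only a plain minimum-version check is modelled, and the `openpyxl >=3.0` constraint and the installed versions are illustrative values rather than anything taken from the real polars recipe.

```python
# Toy model of run_constrained semantics; this is NOT conda's solver.

def parse_version(text):
    """Turn '3.1.2' into (3, 1, 2) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))


def violated_constraints(installed, run_constrained):
    """Return the names of installed packages that break a constraint.

    installed:       {name: version string} for the whole environment
    run_constrained: {name: minimum version string} declared by one package
    A package that is not installed can never violate a constraint.
    """
    violations = []
    for name, minimum in run_constrained.items():
        if name in installed and parse_version(installed[name]) < parse_version(minimum):
            violations.append(name)
    return violations


# Hypothetical constraint borrowed from the example above (not the real recipe).
polars_constraints = {"openpyxl": "3.0"}

# openpyxl absent: the constraint is inactive and nothing conflicts.
print(violated_constraints({"pandas": "2.0.3"}, polars_constraints))  # []

# openpyxl 2.x present for unrelated code: the constraint still applies,
# whether or not the Polars Excel reader is ever called.
print(violated_constraints({"openpyxl": "2.6.4"}, polars_constraints))  # ['openpyxl']
```

The second case is the one that worries me: the check only looks at what is installed, so it cannot tell whether the constrained feature is actually used.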

corneliusroemer commented 1 month ago

True, but no package always uses all the functionality that gives rise to a particular constraint. I guess it depends on how commonly used the optional dependency is and how tightly you are constrained or constraining. It's always a tradeoff.

The best way to avoid solve issues would be to constrain optional dependencies as little as possible.

0xbe7a commented 1 month ago

The best way to avoid solve issues would be to constrain optional dependencies as little as possible.

I see your point, and I also think that this should be a tradeoff based on the popularity of the dependency. I am hesitant here, as there is nothing the end-user can do to ignore these potentially "bogus" constraints and create the environment regardless of any constraints. Feel free to open a PR for any potential constraints. I think it's much better to go through this on a case-by-case basis rather than in general.