conda-forge / xopen-feedstock

A conda-smithy repository for xopen.
BSD 3-Clause "New" or "Revised" License
1 stars 7 forks source link

No osx-arm support? #28

Closed corneliusroemer closed 1 month ago

corneliusroemer commented 1 year ago

Comment:

I hadn't noticed that I had actually installed xopen 0.7.3 via bioconda in my arm environment - rather than a recent xopen, say 1.7.0 via conda-forge, until I got an error trying to use zstd.

Is there a fundamental reason why xopen doesn't support osx-arm?

corneliusroemer commented 1 year ago

Maybe this should actually be noarch?

corneliusroemer commented 1 year ago

@rhpvorderman here we go :)

crossref: https://github.com/pycompression/python-isal/issues/141

rhpvorderman commented 1 year ago

I added myself on the list to be always notified

rhpvorderman commented 1 year ago

Maybe this should actually be noarch?

Xopen has architecture and OS dependent dependencies. It will only require pbzip2 on platforms where it is available. The same goes for ISA-L and the python bindings for ISA-L. Unfortunately a noarch package can not cater to those needs. Multiple packages should be made for each OS/architecture combination.

rhpvorderman commented 1 year ago

@corneliusroemer can you verify if installing xopen through conda is possible now on your arm64 macOS system?

corneliusroemer commented 1 year ago

Hmm, no, I don't think it's available in the osx-arm64 channel, see available files/channels: https://anaconda.org/conda-forge/xopen/files

image
corneliusroemer commented 1 year ago

You need to somehow explicitly add osx-arm64 to the recipe? Not sure, I couldn't find documentation on how to add it.

Maybe https://github.com/orgs/conda-forge/teams/help-osx-arm64 can help?

rhpvorderman commented 1 year ago

We are in a bit of an unfortunate situation here. Preferably xopen should be noarch: python, but that does not allow architecture specific dependencies with conda's dependency model.

If you install via pip, you should be fine.

If you find a way to solve the conda issue, please make a PR.

marcelm commented 5 months ago

It seems the way to solve this would be to use virtual packages: https://conda-forge.org/docs/maintainer/knowledge_base/#noarch-packages-with-os-specific-dependencies

However, these currently only allow one to create noarch: python packages where the dependencies vary depending on the operating system but on the architecture. Previously, it would have been possible to use __archspec to query the architecture, but a change was made that makes this more difficult, see https://github.com/conda/conda/issues/13420 .

corneliusroemer commented 1 month ago

It seems xopen is still awaiting parent:

Brave Browser 2024-07-11 02 12 10

Could a manual migration like for aarch64 work - isal isn't strictly required is it? Then at least we could unblock downstream recipes that require xopen, like orjsonl, augur etc.

rhpvorderman commented 1 month ago

For x86-64 isal is strictly required. For compression performance, the gap between it and zlib-ng or normal zlib is simply to great to leave it out. Aarch64 should work for isa-l. In fact wheels for aarch64 are already published.

Macos-arm64 is not supported by upstream isa-l yet. That will come with the next release.

corneliusroemer commented 1 month ago

@rhpvorderman I don't think it matters for anything practical as none of the proposals remove isal support for x86-64, but I would gently challenge your definition of "strictly required" here. In my view, xopen is a convenience wrapper to read/write to/from compressed files for a variety of compression formats. Unless I'm missing something, isal isn't required for correctness. You can use zlib and xopen will not error nor will it produce incorrect results. It will just be slower.

In fact the having isa-l matters if both of the following is true:

corneliusroemer commented 1 month ago

I'll close this as arm64 support has been merged

rhpvorderman commented 1 month ago

Unless I'm missing something, isal isn't required for correctness. You can use zlib and xopen will not error nor will it produce incorrect results. It will just be slower.

Xopen is used mainly in bioinformatics. In cutadapt, using the old zlib compression, the workload even when using compression level 1 was 33% of the total compute workload. Using ISA-L dropped that to 6%. (That was back then, numbers might differ now). In general, as bioinformatics packages get more efficient, compression/decompression becomes a large part of the work load, to the point that the tool becomes a (de)compression tool with bioinformatic side effects. Another example: sequali, a tool that replaces FastQC and NanoPlot spends 33% of its time decompressing with ISA-L. So the ratio of decompression to actual work is 1:2. If ISA-L wouldn't be there, it would be 1:1. Quite a drastic reduction in performance.

So for generic use I agree with you, for bioinformatic usage I do not. Working with tens or hundreds of gigabytes of gzipped or BAM (which is also zlib compressed) data everyday means that ISA-L performance is absolutely essential. The whole reason I got so involved in compression is that it is such a large part of the bioinformatic compute workload. It is very significant.

From the readme (emphasis mine):

This Python module provides an xopen function that works like Python’s built-in open function but also transparently deals with compressed files. xopen selects the most efficient method for reading or writing a compressed file.

Luckily ISA-L will be available for macos-arm64 as soon as a new release drops. I will update the python bindings as fast as possible in that case.

EDIT: Oh, and I forgot to mention that python-isal and python-zlib-ng support offloading to different threads now, unlike builtin zlib. This is a very cool feature that decreases wall clock time substantially. It required some investment to get it working, but I am very happy with how it turned out.