apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Build conda nightlies jobs are failing on main for aarch64 #659

Closed: raulcd closed this issue 1 month ago

raulcd commented 2 months ago

Describe the bug

The aarch64 conda nightly jobs are failing with:

Conda detected a mismatch between the expected content and downloaded content

See:

To Reproduce

Expected behavior

Jobs succeed and build for aarch64.


Michael-J-Ward commented 1 month ago

Taking a look at the logs, a few things look off.

1) conda-forge/linux-64 seems like the wrong cache for the linux-aarch64 job.
2) "warning libmamba Cache file was modified by another program" makes me think concurrent jobs are modifying the same cache.
3) rust-std-x86_64 also seems like the wrong download, and it is the one that causes the mismatched-hash error.

Attempting to finalize metadata for datafusion
conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
Reloading output folder: /home/runner/work/datafusion-python/datafusion-python/packages
warning  libmamba Cache file "/home/runner/conda_pkgs_dir/cache/09cdf8bf.json" was modified by another program

...

Conda detected a mismatch between the expected content and downloaded content
for url 'https://conda.anaconda.org/conda-forge/noarch/rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda'.
  download saved to: /home/runner/conda_pkgs_dir/rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda
  expected sha256: a482597672076f47c83d0dd3f204eb437007b99ada4d630d56fa64b4b193c5db
  actual sha256: 73f7537db6bc0471135a85a261798abe77e7e83794f945a0355c4068973f31f6
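The first observation (a linux-64 channel subdir showing up in a linux-aarch64 job) can be checked mechanically across runs. Below is a minimal sketch. It assumes the log lines have the "channel/subdir ... Using cache" shape shown in the excerpt above. The helper name, the regex, and the default target are hypothetical and are not part of conda or the repository's CI.

```python
import re

# Hypothetical pattern, assumed from the log excerpt: "<channel>/<subdir>   Using cache"
SUBDIR_LINE = re.compile(r"^\s*([\w.-]+)/([\w-]+)\s+Using cache")


def unexpected_subdirs(log_lines, target="linux-aarch64"):
    """Hypothetical helper: list channel subdirs that are neither the target platform nor noarch."""
    flagged = []
    for line in log_lines:
        m = SUBDIR_LINE.match(line)
        if m and m.group(2) not in (target, "noarch"):
            flagged.append(f"{m.group(1)}/{m.group(2)}")
    return flagged


log = [
    "conda-forge/linux-64                                        Using cache",
    "conda-forge/noarch                                          Using cache",
]
print(unexpected_subdirs(log))  # ['conda-forge/linux-64']
```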

Things I'd like to try, in order:

1) Set concurrency to a hard limit of one conda job at a time.
2) Upgrade the miniconda action (v3 has automatic aarch64 detection). EDIT: looks like there's already a PR for that, and it failed: https://github.com/apache/datafusion-python/pull/658
3) Blow away the conda cache and start fresh.
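The first item can be illustrated with an exclusive file lock around the shared package cache. This is a minimal sketch with several assumptions. It only serializes jobs that share one filesystem, such as several runner processes on one host. It relies on Unix fcntl.flock. The helper name and lock path are hypothetical. For jobs on separate GitHub-hosted runners, the concurrency setting in GitHub Actions is the relevant control instead.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_cache_lock(lock_path):
    """Hypothetical helper: hold an exclusive lock on lock_path for the duration of the block.

    Any other process entering this context with the same lock_path blocks
    until the current holder leaves, so only one job touches the cache at a time.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another holder exists
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Usage (hypothetical lock file and build step):
# with exclusive_cache_lock("/home/runner/conda_pkgs_dir/.ci.lock"):
#     run_conda_build()
```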

Michael-J-Ward commented 1 month ago

Investigating further. The actual sha256 that CI reports matches both the value I compute after downloading the files myself and the value conda-forge lists.

file:   rust-std-aarch64-unknown-linux-gnu-1.77.2-hbe8e118_0.conda
sha256: 9d583f04bfdbccc82ac2f0653de571f8371df04633727e714b71efd7e4a0140a

file:   rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda
sha256: 73f7537db6bc0471135a85a261798abe77e7e83794f945a0355c4068973f31f6
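To reproduce that comparison, you hash the downloaded file locally. Below is a minimal sketch that uses Python's hashlib. The helper names are hypothetical. The file is read in chunks so that a large .conda file is not loaded into memory all at once.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hypothetical helper: return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path, expected_sha256):
    """Hypothetical helper: compare a file's digest to an expected hex string, ignoring case."""
    return sha256_of(path) == expected_sha256.lower()
```

If the hashes above are correct, running this on the x86_64 file should give the 73f7537d... digest, not the a4825976... value that conda expected.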

So, I tried cleaning out the cache, but that only caused the builds to break in a new way...

Adding in variants from internal_defaults
Adding in variants from config.variant
Adding in variants from argument_variants
Error: bad character '-' in package/version: publish-docs.1a240507

Closing the PR for now.

Michael-J-Ward commented 1 month ago

Ah, publish-docs is the tag @andygrove used to trigger docs generation and publication.

Apparently, conda doesn't accept it as part of a package version: the '-' in publish-docs.1a240507 is what triggers the "bad character" error.
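The rejection follows from conda's filename convention. A package file is named name-version-build, as in rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda, so '-' cannot appear inside the version itself. Below is a minimal sketch of two possible guards, applied before the version reaches conda-build. Both helpers are hypothetical. The release-tag pattern is an assumed shape, not the repository's documented convention. Replacing '-' with '_' is one assumed-acceptable substitution. Skipping non-release tags entirely would be an equally valid fix.

```python
import re

# Hypothetical release-tag shape (e.g. "1.2.3" or "v1.2.3"); adjust to the repo's real convention.
RELEASE_TAG = re.compile(r"v?\d+(\.\d+)*")


def is_release_tag(tag):
    """Hypothetical filter: only tags shaped like releases should become package versions."""
    return RELEASE_TAG.fullmatch(tag) is not None


def to_conda_version(raw_version):
    """Hypothetical sanitizer: conda reserves '-' as the name/version/build separator."""
    return raw_version.replace("-", "_")


print(is_release_tag("publish-docs"))             # False
print(to_conda_version("publish-docs.1a240507"))  # publish_docs.1a240507
```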