NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Build failure: gdal 3.8.4 #302137

Closed Chwiggy closed 1 month ago

Chwiggy commented 3 months ago

Steps To Reproduce

Steps to reproduce the behavior:

  1. build gdal 3.8.4

Build log

Build Log Gist

Additional context

Builds, but 40 or so of the pytest tests fail.

Notify maintainers

As Team Geospatial: @das-g @imincik @nh2 @nialov @sikmir @willcohen

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
 - host os: `Linux 6.8.2, NixOS, 24.05 (Uakari), 24.05.20240329.d8fe5e6`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - nixpkgs: `/nix/store/cb1gs888vfqxawvc65q1dk6jzbayh3wz-source`


imincik commented 3 months ago

Thanks for reporting.

I can't reproduce it. I just successfully built gdal with the latest nixpkgs master, and gdal 3.8.5 is also building successfully.

imincik commented 3 months ago

Which nixpkgs version are you using in your build?

Chwiggy commented 3 months ago

Ah yes, sorry, I forgot to add that I was using nixpkgs/nixos-unstable.

imincik commented 3 months ago

Still can't reproduce.

I successfully built it with the latest nixos-unstable version:

commit fd281bd6b7d3e32ddfa399853946f782553163b5 (HEAD -> nixos-unstable, origin/nixos-unstable)
Merge: 75a18a64ed1f 7a3ed2fe5351
Author: Mauricio Collares <mauricio@collares.org>
Date:   Wed Apr 3 18:51:29 2024 +0200

    Merge pull request #275577 from soispha/master

    manim-slides: Init at v5.1.3
Chwiggy commented 3 months ago

I can't reproduce it today either :eyes: so I'm closing this issue.

Chwiggy commented 2 months ago

Seems like i ran into this issue once again with the update to gdal 3.8.5 :eyes:

imincik commented 2 months ago

> Seems like i ran into this issue once again with the update to gdal 3.8.5 👀

I need some instructions on how to reproduce it.

Chwiggy commented 2 months ago

I didn't have much time yesterday, but today, I tried to isolate the problem a bit more than last time.

With nix build github:nixos/nixpkgs/nixos-unstable#gdal OR with flake:

{
  description = "A very basic flake";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
  };

  outputs = {
    self,
    nixpkgs,
  }: {
    packages.x86_64-linux.gdal = nixpkgs.legacyPackages.x86_64-linux.gdal;

    packages.x86_64-linux.default = self.packages.x86_64-linux.gdal;
  };
}

and with flake.lock:

# flake.lock
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "lastModified": 1714253743,
        "narHash": "sha256-mdTQw2XlariysyScCv2tTE45QSU9v/ezLcHJ22f0Nxc=",
        "owner": "nixos",
        "repo": "nixpkgs",
        "rev": "58a1abdbae3217ca6b702f03d3b35125d88a2994",
        "type": "github"
      },
      "original": {
        "owner": "nixos",
        "ref": "nixos-unstable",
        "repo": "nixpkgs",
        "type": "github"
      }
    },
    "root": {
      "inputs": {
        "nixpkgs": "nixpkgs"
      }
    }
  },
  "root": "root",
  "version": 7
}
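
Because the flake above tracks the moving `nixos-unstable` ref, anyone else building it may resolve a different nixpkgs revision than the one recorded in this flake.lock. A minimal sketch of a flake that anyone could use to retry exactly this revision, assuming the `rev` from the flake.lock above is the one that failed (the description string is made up for illustration):

```nix
{
  description = "gdal pinned to the nixpkgs revision from the failing flake.lock";

  inputs = {
    # Revision copied from the "rev" field of the flake.lock above,
    # so the build no longer depends on where nixos-unstable points today.
    nixpkgs.url = "github:nixos/nixpkgs/58a1abdbae3217ca6b702f03d3b35125d88a2994";
  };

  outputs = {
    self,
    nixpkgs,
  }: {
    packages.x86_64-linux.gdal = nixpkgs.legacyPackages.x86_64-linux.gdal;

    packages.x86_64-linux.default = self.packages.x86_64-linux.gdal;
  };
}
```

Running `nix build` in a directory containing this file should evaluate the same gdal derivation regardless of later channel updates, which separates "the revision is broken" from "the machine is the variable".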

Any attempt to build gdal results in several failed pytests in the test phase: The nix log can be found here

System Metadata from nix-info -m:

 - system: `"x86_64-linux"`
 - host os: `Linux 6.8.7, NixOS, 24.05 (Uakari), 24.05.20240423.572af61`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - nixpkgs: `/nix/store/m5i890m2g4pnyflpn48d1dpzzmwp5q4p-source`
Chwiggy commented 2 months ago

Switching to the latest revision from today fixed the issue again:


"nixpkgs": {
      "locked": {
        "lastModified": 1714635257,
        "narHash": "sha256-4cPymbty65RvF1DWQfc+Bc8B233A1BWxJnNULJKQ1EY=",
        "owner": "nixos",
        "repo": "nixpkgs",
        "rev": "63c3a29ca82437c87573e4c6919b09a24ea61b0f",
        "type": "github"
      },
      "original": {
        "owner": "nixos",
        "ref": "nixos-unstable",
        "repo": "nixpkgs",
        "type": "github"
      }
    },
Chwiggy commented 2 months ago

But somehow I can't reproduce that issue with the old flake.lock now either.

imincik commented 2 months ago

I just successfully ran nix build --rebuild github:nixos/nixpkgs/nixos-unstable#gdal

Chwiggy commented 2 months ago

This issue really seems to be about timing. The gdal build breaks for me intermittently on nixpkgs unstable channel updates, always with the same failure pattern, and then fixes itself (or gets fixed) within a day or so, or with the next update to the unstable nixpkgs channel.

Currently after a flake update

...
• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/63c3a29ca82437c87573e4c6919b09a24ea61b0f' (2024-05-02)
  → 'github:nixos/nixpkgs/25865a40d14b3f9cf19f19b924e2ab4069b09588' (2024-05-05)
...

my configuration fails to build gdal again and so does nix build github:nixos/nixpkgs/nixos-unstable#gdal.

$ nix build github:nixos/nixpkgs/nixos-unstable#gdal
error: builder for '/nix/store/dfbv5x2jfk06pgxh8n4v5cyyzx85j34i-gdal-3.8.5.drv' failed with exit code 1;
       last 10 log lines:
       > FAILED utilities/test_gdal_translate_lib.py::test_gdal_translate_lib_1 - AssertionError: Bad checksum
       > FAILED utilities/test_gdal_translate_lib.py::test_gdal_translate_lib_2 - AssertionError: Bad checksum
       > FAILED utilities/test_gdal_translate_lib.py::test_gdal_translate_lib_11 - AssertionError: Bad checksum
       > FAILED utilities/test_gdallocationinfo.py::test_gdallocationinfo_6 - AssertionError: assert 'Value: 130' in 'Report:\n  Location: (10P,10L)\n  B...
       > FAILED utilities/test_gdalmdimtranslate_lib.py::test_gdalmdimtranslate_classic_to_classic - AssertionError: assert 0 == 4672
       > FAILED utilities/test_gdalwarp.py::test_gdalwarp_40 - AssertionError: assert 0 == 237
       > FAILED pyscripts/test_gdallocationinfo_py.py::test_gdallocationinfo_py_6 - AssertionError: assert 'Value: 130' in 'Report:\n  Location: (10P,10L)\n  B...
       > = 37 failed, 11733 passed, 1352 skipped, 6 deselected, 6 xfailed, 43 warnings in 466.05s (0:07:46) =
       > 100 - done.
       > /nix/store/558iw5j1bk7z6wrg8cp96q2rx03jqj1v-stdenv-linux/setup: line 1579: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/dfbv5x2jfk06pgxh8n4v5cyyzx85j34i-gdal-3.8.5.drv'.

But I wouldn't bet on that being reproducible tomorrow, or even in a few hours.

Lillian-Violet commented 2 months ago

Can reproduce the build failing with the same message on the latest unstable as of writing.

markuskowa commented 2 months ago

That seems to be a flaky test. I can build it locally (nix-build -A gdal on x86_64-linux) as of ff66b79bd78a180e36381c83e2b9c63f54f45a19.

Chwiggy commented 2 months ago

Well what would you suggest to try and narrow this build error down or to avoid it, @markuskowa?

imincik commented 2 months ago

@Chwiggy, do you use a non-default max cores or jobs configuration?

Chwiggy commented 2 months ago

@imincik no, that should be the default configuration.

Chwiggy commented 2 months ago

trying again after an update of the nixpkgs input

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/25865a40d14b3f9cf19f19b924e2ab4069b09588' (2024-05-05)
  → 'github:nixos/nixpkgs/b211b392b8486ee79df6cdfb1157ad2133427a29' (2024-05-07)

still yields the same pattern of failed tests. Unfortunately, I'm not sure how to make this issue any more reproducible. I'm tempted to just skip the pytestCheckPhase, though I'm not quite sure how best to do that, or whether that's wise.
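
For reference, one common way to skip a package's tests in nixpkgs is to override its check flags. The following is only a sketch: it assumes the gdal tests are gated by `doCheck` and/or `doInstallCheck` (both are switched off here to cover either case), and `gdal-unchecked` is a made-up attribute name.

```nix
{
  description = "gdal with its test phases disabled (workaround sketch)";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
  };

  outputs = {
    self,
    nixpkgs,
  }: let
    pkgs = nixpkgs.legacyPackages.x86_64-linux;
  in {
    # Hypothetical attribute name; the override only flips the check flags
    # and leaves the rest of the derivation untouched.
    packages.x86_64-linux.gdal-unchecked = pkgs.gdal.overrideAttrs (old: {
      doCheck = false;
      doInstallCheck = false;
    });

    packages.x86_64-linux.default = self.packages.x86_64-linux.gdal-unchecked;
  };
}
```

Two caveats apply. The overridden derivation has a different hash, so it and everything depending on it will be built locally instead of fetched from the binary cache. And skipping the tests only hides the failures; if the checksums really are wrong on this machine, the resulting gdal may misbehave at runtime too.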

imincik commented 2 months ago

In my experience, GDAL tests are more likely to fail under heavy load, for example when the build machine is low on resources or too many packages are being built in parallel (like on Hydra).

What kind of machine do you have, @Chwiggy?

Chwiggy commented 2 months ago

It's a fairly well-specced laptop from 2 or 3 years ago (11th gen i7, 16 GB of RAM). And at this point it's only building and testing gdal on its own when I try. During the buildPhase it's definitely using all of the processing power it has, but during the tests it doesn't seem to be straining for more resources.

imincik commented 2 months ago

Your machine looks OK.

Can you try to build gdal from geospatial-nix to see if it fails as well?

nix build github:imincik/geospatial-nix#gdal
Chwiggy commented 2 months ago

That fails as well, with a similar pattern of failed tests:

error: builder for '/nix/store/lny9h62428a3dks9m2k3hvmj9mk1zs5x-gdal-3.8.5.drv' failed with exit code 1;
       last 10 log lines:
       > FAILED utilities/test_gdal_translate_lib.py::test_gdal_translate_lib_1 - AssertionError: Bad checksum
       > FAILED utilities/test_gdal_translate_lib.py::test_gdal_translate_lib_2 - AssertionError: Bad checksum
       > FAILED utilities/test_gdal_translate_lib.py::test_gdal_translate_lib_11 - AssertionError: Bad checksum
       > FAILED utilities/test_gdallocationinfo.py::test_gdallocationinfo_6 - AssertionError: assert 'Value: 130' in 'Report:\n  Location: (10P,10L)\n  B...
       > FAILED utilities/test_gdalmdimtranslate_lib.py::test_gdalmdimtranslate_classic_to_classic - AssertionError: assert 0 == 4672
       > FAILED utilities/test_gdalwarp.py::test_gdalwarp_40 - AssertionError: assert 0 == 237
       > FAILED pyscripts/test_gdallocationinfo_py.py::test_gdallocationinfo_py_6 - AssertionError: assert 'Value: 130' in 'Report:\n  Location: (10P,10L)\n  B...
       > = 32 failed, 11708 passed, 1382 skipped, 6 deselected, 6 xfailed, 40 warnings in 325.91s (0:05:25) =
       > 100 - done.
       > /nix/store/nkskw6n4swnlxjrwv47mpb4a4h8r85wg-stdenv-linux/setup: line 1561: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/lny9h62428a3dks9m2k3hvmj9mk1zs5x-gdal-3.8.5.drv'.

Most of the failed tests seem to concern file I/O operations and checksums of the files involved :eyes:. This might actually be a filesystem-specific bug, considering this machine is set up with bcachefs. That still doesn't quite explain why the test phase only failed intermittently, but it's another lead but not really an issue with nixpkgs anymore. I'll need to try that later on a machine with a different filesystem.
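
Until the filesystem theory is checked, a narrower workaround than skipping every test could be to deselect only the failing ones. This is a sketch under two assumptions: that the gdal derivation runs its tests through pytestCheckHook (which honours a `disabledTests` list), and that the seven names visible in the truncated log above are representative. The log reports 32 failures in total, so the list is incomplete, and `gdal-fewer-tests` is a made-up attribute name.

```nix
{
  description = "gdal with the visibly failing tests deselected (workaround sketch)";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
  };

  outputs = {
    self,
    nixpkgs,
  }: let
    pkgs = nixpkgs.legacyPackages.x86_64-linux;
  in {
    # Hypothetical attribute name. Test names are taken from the
    # "last 10 log lines" above; the full log would list the rest.
    packages.x86_64-linux.gdal-fewer-tests = pkgs.gdal.overrideAttrs (old: {
      # Append to whatever the package already disables.
      # Note: names are matched as pytest -k expressions, i.e. by substring,
      # so "test_gdal_translate_lib_1" also deselects "..._lib_11" and similar.
      disabledTests = (old.disabledTests or [ ]) ++ [
        "test_gdal_translate_lib_1"
        "test_gdal_translate_lib_2"
        "test_gdal_translate_lib_11"
        "test_gdallocationinfo_6"
        "test_gdalmdimtranslate_classic_to_classic"
        "test_gdalwarp_40"
        "test_gdallocationinfo_py_6"
      ];
    });

    packages.x86_64-linux.default = self.packages.x86_64-linux.gdal-fewer-tests;
  };
}
```

The remaining tests still run, so a build that passes this way gives somewhat more confidence than one with the check phase switched off entirely.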

imincik commented 2 months ago

> considering this machine is set up with bcachefs.

Yes, I was just about to ask about the filesystem. This might be the issue. I've seen multiple gdal tests failing from time to time, but I've never seen problems like the ones you have.

imincik commented 2 months ago

> That still doesn't quite explain why the test phase only failed intermittently, but it's another lead but not really an issue with nixpkgs anymore. I'll need to try that later on a machine with a different filesystem.

Good idea.

imincik commented 1 month ago

I am closing this issue since it looks like GDAL is not the root cause of the problem.