bioimage-io / collection

Maintains the resources displayed on bioimage.io (Successor to collection-bioimage-io)
https://bioimage-io.github.io/collection/

Test passes locally but fails on CI with identical environments #84

Open qin-yu opened 2 months ago

qin-yu commented 2 months ago

TorchScript is deterministic for a given input, model state, and environment: in theory, as long as no external factors change, the same input should always produce the same output. Both `bioimageio test rdf.yml pytorch_state_dict` and `bioimageio test rdf.yml torchscript` pass in my env created from
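A caveat on "deterministic": it usually means bit-identical results on the same machine with the same kernels. Floating-point addition is not associative, so a kernel that picks a different accumulation order at runtime (for example a different SIMD width or thread count depending on the CPU) can be perfectly reproducible on each machine and still disagree across machines. The pure-Python sketch below is a hypothetical model of that effect (`lane_sum` is an illustrative helper, not PyTorch code), using a hand-picked input that makes the difference visible:

```python
import math


def lane_sum(values, lanes):
    """Sum values using `lanes` separate accumulators, roughly like a SIMD
    reduction: element i goes into accumulator i % lanes, and the partial
    sums are added together at the end."""
    acc = [0.0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] += v
    total = 0.0
    for partial in acc:
        total += partial
    return total


# Hand-picked input: a large value, a small one, and the large value's negation.
xs = [0.0] * 32
xs[0], xs[8], xs[16] = 1e20, 1.0, -1e20

print(lane_sum(xs, 8))   # 8 lanes: 1.0 is absorbed by 1e20 before the cancellation
print(lane_sum(xs, 16))  # 16 lanes: 1e20 and -1e20 cancel in their own lane
print(math.fsum(xs))     # correctly rounded reference sum
```

Calling `lane_sum` twice with the same arguments always returns the same value, so each configuration is reproducible on its own. Real kernels usually differ only in the last few bits rather than this dramatically.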

Anyway, CI fails, for example here: https://github.com/bioimage-io/collection/actions/runs/9892771006/job/27326341115

### `conda list` output

```bash
$ mamba list
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
annotated-types 0.7.0 pyhd8ed1ab_0 conda-forge
aom 3.9.1 hac33072_0 conda-forge
bioimageio.core 0.6.7 pyhd8ed1ab_0 conda-forge
bioimageio.spec 0.5.3.post4 pyhd8ed1ab_0 conda-forge
blas 1.0 mkl conda-forge
brotli-python 1.1.0 py312h30efb56_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
ca-certificates 2024.7.4 hbcca054_0 conda-forge
cairo 1.18.0 hbb29018_2 conda-forge
certifi 2024.7.4 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py312hf06ca03_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
cpuonly 2.0 0 pytorch
dav1d 1.2.1 hd590300_0 conda-forge
distro 1.9.0 pyhd8ed1ab_0 conda-forge
dnspython 2.6.1 pyhd8ed1ab_1 conda-forge
email-validator 2.2.0 pyhd8ed1ab_0 conda-forge
email_validator 2.2.0 hd8ed1ab_0 conda-forge
expat 2.6.2 h59595ed_0 conda-forge
ffmpeg 7.0.1 gpl_h9be9148_104 conda-forge
filelock 3.15.4 pyhd8ed1ab_0 conda-forge
fire 0.6.0 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_2 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
gettext 0.22.5 h59595ed_2 conda-forge
gettext-tools 0.22.5 h59595ed_2 conda-forge
gmp 6.3.0 hac33072_2 conda-forge
gmpy2 2.1.5 py312h1d5cde6_1 conda-forge
gnutls 3.7.9 hb077bed_0 conda-forge
graphite2 1.3.13 h59595ed_1003 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
harfbuzz 9.0.0 hfac3d4d_0 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.7 pyhd8ed1ab_0 conda-forge
imageio 2.34.2 pyh12aca89_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 hf3520f5_7 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20240116.2 cxx17_h59595ed_0 conda-forge
libasprintf 0.22.5 h661eb56_2 conda-forge
libasprintf-devel 0.22.5 h661eb56_2 conda-forge
libass 0.17.1 h39113c1_2 conda-forge
libblas 3.9.0 16_linux64_mkl conda-forge
libcblas 3.9.0 16_linux64_mkl conda-forge
libdeflate 1.20 hd590300_0 conda-forge
libdrm 2.4.122 h4ab18f5_0 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 14.1.0 h77fa898_0 conda-forge
libgettextpo 0.22.5 h59595ed_2 conda-forge
libgettextpo-devel 0.22.5 h59595ed_2 conda-forge
libglib 2.80.3 h8a4344b_1 conda-forge
libhwloc 2.11.0 default_h5622ce7_1000 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn2 2.3.7 hd590300_0 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 16_linux64_mkl conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenvino 2024.2.0 h2da1b83_1 conda-forge
libopenvino-auto-batch-plugin 2024.2.0 hb045406_1 conda-forge
libopenvino-auto-plugin 2024.2.0 hb045406_1 conda-forge
libopenvino-hetero-plugin 2024.2.0 h5c03a75_1 conda-forge
libopenvino-intel-cpu-plugin 2024.2.0 h2da1b83_1 conda-forge
libopenvino-intel-gpu-plugin 2024.2.0 h2da1b83_1 conda-forge
libopenvino-intel-npu-plugin 2024.2.0 he02047a_1 conda-forge
libopenvino-ir-frontend 2024.2.0 h5c03a75_1 conda-forge
libopenvino-onnx-frontend 2024.2.0 h07e8aee_1 conda-forge
libopenvino-paddle-frontend 2024.2.0 h07e8aee_1 conda-forge
libopenvino-pytorch-frontend 2024.2.0 he02047a_1 conda-forge
libopenvino-tensorflow-frontend 2024.2.0 h39126c6_1 conda-forge
libopenvino-tensorflow-lite-frontend 2024.2.0 he02047a_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpciaccess 0.18 hd590300_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libprotobuf 4.25.3 h08a7969_0 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge
libtasn1 4.19.0 h166bdaf_0 conda-forge
libtiff 4.6.0 h1dd3fc0_3 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libva 2.22.0 hb711507_0 conda-forge
libvpx 1.14.1 hac33072_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.16 hd590300_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 h4c95cb1_2 conda-forge
libzlib 1.3.1 h4ab18f5_1 conda-forge
llvm-openmp 15.0.7 h0cdce71_0 conda-forge
loguru 0.7.2 py312h7900ff3_1 conda-forge
markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.5 py312h98912ed_0 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
mkl 2022.2.1 h84fe81f_16997 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_1 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
ncurses 6.5 h59595ed_0 conda-forge
nettle 3.9.1 h7ab15ed_0 conda-forge
networkx 3.3 pyhd8ed1ab_1 conda-forge
numpy 1.26.4 py312heda63a1_0 conda-forge
ocl-icd 2.3.2 hd590300_1 conda-forge
openh264 2.4.1 h59595ed_0 conda-forge
openjpeg 2.5.2 h488ebb8_0 conda-forge
openssl 3.3.1 h4ab18f5_1 conda-forge
p11-kit 0.24.1 hc5aa10d_0 conda-forge
packaging 24.1 pyhd8ed1ab_0 conda-forge
pandas 2.2.2 py312h1d6d2e6_1 conda-forge
pcre2 10.44 h0f59acf_0 conda-forge
pillow 10.4.0 py312h287a98d_0 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
platformdirs 4.2.2 pyhd8ed1ab_0 conda-forge
pooch 1.8.2 pyhd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pugixml 1.14 h59595ed_0 conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pydantic 2.8.2 pyhd8ed1ab_0 conda-forge
pydantic-core 2.20.1 py312hf008fa9_0 conda-forge
pydantic-settings 2.3.4 pyhd8ed1ab_0 conda-forge
pygments 2.18.0 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.12.4 h194c7f8_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-dotenv 1.0.1 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.12 4_cp312 conda-forge
pytorch 2.3.1 py3.12_cpu_0 pytorch
pytorch-mutex 1.0 cpu pytorch
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.1 py312h98912ed_1 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
rich 13.7.1 pyhd8ed1ab_0 conda-forge
ruyaml 0.91.0 pyhd8ed1ab_0 conda-forge
setuptools 70.2.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.2.1 ha2e4443_0 conda-forge
sniffio 1.3.1 pyhd8ed1ab_0 conda-forge
svt-av1 2.1.2 hac33072_0 conda-forge
sympy 1.12.1 pypyh2585a3b_103 conda-forge
tbb 2021.12.0 h434a139_2 conda-forge
termcolor 2.4.0 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
torchaudio 2.3.1 py312_cpu pytorch
torchvision 0.18.1 py312_cpu pytorch
tqdm 4.66.4 pyhd8ed1ab_0 conda-forge
typing-extensions 4.12.2 hd8ed1ab_0 conda-forge
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
urllib3 2.2.2 pyhd8ed1ab_1 conda-forge
wayland 1.23.0 h5291e77_0 conda-forge
wayland-protocols 1.36 hd8ed1ab_0 conda-forge
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
x264 1!164.3095 h166bdaf_2 conda-forge
x265 3.5 h924138e_3 conda-forge
xarray 2024.6.0 pyhd8ed1ab_1 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.9 hb711507_1 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zlib 1.3.1 h4ab18f5_1 conda-forge
zstandard 0.22.0 py312h5b18bf6_1 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
```
qin-yu commented 2 months ago

Hey @oeway, this job https://github.com/bioimage-io/collection/actions/runs/9942967885 has been queued for a while, but no other jobs are running. Could you have a look?

qin-yu commented 2 months ago

With the same conda environment on my EMBL disk, running `bioimageio test rdf.yaml` on the EMBL Kreshuk node gives a mismatch of 15.1% at 4-decimal precision for a package exported on the EMBL Jupyter Hub VM, and vice versa.
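For reference, here is what a "mismatch at 4-decimal precision" plausibly counts. Assuming the test compares outputs the way `numpy.testing.assert_array_almost_equal(actual, expected, decimal=4)` does (numpy documents its criterion as `abs(desired - actual) < 1.5 * 10**(-decimal)`), the percentage is the fraction of elements failing that check. A minimal pure-Python sketch; `count_mismatches` and `format_mismatch` are hypothetical helpers that mimic numpy's check and message, ignoring its NaN/inf handling:

```python
def count_mismatches(actual, expected, decimal=4):
    """Count elements that fail numpy's assert_array_almost_equal criterion,
    abs(expected - actual) < 1.5 * 10**(-decimal). NaN/inf handling is omitted."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    threshold = 1.5 * 10 ** (-decimal)
    n_bad = sum(1 for a, e in zip(actual, expected) if not abs(e - a) < threshold)
    return n_bad, len(expected)


def format_mismatch(n_bad, n_total):
    """Format the counts in the style of numpy's assertion message."""
    return f"Mismatched elements: {n_bad} / {n_total} ({100 * n_bad / n_total:.3g}%)"


# A difference of 1e-4 passes at decimal=4; a difference of 1e-3 does not.
print(format_mismatch(*count_mismatches([0.0, 1e-4, 1e-3], [0.0, 0.0, 0.0])))
# Re-deriving a percentage from the counts quoted in this thread.
print(format_mismatch(313129, 2073600))
```

Note the threshold is 1.5e-4 rather than 1e-4, so "4-decimal precision" is slightly looser than it sounds.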

I believe the problem is caused by environment variables and/or the CPU architecture.
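One way to test the environment-variable hypothesis is to snapshot the relevant variables on each machine and diff them. A minimal sketch; the names in `WATCHED` are an assumed selection (OpenMP/MKL threading, MKL's `MKL_CBWR` reproducibility switch, and PyTorch's `ATEN_CPU_CAPABILITY` dispatch override), not a list taken from this thread, and `snapshot`/`diff` are hypothetical helpers:

```python
import json
import os

# Assumed selection of variables that commonly affect CPU threading or kernel choice.
WATCHED = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "MKL_CBWR", "ATEN_CPU_CAPABILITY")


def snapshot(environ=None, keys=WATCHED):
    """Return {name: value or None} for the watched variables."""
    environ = os.environ if environ is None else environ
    return {k: environ.get(k) for k in keys}


def diff(a, b):
    """Return {name: (value_in_a, value_in_b)} for variables whose values differ."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


if __name__ == "__main__":
    # Run on each machine, save the JSON, then load both files and call diff().
    print(json.dumps(snapshot(), indent=2))
```

An empty `diff` between two machines that still disagree would point away from environment variables and toward the hardware.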

qin-yu commented 2 months ago

Alright, now we know that AVX2 and AVX512 on Xeon give similar but slightly different results (Mismatched elements: 4073 / 2073600 (0.196%)), whereas both differ substantially from the results on non-Xeon machines (Mismatched elements: 313129 / 2073600 (15.1%)). So the problem does not come from the use of AVX512 on Xeon but from something else.
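To record which machines (the CI runner included) are Xeon and which SIMD extensions they expose, a small Linux-only sketch could parse `/proc/cpuinfo`. The `model name` and `flags` fields and the flag names `avx2` and `avx512f` are standard on x86 Linux; `cpu_summary` itself is a hypothetical helper:

```python
from pathlib import Path


def cpu_summary(cpuinfo_text):
    """Extract the CPU model name and selected SIMD flags from /proc/cpuinfo text.
    Only the first processor entry is used; all cores are assumed identical."""
    model, flags = None, set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return {
        "model": model,
        "avx2": "avx2" in flags,
        "avx512f": "avx512f" in flags,
        "is_xeon": model is not None and "Xeon" in model,
    }


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # exists on Linux only
    if path.exists():
        print(cpu_summary(path.read_text()))
```

Printing this next to each test result makes it easy to see whether the 15.1% group lines up with a CPU vendor or model rather than with a SIMD flag.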

qin-yu commented 2 months ago

In another test, I opened a Xeon Jupyter Hub instance, and its output matches that of my Xeon kreshuk-gpu1. This rules out user-set environment variables as the cause.
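A bit-exact way to check whether outputs "match" across machines is to hash the raw output values and group machines by digest. A minimal sketch, assuming the outputs are float32 and already flattened into a Python list; `output_digest` and `group_by_output` are hypothetical helpers:

```python
import hashlib
import struct
from collections import defaultdict


def output_digest(values):
    """SHA-256 of the little-endian float32 bit patterns of `values`."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return hashlib.sha256(packed).hexdigest()


def group_by_output(results):
    """Given {machine: output values}, return {digest: sorted machine names}."""
    groups = defaultdict(list)
    for machine, values in results.items():
        groups[output_digest(values)].append(machine)
    return {digest: sorted(machines) for digest, machines in groups.items()}
```

Machines in the same group produced bit-identical float32 outputs; a tolerance-based comparison is still needed to quantify how far apart different groups are.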