onnxruntime v1.19.2 - Githubissues

regro-cf-autotick-bot commented 3 months ago

Closes #131 Closes #133

It is very likely that the current package version for this feedstock is out of date.

Checklist before merging this PR:

[ ] Dependencies have been updated if changed: see upstream
[ ] Tests have passed
[ ] Updated license if changed and license_file is packaged

Information about this PR:

Feel free to push to the bot's branch to update this PR if needed.
The bot will almost always only open one PR per version.
The bot will stop issuing PRs if more than 3 version bump PRs generated by the bot are open. If you don't want to package a particular version please close the PR.
If you want these PRs to be merged automatically, make an issue with code>@conda-forge-admin,</codeplease add bot automerge in the title and merge the resulting PR. This command will add our bot automerge feature to your feedstock.
If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase code>@<space/conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

Pending Dependency Version Updates

Here is a list of all the pending dependency version updates for this repo. Please double check all dependencies before merging.

Name	Upstream Version	Current Version
cudnn	9.3.0.75
protobuf	28.0
setuptools	74.1.1

_{This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by - please use this URL for debugging.}

conda-forge-webservices[bot] commented 3 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

hmaarrfk commented 2 months ago

PS release notes indicate that numpy2.0 is supported do you want to pull in the migration?

hmaarrfk commented 2 months ago

The file seems there, but maybe just not found:

$ find -name host_config.h
./_h_env/targets/x86_64-linux/include/host_config.h
./_build_env/targets/x86_64-linux/include/host_config.h
./_build_env/targets/x86_64-linux/include/crt/host_config.h

cbourjau commented 2 months ago

Only the Cuda builds are failing now, and I'm a little puzzled by this. The Cuda 11 builds fail with a very similar but not identical error.

hmaarrfk commented 2 months ago

cc: @jakirkham any ideas?

cbourjau commented 2 months ago

Just bumping this for visibility. I could imagine that it is a pretty straightforward fix, but I don't have the hardware to debug this easily. Any help is much appreciated!

hmaarrfk commented 2 months ago

unfortunately, even looking at the cuda 12 i can't figure it out on linux.

hmaarrfk commented 2 months ago

I think jakirkham was AFK last week, so maybe we can ping him again next?

hmaarrfk commented 2 months ago

Just going through some findings;

Th ediff between v1.19.2 shows that we still might need to disable installing requirements on windows.

         if args.enable_pybind and is_windows():
-            install_python_deps(args.numpy_version)
+            run_subprocess(
+                [sys.executable, "-m", "pip", "install", "-r", "requirements/pybind/requirements.txt"],
+                cwd=SCRIPT_DIR,
+            )

hmaarrfk commented 2 months ago

oh seems like you fixed pip for windows great!

hmaarrfk commented 2 months ago

@cbourjau just to confirm, you don't need a CUDA enabled machine on linux, just linux+docker.

I give up for today, it just doesn't seem like their build scripts changed all that much, and the previous build is found fine....

cbourjau commented 2 months ago

I'm afraid I won't have much time to spend on this in the next couple of weeks. Do you think it may be reasonable to temporarily disable the Cuda builds and to release the CPU-only 1.19.2 packages, @hmaarrfk ?

hmaarrfk commented 2 months ago

Do you think it may be reasonable to temporarily disable the Cuda builds

Is this really what you want? Generally speaking CUDA is a great enabling technology for ML.

Do you want to field the slew of questions that will come from CUDA users updating to 19.2 with CPU only support?

I would love it if instead you:

Enabled upload of artifacts store_build_artifacts: true https://conda-forge.org/docs/maintainer/conda_forge_yml/#azure
rerendered
Downloaded them
Uploaded them to your own channel

Generally this is what I would suggest for others that have "incomplete" packages.

However, I do understand that CUDA is alot of work, but the performance loss so great that it is almost worst to have a onnx package without cuda support......

I am however unable to help for the last few weeks so I can just as easily limit our version of onnx to 18..... (on my own private channel)

jakirkham commented 2 months ago

I think jakirkham was AFK last week, so maybe we can ping him again next?

Yeah was on PTO for a bit

Noticed that upstream made both a 1.18.2 and a 1.19.2 release around the same time

Given this is on 1.18.1, would it be worth trying 1.18.2? This might be a smaller step with fewer changes. Also it might give us the opportunity to fix a few issues before jumping to the 1.19.x series

hmaarrfk commented 2 months ago

Part of the sad sad thing that made me sad is that even rerendering failed on windows: https://github.com/conda-forge/onnxruntime-feedstock/pull/131

jakirkham commented 2 months ago

It's looking for nvcc in the host environment instead of the build environment

Please see this CI job with snippet below:

CMake Error at CMakeLists.txt:735 (enable_language):
  The CMAKE_CUDA_COMPILER:

    C:/bld/onnxruntime_1725973208545/_h_env/Library/bin/nvcc

  is not a full path to an existing compiler tool.

Compare this to where it finds cmake

--   CMake command                     : C:/bld/onnxruntime_1725973208545/_build_env/Library/bin/cmake.exe

jakirkham commented 2 months ago

Also commented in the other PR: https://github.com/conda-forge/onnxruntime-feedstock/pull/131#issuecomment-2390294030

hmaarrfk commented 2 months ago

Thanks, i'm trying to use git bisect to find the problematic commit here too...

hmaarrfk commented 2 months ago

Ok this seems like the offending commit: https://github.com/microsoft/onnxruntime/commit/7b3fff650a6ae2c8a3a6a4487cb76672fd21dde4

indeed reverting it gets the build scripts to pass and the compilation to start (on linux)

though maybe we should be specifying CUDA_HOME?

hmaarrfk commented 2 months ago

@jakirkham i'm not sure i know how to make the argument to upstream that they should provide this as an option, and not as a default.

jakirkham commented 2 months ago

This CI error looks kind of odd

Ignoring COMPILE_WARNING_AS_ERROR target property and CMAKE_COMPILE_WARNING_AS_ERROR variable.
CMake Deprecation Warning at CMakeLists.txt:14 (cmake_policy):
  The OLD behavior for policy CMP0104 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.

CMake Error at C:/bld/onnxruntime_1727920444921/_build_env/Library/share/cmake-3.30/Modules/CMakeDetermineCCompiler.cmake:49 (message):
  Could not find compiler set in environment variable CC:

  cl.exe.
Call Stack (most recent call first):
  CMakeLists.txt:20 (project)

CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_ASM_COMPILER not set, after EnableLanguage

Does it need the full path to cl?

Maybe we could use where cl to fill out CC and CXX

hmaarrfk commented 2 months ago

@conda-forge-admin please rerendre

hmaarrfk commented 2 months ago

@conda-forge-admin please rerender

hmaarrfk commented 2 months ago

Sorry to ask @xhochy but I don't have the machine to debug this:

Asking you since you seem interested in windows support too. https://github.com/conda-forge/onnxruntime-feedstock/pull/97

hmaarrfk commented 1 month ago

I am VERY rusty on windows skills but:

I downloaded VisualStudio Community 2022. Installed almost every C++ package.
When i activated my environment, I got this error:

C:\Program Files\Microsoft Visual Studio\2022\Community>if "14.41-14.40" == "14.40-14.41" (CALL "VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.41 10.0.26100.0 )  else (CALL "VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.40 10.0.26100.0 )
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.11.4
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[ERROR:vcvars.bat] Toolset directory for version '14.40' was not found.
[ERROR:VsDevCmd.bat] *** VsDevCmd.bat encountered errors. Environment may be incomplete and/or incorrect. ***
[ERROR:VsDevCmd.bat] In an uninitialized command prompt, please 'set VSCMD_DEBUG=[value]' and then re-run
[ERROR:VsDevCmd.bat] vsdevcmd.bat [args] for additional details.
[ERROR:VsDevCmd.bat] Where [value] is:
[ERROR:VsDevCmd.bat]    1 : basic debug logging
[ERROR:VsDevCmd.bat]    2 : detailed debug logging
[ERROR:VsDevCmd.bat]    3 : trace level logging. Redirection of output to a file when using this level is recommended.
[ERROR:VsDevCmd.bat] Example: set VSCMD_DEBUG=3
[ERROR:VsDevCmd.bat]          vsdevcmd.bat > vsdevcmd.trace.txt 2>&1

I commented out the "exit 1" in the conda_build.bat
called it with call conda_build.bat
Got the same error as the CIs.
Eventually for it to find cl.exe i had to add a path like:

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64

I installed more things from window's mega downloader and will restart my computer and try again.

hmaarrfk commented 1 month ago

@conda-forge-admin please rerender

hmaarrfk commented 1 month ago

well i have hope!

hmaarrfk commented 1 month ago

ok clearly my hope was misguided. @jakirkham any clue? even if just for CUDA 12.

hmaarrfk commented 1 month ago

https://forums.developer.nvidia.com/t/problems-with-latest-vs2022-update/294150

hmaarrfk commented 1 month ago

So on the newer 19.41 compiler version, there exists a if statement:

#ifndef _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH 
#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__)
#if __CUDACC_VER_MAJOR__ < 12 || (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ < 4)
_EMIT_STL_ERROR(STL1002, "Unexpected compiler version, expected CUDA 12.4 or newer.");
#endif // ^^^ old CUDA ^^^
#elif defined(__EDG__)
// not attempting to detect __EDG_VERSION__ being less than expected
#elif defined(__clang__)
#if __clang_major__ < 17
_EMIT_STL_ERROR(STL1000, "Unexpected compiler version, expected Clang 17.0.0 or newer.");
#endif // ^^^ old Clang ^^^
#elif defined(_MSC_VER)
#if _MSC_VER < 1940 // Coarse-grained, not inspecting _MSC_FULL_VER
_EMIT_STL_ERROR(STL1001, "Unexpected compiler version, expected MSVC 19.40 or newer.");
#endif // ^^^ old MSVC ^^^
#else // vvv other compilers vvv
// not attempting to detect other compilers
#endif // ^^^ other compilers ^^^
#endif // !defined(_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH)

I should be able to define _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH but I am unable to.

Instead if I manually change:

__CUDACC_VER_MINOR__ < 4

to

__CUDACC_VER_MINOR__ < 0

it effectively disables the check and allows one to move forward.

hmaarrfk commented 1 month ago

ok now fingers crossed.

hmaarrfk commented 1 month ago

is there any way to squeeze a bit more RAM out on windows?

edit: added 'on windows'

xhochy commented 1 month ago

Sorry to ask @xhochy but I don't have the machine to debug this:

Asking you since you seem interested in windows support too. #97

I work together with @cbourjau and we are only interested in CPU support. GPUs are not bringing us noticeable performance improvement for our workloads. Also Windows is just a nice-to-have 🤷 Thus, I'm not so motivated to spend time on fixing this.

jakirkham commented 1 month ago

is there any way to squeeze a bit more RAM out on windows?

There are some things that could be cleaned from the images: https://github.com/conda-forge/conda-smithy/pull/1966

That removal process is a bit slow though. If you know ways to make that faster, please let me know

Otherwise we could just borrow from those and include them in the build here. For example this is what we did in matplotlib to address a similar issue

hmaarrfk commented 1 month ago

There are some things that could be cleaned from the images: https://github.com/conda-forge/conda-smithy/pull/1966

i think i got an out of memory error, so not disk related. would those cleanups help freeup some RAM space?

hmaarrfk commented 1 month ago

I work together with @cbourjau

hehe, good to know!

and we are only interested in CPU support.

Interesting...

Also Windows is just a nice-to-have 🤷 Thus, I'm not so motivated to spend time on fixing this.

haha, your introduction of ONNX+windows was a motivator for me to start to bring my application to windows :)

Lets see if we can squeeze a bit more performance before reducing the number architectures.

jakirkham commented 1 month ago

i think i got an out of memory error, so not disk related. would those cleanups help freeup some RAM space?

My understanding of how CI is setup they all count against the same limit

hmaarrfk commented 1 month ago

My understanding of how CI is setup they all count against the same limit

ok lets try.

jakirkham commented 1 month ago

Sound good. Would grab a coffee :)

hmaarrfk commented 1 month ago

https://xkcd.com/303/

hmaarrfk commented 1 month ago

It took about 1 hour for the cuda on an 11th gen 9 intel processor.

Builds and about 16GB of RAM of extra RAM. My Windows RAM usage went up to 24GB + "16cache".

The largerst an NVCC/ICCI took was visually 2-3 GB of RAM. so I feel like it should be able to handle it on a CI, but maybe they are wimpier than I would think.

hmaarrfk commented 1 month ago

may i ask what the "novec" and "vec" options mean? perhaps since there is little interest in the windows builds we can avoid 1/2 of them?

I would like to try to maintain windows build at least for "modern" hardware

jtilly commented 1 month ago

may i ask what the "novec" and "vec" options mean?

I added the novec variant in https://github.com/conda-forge/onnxruntime-feedstock/pull/36. eigen has its own way to vectorize floating point operations which results in (slightly) inconsistent predictions depending on whether we pass 1 row or multiple rows into a model. The novec build compiles eigen with EIGEN_DONT_VECTORIZE. I think we don't need CUDA builds for the novec variant (I'm assuming the flag doesn't do anything for CUDA kernels anyways). Also, feel free to drop the novec Windows builds (until someone asks for them).

xhochy commented 1 month ago

I work together with @cbourjau

hehe, good to know!

As we're already at it, full disclosure: All maintainers except you on this feedstock are employed by QuantCo and use onnxruntime for the same use case. My windows support was mainly to enable other users besides QuantCo and I'm also happy to help a bit more than needed, but Windows+CUDA is a combination where I lack any expertise.

traversaro commented 1 month ago

Probably I the one that added CUDA support on Windows and is interested in it. I sometimes miss PRs related to Windows+CUDA in this repo, my bad, but I am interesting in making sure that they work, and happy to help (happy to also be listed as a maintainer so in that case I would be added as a reviewer to PRs, that increase the chance that I lose some PRs).

I can confirm that I am not interested in novec builds on Windows.

conda-forge-admin commented 1 month ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

Recipe maintainer "jtilly" does not exist

hmaarrfk commented 1 month ago

@traversaro you'll be added on the next rerender. I'm trying to cut the cuda architectures, to get it to work on the CIs..... but..... we are now at 80,86,90 which is likely too few even for my taste.

Are you able to build out the windows matrix?

conda-forge-admin commented 1 month ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

hmaarrfk commented 1 month ago

@conda-forge-admin please rerender

traversaro commented 1 month ago

Are you able to build out the windows matrix?

I can look into that, but it will probably take me a few days. If this is blocking, probably we can skip Windows for the time being? By skipping the whole Windows build (instead of just CUDA) we avoid the problem of people updating and ending up with CPU-only onnxruntime.

conda-forge / onnxruntime-feedstock

onnxruntime v1.19.2 #128

Pending Dependency Version Updates