dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Helix tests fail on latest TorchSharp (0.102.5) and its runtime #7182

Open LittleLittleCloud opened 4 days ago

LittleLittleCloud commented 4 days ago

Describe the bug

Microsoft.ML.TorchSharp.Tests fails in Helix tests when I update TorchSharp and its runtime to 0.102.5 and 2.2.1.1, respectively.

The error message from the Helix tests indicates that some dependencies of libLibTorchSharp are missing. After turning on LD_DEBUG, it appears that one of the missing dependencies is GLIBC_2.34. Note that the image for the Helix tests is still CentOS Stream 8, whose glibc version is 2.28. This could be why the TorchSharp tests fail.

file=/temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libtorch_cpu.so
       260:
       260:     find library=libgomp-98b21ff3.so.1 [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/libgomp-98b21ff3.so.1
       260:       trying file=/usr/local/lib64/libgomp-98b21ff3.so.1
       260:      search path=/temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native            (RUNPATH from file /temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libtorch_cpu.so)
       260:       trying file=/temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libgomp-98b21ff3.so.1
       260:
       260:     /lib64/libc.so.6: error: version lookup error: version `GLIBC_2.34' not found (required by /temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libLibTorchSharp.so) (fatal)
       260:     find library=libLibTorchSharp.so [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/libLibTorchSharp.so
       260:       trying file=/usr/local/lib64/libLibTorchSharp.so
       260:      search cache=/etc/ld.so.cache
       260:      search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64                (system search path)
       260:       trying file=/lib64/tls/libLibTorchSharp.so
       260:       trying file=/lib64/libLibTorchSharp.so
       260:       trying file=/usr/lib64/tls/libLibTorchSharp.so
       260:       trying file=/usr/lib64/libLibTorchSharp.so
       260:
       260:     find library=LibTorchSharp [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/LibTorchSharp
       260:       trying file=/usr/local/lib64/LibTorchSharp
       260:      search cache=/etc/ld.so.cache
       260:      search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64                (system search path)
       260:       trying file=/lib64/tls/LibTorchSharp
       260:       trying file=/lib64/LibTorchSharp
       260:       trying file=/usr/lib64/tls/LibTorchSharp
       260:       trying file=/usr/lib64/LibTorchSharp
       260:
       260:     find library=libLibTorchSharp [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/libLibTorchSharp
       260:       trying file=/usr/local/lib64/libLibTorchSharp
       260:      search cache=/etc/ld.so.cache
       260:      search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64                (system search path)
       260:       trying file=/lib64/tls/libLibTorchSharp
       260:       trying file=/lib64/libLibTorchSharp
       260:       trying file=/usr/lib64/tls/libLibTorchSharp
       260:       trying file=/usr/lib64/libLibTorchSharp
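The key line in the log above is the `version lookup error`. As a rough illustration, the sketch below pulls such lines out of LD_DEBUG output and compares the required version against a given glibc version. The function names are made up for this example, and the sample string is copied from the log; this is not part of ML.NET or the Helix tooling.

```python
import re

# Matches ld.so "version lookup error" lines such as the one in the log above.
_LOOKUP_ERROR = re.compile(
    r"version lookup error: version `(?P<version>[^']+)' not found "
    r"\(required by (?P<library>[^)]+)\)"
)


def find_version_errors(ld_debug_output):
    """Return (missing_version, requiring_library) pairs found in LD_DEBUG text."""
    return [
        (m.group("version"), m.group("library"))
        for m in _LOOKUP_ERROR.finditer(ld_debug_output)
    ]


def parse_version(text):
    """Turn 'GLIBC_2.34' or '2.28' into a comparable tuple such as (2, 34)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def is_satisfied(required, available):
    """True when the available glibc version is at least the required one."""
    return parse_version(required) <= parse_version(available)


# Sample line taken from the log in this issue.
sample = (
    "       260:     /lib64/libc.so.6: error: version lookup error: "
    "version `GLIBC_2.34' not found (required by /temp/workitems/"
    "Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libLibTorchSharp.so) (fatal)\n"
)

for version, library in find_version_errors(sample):
    # 2.28 is the glibc version reported for the Helix image in this issue.
    status = "ok" if is_satisfied(version, "2.28") else "too new for glibc 2.28"
    print(f"{library} needs {version}: {status}")
```

On a real machine the available version could be read with `platform.libc_ver()` instead of the hard-coded `"2.28"`.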

LittleLittleCloud commented 4 days ago

The glibc dependency change in TorchSharp might have been introduced in this PR, which upgrades its build image from Ubuntu 18 to Ubuntu 22. I'm not familiar with C++, so this is just my guess.
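One way to check this guess would be to compare the newest GLIBC version tag referenced by the native library in the old and new packages. The sketch below assumes `objdump` from binutils is installed and has not been run against the actual TorchSharp binaries; the function names are hypothetical.

```python
import re
import subprocess


def glibc_versions(symbol_dump):
    """Collect the distinct GLIBC_x.y version tags in a symbol dump, sorted ascending."""
    found = set(re.findall(r"GLIBC_(\d+(?:\.\d+)+)", symbol_dump))
    return sorted(tuple(int(p) for p in v.split(".")) for v in found)


def newest_glibc_requirement(library_path):
    """Run `objdump -T` on a shared library and return its newest GLIBC tag, or None."""
    dump = subprocess.run(
        ["objdump", "-T", library_path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    versions = glibc_versions(dump)
    return versions[-1] if versions else None
```

Calling `newest_glibc_requirement` on `libLibTorchSharp.so` from each package version would show whether the requirement moved past 2.28 between releases.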

ericstj commented 4 days ago

We need to make sure we have an answer for this for ML.NET 4.0. Ideally we can update ML.NET to the latest TorchSharp without dropping support for our platforms.