SciSharp / TensorFlow.NET

.NET Standard bindings for Google's TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
https://scisharp.github.io/tensorflow-net-docs
Apache License 2.0
3.18k stars, 510 forks

[Question]: Non-AVX CPU #1080

Status: Open. GadgetNutt opened this issue 1 year ago

GadgetNutt commented 1 year ago

Description

Similar to #539, I have an AMD A6-3650 APU that doesn't support AVX. The machine it is in is fantastic: even though it is at least 11 years old, it performs exceedingly well. It currently has 8 GB of RAM (soon to be 16 GB), and it is getting a processor upgrade soon to an AMD Athlon II X4 651K 3 GHz quad-core. I have an RTX 3070 in this machine so that TensorFlow can use the GPU, and I was going to use it as my development computer.

However, I keep getting errors when trying to run my network with TensorFlow.NET. I'm new to TF, so maybe I've got something wrong, but a basic console program works:

using System;
using Tensorflow;
using static Tensorflow.Binding;

namespace TensorFlowTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // The following code will use the GPU if it's available,
            // because TensorFlow.NET automatically uses the GPU when it's available.

            // Switch TensorFlow to graph (non-eager) mode
            tf.compat.v1.disable_eager_execution();

            // Create a new graph and a session bound to it
            var graph = tf.Graph().as_default();
            var sess = tf.Session(graph);

            try
            {
                // Create a constant tensor
                var tensor = tf.constant(10.0f);

                // Execute the tensor and print the result
                var result = sess.run(tensor);
                Console.WriteLine($"The result is: {result}");
            }
            finally
            {
                // Release the native session handle
                sess.Dispose();
            }

            Console.ReadKey();
        }
    }
}

2023-05-19 10:20:08.015490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5511 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3070, pci bus id: 0000:01:00.0, compute capability: 8.6

I was wondering whether TF.NET requires AVX, or whether you know of any other reason why I get errors on this machine but not on my Surface Pro 7. My main program is in VB.NET, by the way.
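To answer the AVX part of the question on a particular machine, .NET Core 3.0 and later expose the CPU's SIMD capabilities through the `System.Runtime.Intrinsics.X86` classes. The sketch below only reports what the CPU supports; whether the native TensorFlow binaries shipped in the redist package are actually compiled with AVX is an assumption it does not check.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Reports which x86 SIMD extensions the current CPU exposes to .NET.
// Assumption: the native TensorFlow library may be built with AVX enabled;
// this code inspects only the CPU, not the TensorFlow binaries.
public static class CpuFeatures
{
    public static string Describe() =>
        $"SSE4.2: {Sse42.IsSupported}, AVX: {Avx.IsSupported}, AVX2: {Avx2.IsSupported}";

    public static void Main()
    {
        Console.WriteLine(Describe());
        if (!Avx.IsSupported)
        {
            // Executing an AVX instruction on such a CPU raises an illegal-instruction fault.
            Console.WriteLine("No AVX: native code compiled with AVX instructions may crash here.");
        }
    }
}
```

On the A6-3650 described above, the `AVX:` field would be expected to print `False`.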


GadgetNutt commented 1 year ago

I just discovered that if I call tf.compat.v1.disable_eager_execution() in my constructor, the program now runs on my ancient APU.

AsakusaRinne commented 1 year ago

Hi, since I don't have a non-AVX CPU, could you please give a more detailed description? If I'm not mistaken, when you run the console app, does it use the CPU instead of the GPU?

GadgetNutt commented 1 year ago

Sorry, here's the story.

I have a program I'm writing in VB.NET. It works fine on a Surface Pro 7. When I tried to run it on this AMD A6 machine, it kept crashing while initializing the model. I think it used to work before I installed the RTX 3070, but now it just crashes. I tried some simple code (above) just to see whether it would initialize and use the GPU, which it did. Then I noticed that the sample code (GPT-generated) calls tf.compat.v1.disable_eager_execution(), which my VB.NET code did not. So I added the same call to the constructor of my VB.NET app, and now the app works on the AMD A6 machine, apparently using the GPU.

So now I guess the question is different: why won't eager mode work on my old AMD A6 machine? Is it because the CPU lacks AVX, or is it something else? Will disabling eager execution impair performance in any way?

AsakusaRinne commented 1 year ago

Thanks for your feedback. Is the system 64 bit?

GadgetNutt commented 1 year ago

Yes, both systems are 64 bit. The Surface Pro 7 is Windows 11 Pro 64 bit and the AMD A6 is running Windows 10 Pro 64 bit. Despite being at least 11 years old, the AMD A6 is mostly outperforming the Surface Pro 7.

AsakusaRinne commented 1 year ago

Will disabling eager execution impair performance in any way?

It could in some situations, but don't worry too much. According to some TensorFlow documents I read before, eager mode and graph mode have similar performance: sometimes eager mode is faster, and sometimes it's the other way around.

Why will eager mode not work on my old AMD A6 machine? Is it because of it not have AVX or is it something else?

Since the program crashes, I'm not sure. However, I can try to help you find the problem. Could you please clone the repo and change your project's reference from the TensorFlow.NET NuGet package to TensorFlow.Core/Tensorflow.Binding.csproj? Debugging it that way may give you more information.

GadgetNutt commented 1 year ago

Ok. I've had some interesting things happen.

  1. Sometimes it works. However, the last time I tried, my app didn't work with the GPU the first time I ran it after rebooting the computer. I actually had to start the C# test app first, which I guess activated the GPU connection somehow, and then my app was able to use it.
  2. After rebooting, the app didn't crash. I tried various versions of the TensorFlow.NET package (including the .csproj) and it worked fine.
  3. This morning I came back and started trying it out again, and it is crashing again. It crashes at Graph.cs line 39: _handle = c_api.TF_NewGraph();

The error is: The program '[18664] myprog.exe' has exited with code 3221225501 (0xc000001d) 'Illegal Instruction'. I suspect, though I'm not sure, that it has something to do with memory allocation on the GPU.
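The exit code can be decoded mechanically: 3221225501 is 0xC000001D, the Windows NTSTATUS value STATUS_ILLEGAL_INSTRUCTION, which is raised when the CPU encounters an instruction it cannot execute. One plausible reading, not confirmed in this thread, is that native TensorFlow code executed AVX instructions on a CPU without AVX. A minimal sketch that normalises a signed or unsigned exit code and names a few well-known values (the table is illustrative, not exhaustive):

```csharp
using System;
using System.Collections.Generic;

// Decodes a Windows process exit code into hex plus a known NTSTATUS name.
public static class ExitCodeInfo
{
    private static readonly Dictionary<uint, string> KnownStatus = new()
    {
        [0xC0000005] = "STATUS_ACCESS_VIOLATION",
        [0xC000001D] = "STATUS_ILLEGAL_INSTRUCTION",
        [0xC0000409] = "STATUS_STACK_BUFFER_OVERRUN",
    };

    public static string Describe(long exitCode)
    {
        // Tools print exit codes either signed or unsigned; keep the low 32 bits.
        uint status = unchecked((uint)exitCode);
        string name = KnownStatus.TryGetValue(status, out var n) ? n : "unknown";
        return $"0x{status:X8} {name}";
    }

    public static void Main()
    {
        // Prints "0xC000001D STATUS_ILLEGAL_INSTRUCTION"
        Console.WriteLine(Describe(3221225501));
    }
}
```

An access violation (0xC0000005) would point more toward a memory problem; an illegal instruction points toward the CPU itself.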

As things stand, it still crashes, even after a reboot. I was wrong about tf.compat.v1.disable_eager_execution(): it doesn't fix the crash. In my code it is called after the model is supposed to be initialized.

Still investigating... I might have found something I did wrong.

AsakusaRinne commented 1 year ago

Sorry for the late reply. I've searched the TensorFlow issues but found little related to this. However, here are some tips that may help.

First, please check the CUDA version with nvcc -V and make sure it's CUDA 11.x (ideally >= 11.2). Then make sure you have installed cuDNN 8 (for example, by checking C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include\cudnn_version.h).
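Both checks can be scripted. This sketch runs `nvcc -V` and parses the `CUDNN_MAJOR`/`CUDNN_MINOR`/`CUDNN_PATCHLEVEL` defines that cuDNN 8 keeps in `cudnn_version.h`. It assumes `nvcc` is on PATH and that the cuDNN headers were copied into the CUDA v11.7 include folder; adjust the path for other installations.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

// Assumptions: nvcc is on PATH; cuDNN 8's cudnn_version.h sits in the CUDA v11.7 include folder.
public static class CudaCheck
{
    // Turns "#define CUDNN_MAJOR 8"-style lines into "major.minor.patch" ("?" if a define is absent).
    public static string ParseCudnnVersion(string headerText)
    {
        string Get(string name)
        {
            var m = Regex.Match(headerText, $@"#define\s+{name}\s+(\d+)");
            return m.Success ? m.Groups[1].Value : "?";
        }
        return $"{Get("CUDNN_MAJOR")}.{Get("CUDNN_MINOR")}.{Get("CUDNN_PATCHLEVEL")}";
    }

    public static void Main()
    {
        // Same as typing `nvcc -V` in a terminal; prints the CUDA compiler release.
        try
        {
            var psi = new ProcessStartInfo("nvcc", "-V")
            {
                RedirectStandardOutput = true,
                UseShellExecute = false,
            };
            using var p = Process.Start(psi);
            Console.WriteLine(p.StandardOutput.ReadToEnd());
            p.WaitForExit();
        }
        catch (Win32Exception)
        {
            Console.WriteLine("nvcc was not found on PATH");
        }

        const string header =
            @"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include\cudnn_version.h";
        Console.WriteLine(File.Exists(header)
            ? $"cuDNN {ParseCudnnVersion(File.ReadAllText(header))}"
            : "cudnn_version.h not found at the assumed path");
    }
}
```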

After that, please ensure only one redist package is installed for the project.
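One way to confirm this is to list the redist package references in the project file. The sketch assumes an SDK-style project with `<PackageReference Include="...">` items and treats any package ID containing "Redist" as a redist package; `MyApp.csproj` is a hypothetical file name.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Lists redist packages referenced directly by a project file.
// Assumptions: SDK-style PackageReference items; any ID containing "Redist" counts;
// "MyApp.csproj" is a hypothetical name.
public static class RedistCheck
{
    public static string[] FindRedistPackages(string csprojXml) =>
        XDocument.Parse(csprojXml)
            .Descendants()
            .Where(e => e.Name.LocalName == "PackageReference")
            .Select(e => e.Attribute("Include")?.Value ?? "")
            .Where(id => id.Contains("Redist", StringComparison.OrdinalIgnoreCase))
            .ToArray();

    public static void Main()
    {
        var redists = FindRedistPackages(File.ReadAllText("MyApp.csproj"));
        Console.WriteLine(redists.Length == 1
            ? $"OK: {redists[0]}"
            : $"Expected one redist package, found {redists.Length}: {string.Join(", ", redists)}");
    }
}
```

This only sees direct references; a redist package pulled in transitively by another package would not show up here.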

Then try eager mode, for example with the code below:

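using System;
using static Tensorflow.Binding;
var a = tf.ones((2, 3));
var b = tf.ones((3, 4));
// (2x3) x (3x4) matrix product; elementwise a * b would fail for these shapes
Console.WriteLine(tf.matmul(a, b));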

If it still keeps crashing, please try Tensorflow.Redist to run on the CPU only, to see whether the problem is related to the GPU setup (I suspect it may not be an AMD problem).

Finally, please try installing Tensorflow.NET <= 0.100.2 together with Tensorflow.Redist.Windows-Gpu == 2.10.0. The Windows redist packages from v2.10.1 onward are built from source by us, and that may be the cause (though I don't know how it could cause this problem).

AsakusaRinne commented 1 year ago

Hi, is this error still occurring?