Image classification CPU vs GPU accuracy

FaithlessDbo commented 1 year ago

OS:

Windows 11
Visual Studio 2022

Target framework version:

.NET 6.0

Nuget packages:

cuDNN v7.6.0
CUDA 10.1

What I'm trying to achieve (TL;DR): classifying images with my GPU instead of my CPU.

I'm trying to classify images into 14 different categories. Currently I have about 40,000 images, but I'm planning to add more to try and get a better dataset. Initially I trained on my dataset with my CPU and had pretty decent accuracy (90-98% in most cases), but the training and prediction speed was rather slow. I saw some articles about a GPU improving this speed. I bought a GPU for this, but the results were rather unexpected.

What did I do? In my "Environment" tab from the model builder I selected the "Local (GPU)" box and I installed the required extensions and the checks became green. I uninstalled the nuget package I used for the CPU training (SciSharp.TensorFlow.Redist) and installed the one required for the GPU (SciSharp.TensorFlow.Redist-Windows-GPU). When I hit the Train button in the "Train" tab I was amazed by the speed. It flew through the bottleneck computation, indicating the GPU is working (confirmed with GPU usage in my task manager). However my best MicroAccuracy dropped from ~0.93 to ~0.43 and I get about 8% accuracy in my evaluate tab, which is completely unexpected.

Model builder config:

{
  "Scenario": "ImageClassification",
  "DataSource": {
    "Type": "Folder",
    "Version": 1,
    "FolderPath": "path\\To\\Images\\Folder"
  },
  "Environment": {
    "Type": "LocalGPU",
    "Version": 1
  },
  "Type": "TrainingConfig",
  "Version": 3,
  "TrainingOption": {
    "Version": 0,
    "Type": "ClassificationTrainingOption",
    "TrainingTime": 2147483647,
    "Seed": 0
  }
}

What could be causing the difference in accuracy between my CPU and GPU settings?

Do I require more training images, or did I overlook something else? I'm looking forward to any help or suggestions! Thank you for reading!

luisquintanilla commented 1 year ago

Hi @FaithlessDbo,

Thanks for raising this issue. We'll look into it.

@v-Hailishi can you please try and repro this issue on your end. Thanks.

v-Hailishi commented 1 year ago

@luisquintanilla Sorry, I cannot validate this issue because I have no GPU device to test with.

FaithlessDbo commented 1 year ago

@luisquintanilla @v-Hailishi Would there be another way to figure out what's going on here? It'd be a shame to have purchased an expensive card for this purpose with these kinds of results.

Is there some kind of logging I could do on my end which could help to shed more light on the issue at hand?

luisquintanilla commented 1 year ago

Thanks @v-Hailishi.

@JakeRadMSFT @LittleLittleCloud is this something you can try and repro on your end?

FaithlessDbo commented 1 year ago

Is there anything else I could provide to keep this going? I'd hate to have invested in an NVIDIA GPU without being able to use it for its sole purpose.

luisquintanilla commented 1 year ago

@LittleLittleCloud can you please take a look at this one. Thanks.

LittleLittleCloud commented 1 year ago

I don't have the answer right now. The difference in training performance between CPU and GPU can have various causes: a bug in CUDA/cuDNN, different floating-point precision, different initialization seeds... It's difficult to say which of these is responsible here.

It would be helpful if you could provide us with your dataset for reproduction. Otherwise, we can try reproducing it on an open-source dataset, but that might take more time.

In the meantime, the workarounds you can try are:

- Training with a larger number of epochs on the GPU. This requires you to use the ML.NET framework directly, as Model Builder doesn't let you customize the epoch number. It would be a lovely feature for deep-learning scenarios though, @luisquintanilla.
- Training on the CPU and inferencing on the GPU. The model should use the same weights for CPU/GPU training, and you can still use the GPU for inferencing even if the model was trained on the CPU.
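For readers who want to try the first workaround (more epochs), a minimal sketch with the ML.NET image classification trainer could look like the following. It assumes the Microsoft.ML and Microsoft.ML.Vision packages plus one of the SciSharp.TensorFlow.Redist packages are installed, and that images sit in one sub-folder per category. The folder path, the `ImageData` class, the 80/20 split and the epoch count are illustrative assumptions, not values taken from this thread.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Vision;

// Assumption: one sub-folder per category under this placeholder path.
var imageFolder = Path.GetFullPath(@"path\To\Images\Folder");

var mlContext = new MLContext(seed: 0);

// Use each file's parent folder name as its label.
var images = Directory
    .EnumerateFiles(imageFolder, "*", SearchOption.AllDirectories)
    .Select(path => new ImageData
    {
        ImagePath = path,
        Label = Directory.GetParent(path)!.Name
    });

var data = mlContext.Data.ShuffleRows(mlContext.Data.LoadFromEnumerable(images));

// Convert string labels to keys and load the raw bytes of each image.
// ImagePath already holds absolute paths, so no image folder is passed.
var preprocessing = mlContext.Transforms.Conversion
    .MapValueToKey(outputColumnName: "LabelAsKey", inputColumnName: "Label")
    .Append(mlContext.Transforms.LoadRawImageBytes(
        outputColumnName: "Image", imageFolder: null, inputColumnName: "ImagePath"));

var prepared = preprocessing.Fit(data).Transform(data);
var split = mlContext.Data.TrainTestSplit(prepared, testFraction: 0.2);

var options = new ImageClassificationTrainer.Options
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelAsKey",
    Arch = ImageClassificationTrainer.Architecture.ResnetV250,
    Epoch = 400, // illustrative value only; tune for your dataset
    BatchSize = 10,
    LearningRate = 0.01f,
    MetricsCallback = m => Console.WriteLine(m)
};

var model = mlContext.MulticlassClassification.Trainers
    .ImageClassification(options)
    .Fit(split.TrainSet);

var testMetrics = mlContext.MulticlassClassification.Evaluate(
    model.Transform(split.TestSet), labelColumnName: "LabelAsKey");
Console.WriteLine($"MicroAccuracy: {testMetrics.MicroAccuracy:F4}");

// Input row type (hypothetical name): image path plus category name.
class ImageData
{
    public string ImagePath { get; set; } = "";
    public string Label { get; set; } = "";
}
```

Running the same program once with the CPU redist package and once with the GPU one should give two `MicroAccuracy` values that can be compared directly.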

FaithlessDbo commented 1 year ago

@LittleLittleCloud I have tried to use custom training using more epochs, but the results were the same. Training with the CPU and then consuming it with the GPU also gave me the same bad results.

I could provide a training set. Would 100 images of each category suffice, or do you require more images? Also, since this isn't a public dataset, perhaps it's best if I send you the dataset privately somewhere?

mg-yolo-enterprises commented 1 year ago

@LittleLittleCloud Thanks for the attention on this issue!

In response to your workaround suggestion to consume a CPU-trained model with the GPU:

I concur with @FaithlessDbo that training the model with CPU, then consuming the model with GPU appears to work, but in reality it does not. See details below.

Here's an example of consuming a CPU-trained model on the CPU using ten images - 5 images from each of the two classes:

These are great results, but we really need the speed boost from GPU acceleration to be able to use this model.

Everything looks good as the GPU gets started:

Now here's the same ten images, using the same CPU-trained model run on the GPU:

Here's the blip of GPU usage labeled "Cuda" shown in Task Manager during this time, so we can be sure something's actually happening:

In the above case, the only change made was swapping the SciSharp.TensorFlow 2.3.1 Nuget package to enable use of the GPU.

Some notes:

With the GPU, no errors are encountered that indicate things aren't working.
With the GPU, the same class receives the same score each time, despite the prediction running on ten different images.
The GPU score 85% is very different from the CPU results, which accurately predicted all images at >99%.

Please let me know if I can provide further information to diagnose this issue.
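The second note above (every image getting the same score) is a symptom that can be checked automatically. The sketch below is a hypothetical helper, not part of ML.NET: it takes one score vector per image and reports whether they are all effectively identical. The sample numbers are made up for illustration and only loosely modelled on the 85% figure mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns true when every image received (almost) the same score vector,
// which suggests the model output does not depend on the input at all.
bool ScoresLookConstant(IReadOnlyList<float[]> scoresPerImage, float tolerance = 1e-4f)
{
    if (scoresPerImage.Count < 2) return false;
    var first = scoresPerImage[0];
    return scoresPerImage.Skip(1).All(scores =>
        scores.Length == first.Length &&
        scores.Zip(first, (a, b) => Math.Abs(a - b)).All(diff => diff <= tolerance));
}

// Made-up scores for illustration only.
var suspicious = new List<float[]>
{
    new[] { 0.85f, 0.15f },
    new[] { 0.85f, 0.15f },
    new[] { 0.85f, 0.15f },
};
var healthy = new List<float[]>
{
    new[] { 0.99f, 0.01f },
    new[] { 0.02f, 0.98f },
};

Console.WriteLine(ScoresLookConstant(suspicious)); // True
Console.WriteLine(ScoresLookConstant(healthy));    // False
```

A check like this is useful because, as noted, the GPU run raises no errors; a constant output is one of the few observable signs that something is wrong.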

Since @FaithlessDbo has offered to provide training data, I'll await the results of that testing. Let me know if you need me to provide you with additional training data. My complete dataset is 50GB of PNGs, but I can send you part of that if needed.

LittleLittleCloud commented 1 year ago

Thanks @FaithlessDbo and @mg-yolo-enterprises for the response. I can start with cifar10 first to see if I can reproduce this issue.

Also @Oceania2018 does this issue look similar to you?

FaithlessDbo commented 1 year ago

@LittleLittleCloud so for now I don't have to provide a dataset? Let me know if you would require anything and if so how you would like to receive the dataset.

LittleLittleCloud commented 1 year ago

@FaithlessDbo No extra dataset is needed so far. Could you do me a favor and run this cifar10 example on both CPU and GPU to see if there's a significant difference in accuracy? There is no significant difference on my end.

FaithlessDbo commented 1 year ago

@LittleLittleCloud Thanks for your sample. I've run both scenarios (I did have to change the ML.NET version to 2.0) and here are my results:

Short version:

CPU : 0.67
GPU : 0.10

Long version with console output:

CPU:

2023-03-08 20:50:41.511342: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-08 20:50:41.525995: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ca26c544e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-03-08 20:50:41.526033: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Saver not created because there are no variables in the graph to restore
Saver not created because there are no variables in the graph to restore
Restoring parameters from C:\Users\User\AppData\Local\Temp\jhc051rn.eke\custom_retrained_model_based_on_resnet_v2_50_299.meta
Froze 2 variables.
Converted 2 variables to const ops.
Accuracy: 0,6744833563409415

GPU:

2023-03-08 20:56:09.438291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2023-03-08 20:56:09.467462: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-08 20:56:09.483618: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x256b0f28aa0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-03-08 20:56:09.483660: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-03-08 20:56:09.484713: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2023-03-08 20:56:09.499711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2023-03-08 20:56:09.499769: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2023-03-08 20:56:09.504843: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2023-03-08 20:56:09.507603: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2023-03-08 20:56:09.508574: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2023-03-08 20:56:09.511939: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2023-03-08 20:56:09.513946: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2023-03-08 20:56:09.519000: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2023-03-08 20:56:09.519082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2023-03-08 20:56:10.161452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-03-08 20:56:10.161524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2023-03-08 20:56:10.161568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2023-03-08 20:56:10.161706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8428 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
2023-03-08 20:56:10.163992: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x256e0a5a000 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-03-08 20:56:10.164038: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3080, Compute Capability 8.6
Saver not created because there are no variables in the graph to restore
2023-03-08 20:56:10.957480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2023-03-08 20:56:10.957523: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2023-03-08 20:56:10.957573: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2023-03-08 20:56:10.957612: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2023-03-08 20:56:10.957643: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2023-03-08 20:56:10.957674: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2023-03-08 20:56:10.957705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2023-03-08 20:56:10.957732: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2023-03-08 20:56:10.957771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2023-03-08 20:56:10.957815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-03-08 20:56:10.957840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2023-03-08 20:56:10.957863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2023-03-08 20:56:10.957926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8428 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
2023-03-08 20:56:13.081218: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2023-03-08 20:56:16.128114: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2023-03-08 20:56:16.136872: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
Saver not created because there are no variables in the graph to restore
2023-03-08 20:56:24.781837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2023-03-08 20:56:24.781919: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2023-03-08 20:56:24.781953: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2023-03-08 20:56:24.781991: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2023-03-08 20:56:24.782024: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2023-03-08 20:56:24.782051: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2023-03-08 20:56:24.782080: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2023-03-08 20:56:24.782106: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2023-03-08 20:56:24.782148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2023-03-08 20:56:24.782208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-03-08 20:56:24.782242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2023-03-08 20:56:24.782266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2023-03-08 20:56:24.782355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8428 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
Restoring parameters from C:\Users\User\AppData\Local\Temp\xprhiipd.vug\custom_retrained_model_based_on_resnet_v2_50_299.meta
Froze 2 variables.
Converted 2 variables to const ops.
2023-03-08 20:56:25.244378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2023-03-08 20:56:25.244481: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2023-03-08 20:56:25.244521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2023-03-08 20:56:25.244559: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2023-03-08 20:56:25.244585: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2023-03-08 20:56:25.244612: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2023-03-08 20:56:25.244642: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2023-03-08 20:56:25.244673: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2023-03-08 20:56:25.244718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2023-03-08 20:56:25.244770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-03-08 20:56:25.244797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2023-03-08 20:56:25.244822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2023-03-08 20:56:25.244916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8428 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
Accuracy: 0,10825396825396824
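One way to read these two numbers: CIFAR-10 has 10 classes, so a model that guesses uniformly would score about 0.10. The sketch below (hypothetical helper names, and assuming roughly balanced classes) checks whether a measured accuracy is within a small margin of that chance level; the 0.02 margin is an arbitrary choice.

```csharp
using System;

// Accuracy a classifier would get by guessing uniformly among balanced classes.
double ChanceLevel(int classCount) => 1.0 / classCount;

// True when the measured accuracy is within `margin` of uniform guessing.
bool IsNearChance(double accuracy, int classCount, double margin = 0.02) =>
    Math.Abs(accuracy - ChanceLevel(classCount)) <= margin;

// Accuracies are the ones reported in the console output above.
Console.WriteLine(IsNearChance(0.10825396825396824, 10)); // True
Console.WriteLine(IsNearChance(0.6744833563409415, 10));  // False
```

By the same arithmetic, the roughly 8% evaluation accuracy reported for the original 14-category dataset is close to 1/14 (about 7.1%), which suggests the GPU-run model is effectively guessing rather than merely being less accurate.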

LittleLittleCloud commented 1 year ago

Thanks for the quick run, and that's interesting to see. It achieves 0.6744833563409415 accuracy on both CPU and GPU on my computer, and that's weird!

I wonder whether it could be because of the GPU. I see that both you and @mg-yolo-enterprises have an RTX card; could that be the reason? Which version of the GPU driver do you have?
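Besides the driver version, the console output already contains two facts worth comparing: the CUDA runtime that TensorFlow loaded and the GPU's compute capability. The sketch below pulls both out of a log with regular expressions; the function names are hypothetical and the two sample lines are abbreviated from the GPU log above. As far as I know, compute capability 8.x GPUs were first targeted by the CUDA 11 toolkits, so a CUDA 10.1 runtime on an 8.6 device may be a pairing worth ruling out; the thread does not confirm this as the cause.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Extracts the GPU compute capability from a TensorFlow log, or null if absent.
double? FindComputeCapability(string log)
{
    var match = Regex.Match(log, @"computeCapability:\s*(\d+\.\d+)");
    return match.Success
        ? double.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture)
        : (double?)null;
}

// Extracts the CUDA runtime version from the cudart DLL name, e.g. "101" -> "10.1".
string? FindCudaRuntimeVersion(string log)
{
    var match = Regex.Match(log, @"cudart64_(\d+)\.dll");
    if (!match.Success) return null;
    var digits = match.Groups[1].Value;
    return digits.Length >= 3 ? $"{digits[..^1]}.{digits[^1]}" : digits;
}

// Two lines abbreviated from the GPU console output in this thread.
var log = string.Join('\n',
    "Successfully opened dynamic library cudart64_101.dll",
    "pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6");

Console.WriteLine(FindCudaRuntimeVersion(log));                                      // 10.1
Console.WriteLine(FindComputeCapability(log)?.ToString(CultureInfo.InvariantCulture)); // 8.6
```

Parsing uses the invariant culture on purpose: the machine that produced the logs prints decimals with a comma (`Accuracy: 0,67...`), while TensorFlow's own log lines use a dot.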

FaithlessDbo commented 1 year ago

My GPU driver version is 528.49

mg-yolo-enterprises commented 1 year ago

Sorry for getting to this late in the day! @LittleLittleCloud thanks for providing the sample program. Here are the results from my computer (same specs provided in #2517 ):

CPU: 0.67 GPU: 0.1

LittleLittleCloud commented 1 year ago

@FaithlessDbo @mg-yolo-enterprises Thanks for running that project and sending the CPU vs GPU metrics. @michaelgsharp will also help us verify the performance of CPU vs GPU training on his RTX and GTX cards, to strengthen the conclusion (so far based on observation) that the low accuracy only happens on the RTX series.

If the conclusion stands, then we probably won't fix this issue, because 1) this issue relates to TensorFlow or the GPU driver, not ML.NET or Model Builder, and 2) we are working on leveraging TorchSharp to train the image classification model, which will probably come out in the next release or the one after. This issue is likely to disappear after switching to TorchSharp.

Oceania2018 commented 1 year ago

@mg-yolo-enterprises Is it possible to run CPU and GPU based on this example?

mg-yolo-enterprises commented 1 year ago

@Oceania2018 thanks so much for your assistance. I ran the example you linked to, both in Release config (CPU) and GPU config. Both completed normally, and with the same overall results. As expected, the GPU config completed much more quickly. In between runs, I deleted the "image_classification_v1" folder generated in the solution folder to ensure I was performing a fresh run.

Final Test Accuracy:

Release Config (CPU):
- 84.71%
GPU Config:
- 84.63%

Screenshots below:

Oceania2018 commented 1 year ago

@mg-yolo-enterprises Thanks for your quick response. @LittleLittleCloud The test means there is probably something wrong in the integration with ML.NET.

LittleLittleCloud commented 1 year ago

@Oceania2018 We are still using TF 2.3.1, would upgrading to 2.5.0 resolve this issue?

Oceania2018 commented 1 year ago

Someone needs to test that; I don't have the answer at the moment. Why not just upgrade to the latest version, v0.100.4? ML.NET is supposed to support both TensorFlow and PyTorch, if I understand correctly.

joao-ladeira commented 1 year ago

I'm having the exact same issue. GPU:

CPU:

If I train with my CPU everything is as expected; if I use my GPU (RTX 3090) my accuracy goes down the drain...

Is a fix expected soon? Or do we need to wait for TorchSharp?

My dataset is this one: https://www.kaggle.com/datasets/alxmamaev/flowers-recognition

michaelgsharp commented 1 year ago

@LittleLittleCloud @Oceania2018 we do support both TensorFlow and TorchSharp.

Upgrading TensorFlow to the latest version will require some work. There were a fair number of breaking changes introduced, and it seems memory handling has changed as well: after I fixed the breaking changes so that it would build, it failed when accessing memory that had already been freed.

Oceania2018 commented 1 year ago

@michaelgsharp If anything blocks your progress in upgrading to the latest TensorFlow, please ping either @Oceania2018 or @AsakusaRinne. The new version will bring more TensorFlow power.

FaithlessDbo commented 1 year ago

@Oceania2018 I've tried running the examples in your post. I'm getting 44/45 marked as completed; however, the TransferLearningWithInceptionV3 example fails.

I selected this test in particular and ran it again from a clean slate, but it still doesn't pass. An index-out-of-bounds exception is thrown, and I can't really explain what's going on here.

Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.0
TensorFlow.NET v0.100.1.0
TensorFlow.Keras v0.10.1.0
M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0
24-5-2023 18:40:33 Starting Transfer Learning With InceptionV3 (Graph)
Downloading from http://download.tensorflow.org/example_images/flower_photos.tgz
......................
Downloaded to image_classification_v1\flower_photos.tgz
Extracting.
..........
Extracting is completed.
Downloading from https://raw.githubusercontent.com/SciSharp/TensorFlow.NET/master/graph/InceptionV3.meta
.
Downloaded to graph\InceptionV3.meta
Downloading from https://github.com/SciSharp/TensorFlow.NET/raw/master/data/tfhub_modules.zip
...........
Downloaded to M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\tfhub_modules.zip
Extracting.
..
Extracting is completed.
Looking for images in 'daisy'
Looking for images in 'dandelion'
Looking for images in 'roses'
Looking for images in 'sunflowers'
Looking for images in 'tulips'
2023-05-24 20:41:11.179005: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-24 20:41:11.560223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-24 20:41:11.647402: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
Creating bottleneck at M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\bottleneck\daisy\20580471306_ab5a011b15_n.jpg_https~tfhub.dev~google~imagenet~inception_v3~feature_vector~3.txt
2023-05-24 20:41:13.535957: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8600
2023-05-24 20:41:14.171833: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-05-24 20:41:14.268218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6

Creating bottleneck .... Many bottlenecks here

300 bottleneck files created.

Creating bottleneck .... Many bottlenecks here

Creating bottleneck at M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\bottleneck\tulips\15275199229_962387f24d.jpg_https~tfhub.dev~google~imagenet~inception_v3~feature_vector~3.txt

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks(Session sess, Dictionary`2 image_lists, Int32 how_many, String category, String bottleneck_dir, Tensor jpeg_data_tensor, Tensor decoded_image_tensor, Tensor resized_input_tensor, Tensor bottleneck_tensor, String module_name)
   at SciSharp.Models.ImageClassification.TransferLearning.Train(TrainingOptions options)
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Train() in M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 71
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Run() in M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 46
   at TensorFlowNET.Examples.Program.RunExamples(Type example, Dictionary`2 args) in M:\Program\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\Program.cs:line 119
24-5-2023 18:41:48 Completed Transfer Learning With InceptionV3 (Graph)
2023-05-24 20:41:48.739141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6
Example: Transfer Learning With InceptionV3 (Graph) in 75,1697955s is Failed!
Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.0
TensorFlow.NET v0.100.1.0
TensorFlow.Keras v0.10.1.0
0 of 1 example(s) are completed.

Do you have any suggestions on how I could complete this task with success? Thanks in advance :)

AsakusaRinne commented 1 year ago

@FaithlessDbo Could you please try with the latest version (v0.100.5)?

FaithlessDbo commented 1 year ago

@AsakusaRinne Thanks for bringing this update to my attention, however I'm afraid the same issue persists:

Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.1
TensorFlow.NET v0.100.5.0
TensorFlow.Keras v0.10.5.0

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks(Session sess, Dictionary`2 image_lists, Int32 how_many, String category, String bottleneck_dir, Tensor jpeg_data_tensor, Tensor decoded_image_tensor, Tensor resized_input_tensor, Tensor bottleneck_tensor, String module_name)

AsakusaRinne commented 1 year ago

@FaithlessDbo I tried it but didn't get an error. Are you using exactly the latest code from SciSharp-Stack-Examples, without modification?

FaithlessDbo commented 1 year ago

@AsakusaRinne My steps are as follows:

1) Download the code as zip
2) Extract the zip file
3) Open SciSharp STACK Examples.sln
4) Set the active solution configuration to "GPU"
5) Hit the run button
6) Wait for the tasks to complete

Example: Basic Eager in 3,9086364s is OK!
Example: K-means Clustering in 0,0004369s is OK!
Example: Linear Regression (Graph) in 5,9045725s is OK!
Example: Linear Regression (Eager) in 1,6489888s is OK!
Example: Linear Regression (Keras) in 2,6311807s is OK!
Example: Logistic Regression (Graph) in 9,16897s is OK!
Example: Logistic Regression (Eager) in 3,8734343s is OK!
Example: Logistic Regression (Keras) in 36,8593307s is OK!
Example: Naive Bayes Classifier in 1,4774268s is OK!
Example: Nearest Neighbor in 2,4854777s is OK!
Example: Basic Operations in 0,0055698s is OK!
Example: Convert TensorFlow Model to OpenCv in 0,0002946s is OK!
Example: GAN MNIST in 111,7086675s is OK!
Example: Hello World in 0,3237904s is OK!
Example: MNIST LSTM (Graph) in 0,000339s is OK!
Example: Digits Recognition Neural Network in 0,000553s is OK!
Example: MNIST RNN (Graph) in 0,0003953s is OK!
Example: MNIST RNN (Keras) in 0,0008375s is OK!
Example: Image Background Removal in 0,0004457s is OK!
Example: Image Recognition Inception in 43,1262051s is OK!
Example: Inception Arch GoogLeNet in 0,0004193s is OK!
Example: MNIST FNN (Keras Functional) in 11,7204182s is OK!
Example: Text Sentiment Classification in 0,007651s is OK!
Example: Predict fuel efficiency in 29,3273245s is OK!
Example: Fully Connected Neural Network (Graph) in 0,0004441s is OK!
Example: Fully Connected Neural Network (Eager) in 3,7970138s is OK!
Example: Fully Connected Neural Network In Queue in 0,0004443s is OK!
Example: Fully Connected Neural Network (Keras) in 2,6093542s is OK!
Example: NN XOR in Graph Mode in 4,6189418s is OK!
Example: NN XOR in Eager Mode in 13,6401218s is OK!
Example: MNIST in YOLOv3 in 0,0006342s is OK!
Example: YoloCoco in 0,0002949s is OK!
Example: Binary Text Classification in 0,0003298s is OK!
Example: CNN Text Classification (Keras) in 0,0003977s is OK!
Example: NER in 0,00033s is OK!
Example: Classify text with BERT in 0,0002875s is OK!
Example: Word2Vec in 99,7844949s is OK!
Example: Weather Prediction in 39,3517703s is OK!
Example: CNN in Your Own Data (Graph) in 22,2386955s is Failed!
Example: MNIST CNN (Graph) in 1,8552623s is Failed!
Example: MNIST CNN (Eager) in 0,3584103s is Failed!
Example: Image Classification (Keras) in 0,8593132s is Failed!
Example: MNIST CNN (Keras Subclass) in 0,361694s is Failed!
Example: Toy ResNet in 1,0943189s is Failed!
Example: Transfer Learning With InceptionV3 (Graph) in 174,4175218s is Failed!
Example: NN XOR in Keras in 0,6046018s is Failed!
Example: Object Detection in MobileNet (Graph) in 16,3824418s is Failed!
Example: CNN Text Classification (Graph) in 2,6681871s is Failed!

7) Notice I have errors thrown: "Tensorflow.TensorflowException: DNN library is not found."
8) Update the tensorflow GPU package to 2.10.2 (newest package at this moment)
9) Run again and now only these 2 fail:

Example: Transfer Learning With InceptionV3 (Graph) in 39,6556827s is Failed!
Example: NN XOR in Keras in 0,7565231s is Failed!

10) Only load the TransferLearningWithInceptionV3 by changing the examples in the program

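// Requires: using System.Linq; using System.Reflection;
// Keep only the TransferLearningWithInceptionV3 example instead of running all of them.
var examples = Assembly.GetEntryAssembly().GetTypes()
    .Where(x => x.GetInterfaces().Contains(typeof(IExample)))
    .Where(x => x.Name == nameof(TransferLearningWithInceptionV3))
    .ToArray();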

11) Run again with the following result:

Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.1
TensorFlow.NET v0.100.5.0
TensorFlow.Keras v0.10.5.0
C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\bin\GPU\net6.0
25-5-2023 20:31:45 Starting Transfer Learning With InceptionV3 (Graph)
image_classification_v1\flower_photos.tgz already exists.
graph\InceptionV3.meta already exists.
C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\tfhub_modules.zip already exists.
Looking for images in 'daisy'
Looking for images in 'dandelion'
Looking for images in 'roses'
Looking for images in 'sunflowers'
Looking for images in 'tulips'
2023-05-25 22:31:45.274319: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-25 22:31:46.351201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-25 22:31:46.399714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-25 22:31:47.527705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-25 22:31:47.672008: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
300 bottleneck files created.
600 bottleneck files created.
900 bottleneck files created.
1200 bottleneck files created.
1500 bottleneck files created.
1800 bottleneck files created.
2100 bottleneck files created.
2400 bottleneck files created.
2700 bottleneck files created.
3000 bottleneck files created.
3300 bottleneck files created.
3600 bottleneck files created.
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks(Session sess, Dictionary`2 image_lists, Int32 how_many, String category, String bottleneck_dir, Tensor jpeg_data_tensor, Tensor decoded_image_tensor, Tensor resized_input_tensor, Tensor bottleneck_tensor, String module_name)
   at SciSharp.Models.ImageClassification.TransferLearning.Train(TrainingOptions options)
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Train() in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 71
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Run() in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 46
   at TensorFlowNET.Examples.Program.RunExamples(Type example, Dictionary`2 args) in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\Program.cs:line 117
25-5-2023 20:31:53 Completed Transfer Learning With InceptionV3 (Graph)
2023-05-25 22:31:53.350445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7427 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6
Example: Transfer Learning With InceptionV3 (Graph) in 8,1532839s is Failed!
Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.1
TensorFlow.NET v0.100.5.0
TensorFlow.Keras v0.10.5.0
0 of 1 example(s) are completed.

AsakusaRinne commented 1 year ago

@Oceania2018 Could you please help with this problem? I won't have access to a device with CUDA 11 until May 28.

Oceania2018 commented 1 year ago

@FaithlessDbo @AsakusaRinne Everything looks good on my side.

Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.1
TensorFlow.NET v0.100.5.0
TensorFlow.Keras v0.10.5.0
C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0
5/29/2023 1:24:45 AM Starting Transfer Learning With InceptionV3 (Graph)
image_classification_v1\flower_photos.tgz already exists.
graph\InceptionV3.meta already exists.
C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\tfhub_modules.zip already exists.
Looking for images in 'daisy'
Looking for images in 'dandelion'
Looking for images in 'roses'
Looking for images in 'sunflowers'
Looking for images in 'tulips'
2023-05-28 20:24:45.540341: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-28 20:24:48.117868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-28 20:24:48.161028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-28 20:24:49.330087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-28 20:24:49.411977: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
300 bottleneck files created.
600 bottleneck files created.
900 bottleneck files created.
1200 bottleneck files created.
1500 bottleneck files created.
1800 bottleneck files created.
2100 bottleneck files created.
2400 bottleneck files created.
2700 bottleneck files created.
3000 bottleneck files created.
3300 bottleneck files created.
3600 bottleneck files created.
2023-05-28 20:25:25.627607: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
Step 0: Training accuracy = 49%, Cross entropy = 1.547, Validation accuracy = 49% (N=100) 5244ms
Step 10: Training accuracy = 79%, Cross entropy = 1.237, Validation accuracy = 77% (N=100) 1008ms
Step 20: Training accuracy = 77%, Cross entropy = 1.043, Validation accuracy = 76% (N=100) 1159ms
Step 30: Training accuracy = 79%, Cross entropy = 0.8928, Validation accuracy = 80% (N=100) 1297ms
Step 40: Training accuracy = 82%, Cross entropy = 0.8046, Validation accuracy = 80% (N=100) 1288ms
Step 50: Training accuracy = 85%, Cross entropy = 0.6853, Validation accuracy = 83% (N=100) 790ms
Step 60: Training accuracy = 86%, Cross entropy = 0.6046, Validation accuracy = 83% (N=100) 818ms
Step 70: Training accuracy = 79%, Cross entropy = 0.6953, Validation accuracy = 82% (N=100) 1057ms
Step 80: Training accuracy = 86%, Cross entropy = 0.5439, Validation accuracy = 88% (N=100) 889ms
Step 90: Training accuracy = 83%, Cross entropy = 0.5962, Validation accuracy = 82% (N=100) 750ms
Step 99: Training accuracy = 85%, Cross entropy = 0.5662, Validation accuracy = 77% (N=100) 853ms
Saving checkpoint to C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\checkpoint
Saving final result to: C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\saved_model.pb
2023-05-28 20:25:38.255891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Restoring parameters from C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\checkpoint
Froze 378 variables.
Converted 378 variables to const ops.
graph\InceptionV3.meta already exists.
C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\tfhub_modules.zip already exists.
Looking for images in 'daisy'
Looking for images in 'dandelion'
Looking for images in 'roses'
Looking for images in 'sunflowers'
Looking for images in 'tulips'
2023-05-28 20:25:41.285315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-28 20:25:45.150306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Restoring parameters from C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\checkpoint
final test accuracy: 83.54% (N=3670)
graph\InceptionV3.meta already exists.
C:\Users\haipi\Source\repos\SciSharp\SciSharp-Stack-Examples\src\TensorFlowNET.Examples\bin\GPU\net6.0\image_classification_v1\tfhub_modules.zip already exists.
2023-05-28 20:25:46.886005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-05-28 20:25:49.564429: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
Predicted result: daisy - 0.38%
5/29/2023 1:25:53 AM Completed Transfer Learning With InceptionV3 (Graph)
2023-05-28 20:25:53.149170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3475 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Example: Transfer Learning With InceptionV3 (Graph) in 67.8040987s is OK!
Microsoft Windows NT 10.0.22621.0
64Bit Operating System: True
.NET CLR: 6.0.16
TensorFlow Binary v2.10.1
TensorFlow.NET v0.100.5.0
TensorFlow.Keras v0.10.5.0
1 of 1 example(s) are completed.

FaithlessDbo commented 1 year ago

@Oceania2018 Which CUDA and cuDNN versions are you using?

AsakusaRinne commented 1 year ago

@FaithlessDbo My configuration is CUDA 11.2 + cuDNN 8.1, and it also works well.

FaithlessDbo commented 1 year ago

@AsakusaRinne Thanks. I tried running with that exact setup, but I still get:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks(Session sess, Dictionary`2 image_lists, Int32 how_many, String category, String bottleneck_dir, Tensor jpeg_data_tensor, Tensor decoded_image_tensor, Tensor resized_input_tensor, Tensor bottleneck_tensor, String module_name)
   at SciSharp.Models.ImageClassification.TransferLearning.Train(TrainingOptions options)
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Train() in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 71
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Run() in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 46
   at TensorFlowNET.Examples.Program.RunExamples(Type example, Dictionary`2 args) in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\Program.cs:line 117

I also tried the model builder and got the same results, so it's not the CUDA version. Do you have any suggestions on what I could try next to identify the issue?

AsakusaRinne commented 1 year ago

@FaithlessDbo The stack trace ended at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks, leaving it unclear which array has an invalid index. Could you please replace the package references of SciSharp.Models with the source code of SciSharp.Models? When you run it again, the more detailed stack trace will be printed.

Here are the references that need to be replaced:

<PackageReference Include="SciSharp.Models.ImageClassification" Version="0.5.0" />
<PackageReference Include="SciSharp.Models.ObjectDetection" Version="0.2.0" />
<PackageReference Include="SciSharp.Models.TimeSeries" Version="0.3.0" />

FaithlessDbo commented 1 year ago

@AsakusaRinne Thanks again for the reply.

I think I've set it up correctly. The new references:

    <ItemGroup>
      <ProjectReference Include="..\..\..\New\SciSharp.Models-master\SciSharp.Models-master\SciSharp.Models.ImageClassification\SciSharp.Models.ImageClassification.csproj" />
      <ProjectReference Include="..\..\..\New\SciSharp.Models-master\SciSharp.Models-master\SciSharp.Models.ObjectDetection\SciSharp.Models.ObjectDetection.csproj" />
      <ProjectReference Include="..\..\..\New\SciSharp.Models-master\SciSharp.Models-master\SciSharp.Models.TimeSeries\SciSharp.Models.TimeSeries.csproj" />
    </ItemGroup>

I changed the references inside the project to point to these newly added projects.

New output:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks(Session sess, Dictionary`2 image_lists, Int32 how_many, String category, String bottleneck_dir, Tensor jpeg_data_tensor, Tensor decoded_image_tensor, Tensor resized_input_tensor, Tensor bottleneck_tensor, String module_name) in C:\Users\User\Downloads\New\SciSharp.Models-master\SciSharp.Models-master\SciSharp.Models.ImageClassification\TransferLearning\TransferLearning.Bottleneck.cs:line 165
   at SciSharp.Models.ImageClassification.TransferLearning.Train(TrainingOptions options) in C:\Users\User\Downloads\New\SciSharp.Models-master\SciSharp.Models-master\SciSharp.Models.ImageClassification\TransferLearning\TransferLearning.Train.cs:line 81
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Train() in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 71
   at TensorFlowNET.Examples.TransferLearningWithInceptionV3.Run() in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\ImageProcessing\TransferLearningWithInceptionV3.cs:line 46
   at TensorFlowNET.Examples.Program.RunExamples(Type example, Dictionary`2 args) in C:\Users\User\Downloads\SciSharp-Stack-Examples-master\src\TensorFlowNET.Examples\Program.cs:line 117

AsakusaRinne commented 1 year ago

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SciSharp.Models.ImageClassification.TransferLearning.get_random_cached_bottlenecks(Session sess, Dictionary`2 image_lists, Int32 how_many, String category, String bottleneck_dir, Tensor jpeg_data_tensor, Tensor decoded_image_tensor, Tensor resized_input_tensor, Tensor bottleneck_tensor, String module_name) in C:\Users\User\Downloads\New\SciSharp.Models-master\SciSharp.Models-master\SciSharp.Models.ImageClassification\TransferLearning\TransferLearning.Bottleneck.cs:line 165

Could you please print the shape of bottlenecks, bottleneck and the value of how_many here?
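A small helper along these lines could print the requested shapes. This is only a sketch: `ShapeDebug` and `Describe` are hypothetical names, not part of SciSharp.Models, and it assumes `bottlenecks` and `bottleneck` are plain .NET arrays at that point in the code.

```csharp
using System;

static class ShapeDebug
{
    // Formats the dimensions of any .NET array like a tensor shape,
    // e.g. "(100, 2048)" for a float[100, 2048].
    public static string Describe(Array a)
    {
        var dims = new string[a.Rank];
        for (int i = 0; i < a.Rank; i++)
            dims[i] = a.GetLength(i).ToString();

        // Rank-1 arrays get a trailing comma, e.g. "(2048,)".
        return a.Rank == 1
            ? $"({dims[0]},)"
            : $"({string.Join(", ", dims)})";
    }
}
```

For example, `Console.WriteLine($"bottlenecks: {ShapeDebug.Describe(bottlenecks)} bottleneck: {ShapeDebug.Describe(bottleneck)} how_many: {how_many}");` could be placed just before line 165 of TransferLearning.Bottleneck.cs.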

FaithlessDbo commented 1 year ago

bottlenecks: System.Single[,] - (100, 2048)
bottleneck: System.Single[] - (4006,)
how_many: 100 - ()

AsakusaRinne commented 1 year ago

Hi, 2048 is the expected length of a bottleneck vector (the size of the InceptionV3 feature vector), so the shape of bottleneck should be (2048,) rather than (4006,). Are you using a dataset other than the one specified in SciSharp-Stack-Examples?

FaithlessDbo commented 1 year ago

I'm using the flowers dataset that is downloaded within the project.

AsakusaRinne commented 1 year ago

Does a file named bin\Debug\net6.0\image_classification_v1\bottleneck\dandelion\4691257171_23a29aaa33_n.jpg_https~tfhub.dev~google~imagenet~inception_v3~feature_vector~3.txt exist in your folder?

AsakusaRinne commented 1 year ago

@FaithlessDbo Hi, we still can't reproduce the error you encountered. Would you like to have a Zoom meeting with us and share your screen? That may help us find the problem.

FaithlessDbo commented 1 year ago

@AsakusaRinne Yes, that file exists. Maybe another file is missing, but I wouldn't know how to identify which one.

Maybe a zoom meeting could help indeed. How and when would you like to set this up?
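One way to look for a suspicious cache entry is to scan the bottleneck folder for files whose vector length is not 2048. The sketch below is only an illustration under stated assumptions: it assumes each cached bottleneck is a .txt file of comma-separated floats, and `BottleneckCacheCheck` is a hypothetical helper, not part of SciSharp.Models.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BottleneckCacheCheck
{
    // Counts the non-empty comma-separated entries in one cache file's text.
    public static int CountValues(string text)
    {
        int n = 0;
        foreach (var part in text.Split(','))
            if (!string.IsNullOrWhiteSpace(part)) n++;
        return n;
    }

    // Returns (path, value count) for every .txt file under rootDir
    // whose value count differs from expectedLength.
    public static List<(string Path, int Count)> FindMismatches(
        string rootDir, int expectedLength = 2048)
    {
        var bad = new List<(string Path, int Count)>();
        foreach (var file in Directory.EnumerateFiles(
                     rootDir, "*.txt", SearchOption.AllDirectories))
        {
            int count = CountValues(File.ReadAllText(file));
            if (count != expectedLength)
                bad.Add((file, count));
        }
        return bad;
    }
}
```

Calling `BottleneckCacheCheck.FindMismatches(@"image_classification_v1\bottleneck")` from the example's output folder and printing each returned path and count would list any files that do not hold 2048 values. Deleting those files so they are regenerated on the next run is one thing to try, though whether that resolves the exception is not confirmed here.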

AsakusaRinne commented 1 year ago

Sorry for the late reply. I missed the notification. Could you please join our discord channel for better communication? @Oceania2018 will come together since I'm not good at English listening and speaking.

The link is https://discord.com/invite/qRVm82fKTS

darrabam commented 1 year ago

@FaithlessDbo Did you find a solution? I'm having a similar issue: CPU accuracy 70%, GPU < 0.1%, with ~6K classes and 80K+ images.

I tried different configurations:
CUDA 10.1, cuDNN 7.6.3: Best MicroAccuracy: 0.0000
CUDA 10.1, cuDNN 7.6.4: Best MicroAccuracy: 0.0001
CUDA 10.1, cuDNN 7.6.5: Best MicroAccuracy: 0.0000

GPU: RTX 4070

FaithlessDbo commented 1 year ago

@darrabam I didn't find a solution for the model builder itself; however, with the help of @Oceania2018 and @AsakusaRinne I managed to get my situation working with SciSharp-Stack-Examples.

mg-yolo-enterprises commented 1 year ago

@darrabam I also didn't get GPU inference working with ModelBuilder. I ended up using TensorFlow transfer learning to build a classifier that performed almost as well as the one I trained with ModelBuilder. I consume it using ONNX and it's been working well in production over the past several months.

It was a lot more work figuring it all out, but in the end the inferences are faster than on CPU, and having more control over things allows me to push image data directly into a Tensor object from a System.Drawing.Bitmap using unsafe code, saving a significant amount of time compared to ModelBuilder, which requires passing in a byte[] containing the image data. In my application, speed matters and I'm not sure ModelBuilder would've been fast enough even if the GPU inferences worked, because of needing that managed byte array of encoded image data (JPG/PNG/etc).

I would probably try the approach @FaithlessDbo linked to first, but I can post further details if you're interested in how I was able to do it.

darrabam commented 1 year ago

@mg-yolo-enterprises Thanks, yes I'm interested! For now, I just need the setup you used (CUDA version, cuDNN, NuGet packages, etc.). I went through the tutorial to build the pipeline, customized it for my own dataset, and used the available GPU NuGet packages, but the training runs on the CPU. I'm using these packages:

I resolved all GPU-related errors and missing libraries, so I'm not sure what I'm missing.

mg-yolo-enterprises commented 1 year ago

@darrabam Because it didn't seem likely that modelbuilder support for GPU inference of image classification was going to be fixed anytime soon, I took an alternative approach using the following tools:

Use transfer learning, similar to the modelbuilder approach, to create a model from an existing model. The tool I used, make_image_classifier, was included in tensorflow_hub up to version 0.13 (it's been removed as of 0.14). You can read about it here. It's a simple CLI tool that generates a model from an existing "feature vector" (a CNN without the final layer). There are many feature vectors to choose from; your choice will determine the speed and accuracy of your model. In my case the model achieved 100% accuracy, but it's a fairly easy binary classification job.
With your model trained using make_image_classifier, you must then convert it to ONNX format in order to use it from Microsoft.ML.OnnxRuntime. I did this using the Python tool tf2onnx.
In my C# application, I used the NuGet packages Microsoft.ML.OnnxRuntime.Managed and Microsoft.ML.OnnxRuntime.Gpu.
CUDA is 11.6, cuDNN is 8.5.0
You should be able to use many sources online to learn how to use OnnxRuntime with your trained model for your own purposes. I can send you some sample code if you get stuck.
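As a rough starting point, GPU inference with Microsoft.ML.OnnxRuntime might look like the sketch below. It is not the code used in the application described above: `OnnxClassifierSketch` is a hypothetical name, the 1 x 224 x 224 x 3 input layout is an assumption that depends on the feature vector chosen, and the caller is assumed to have already resized and normalized the pixels the way the model expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

static class OnnxClassifierSketch
{
    // Index of the largest score, i.e. the predicted class.
    public static int ArgMax(IReadOnlyList<float> scores)
    {
        int best = 0;
        for (int i = 1; i < scores.Count; i++)
            if (scores[i] > scores[best]) best = i;
        return best;
    }

    // Runs one image through the model on CUDA device 0 and returns
    // the predicted class index. The input shape below is an assumption.
    public static int Classify(string modelPath, float[] pixels)
    {
        using var options = SessionOptions.MakeSessionOptionWithCudaProvider(0);
        using var session = new InferenceSession(modelPath, options);

        // Wrap the pixel buffer as a 1 x 224 x 224 x 3 tensor (assumed layout).
        var input = new DenseTensor<float>(pixels, new[] { 1, 224, 224, 3 });

        // Use the model's first declared input name rather than guessing it.
        string inputName = session.InputMetadata.Keys.First();
        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor(inputName, input)
        };

        using var results = session.Run(inputs);
        float[] scores = results.First().AsEnumerable<float>().ToArray();
        return ArgMax(scores);
    }
}
```

The sketch creates the session inside `Classify` only to stay short; when speed matters, creating the `InferenceSession` once and reusing it across calls is likely the better design.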
