SciSharp / TensorFlow.NET

.NET Standard bindings for Google's TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
https://scisharp.github.io/tensorflow-net-docs
Apache License 2.0

[Question]: UNIMPLEMENTED: Cast int64 to resource is not supported #1153

Closed YJFu-Bifrost closed 12 months ago

YJFu-Bifrost commented 12 months ago

Description

```csharp
var layers = keras.layers;
List<ILayer> layerList = new List<ILayer>()
{
    layers.InputLayer(input_shape: new Shape(AITrainDataConfig.WindowSize, AITrainDataConfig.FeatureCount)),
    layers.LSTM(units: 1024, return_sequences: true),
    layers.LSTM(units: 1024, return_sequences: true),
    layers.Dropout(0.2f),
    layers.LSTM(units: 768, return_sequences: true),
    layers.LSTM(units: 768, return_sequences: true),
    layers.Dropout(0.2f),
    layers.LSTM(units: 512, return_sequences: true),
    layers.LSTM(units: 512, return_sequences: false),
    layers.Dropout(0.2f),
    layers.Dense(256),
    layers.Dense(1),
};
Sequential model = keras.Sequential(layers: layerList);
tf.compat.v1.disable_eager_execution();

model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanSquaredError());
model.summary();
List<ICallback> callbackList = new List<ICallback>()
{
    new EarlyStopping(new() { }, monitor: "val_loss", mode: "min", verbose: 1, patience: 3),
};

model.fit(trainInputRawData, trainOutputRawData,
    epochs: 1000,
    batch_size: 512,
    verbose: 1,
    validation_data: new(validateInputRawData, validateOutputRawData),
    callbacks: callbackList);
```

`model.fit` throws this exception: `Tensorflow.Exceptions.NotOkStatusException: 'Cast int64 to resource is not supported [[{{node Read/ReadVariableOp/resource}}]]'`

But my input data is `float[,,]` and my output data is `float[]`; I never use `long` anywhere in this program.


Wanglongzhi2001 commented 12 months ago

Hello, it looks like the code you provided has a problem. I used the following code to mimic yours and it ran successfully. You can refer to it for further troubleshooting; if another problem comes up, please let me know.

```csharp
var layers = keras.layers;
Sequential model = keras.Sequential(new List<ILayer>
{
    layers.InputLayer(input_shape: new Shape(32, 32)),
    layers.LSTM(units: 1024, return_sequences: true),
    layers.LSTM(units: 1024, return_sequences: true),
    layers.Dropout(0.2f),
    layers.LSTM(units: 768, return_sequences: true),
    layers.LSTM(units: 768, return_sequences: true),
    layers.Dropout(0.2f),
    layers.LSTM(units: 512, return_sequences: true),
    layers.LSTM(units: 512, return_sequences: false),
    layers.Dropout(0.2f),
    layers.Dense(256),
    layers.Dense(1),
});
var x_train = np.random.random((64, 32, 32));
var y_train = np.random.random((64, 1));
List<ICallback> callbackList = new List<ICallback>()
{
    new EarlyStopping(new() { }, monitor: "val_loss", mode: "min", verbose: 1, patience: 3),
};

model.compile(optimizer: keras.optimizers.Adam(),
              loss: keras.losses.MeanSquaredError(),
              metrics: new[] { "acc" });
model.fit(x_train, y_train,
          batch_size: 64,
          epochs: 1,
          validation_data: (x_train, y_train),
          callbacks: callbackList);
```
YJFu-Bifrost commented 12 months ago

Thank you very much for your help, it works! But I have a new issue: with the same model structure and the same batch_size on the same PC, training works under Keras.NET, but under TensorFlow.NET it runs out of memory with a message like:

```
2023-07-22 20:38:23.196457: I tensorflow/core/common_runtime/bfc_allocator.cc:1101] Sum Total of in-use chunks: 19.27GiB
2023-07-22 20:38:23.196493: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] total_region_allocated_bytes_: 22385000448 memory_limit_: 22385000448 available bytes: 0 curr_region_allocation_bytes_: 44770000896
2023-07-22 20:38:23.196528: I tensorflow/core/common_runtime/bfc_allocator.cc:1109] Stats:
Limit:              22385000448
InUse:              20690873600
MaxInUse:           22243551744
NumAllocs:               142081
MaxAllocSize:          30953472
Reserved:                     0
PeakReserved:                 0
LargestFreeBlock:             0

2023-07-22 20:38:23.197224: W tensorflow/core/common_runtime/bfc_allocator.cc:491] ****
2023-07-22 20:38:23.197285: W tensorflow/core/framework/op_kernel.cc:1780] OP_REQUIRES failed at matmul_op_impl.h:728 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
```

Wanglongzhi2001 commented 12 months ago

Since your model uses many LSTM layers, the first cause I can think of is that the LSTM implementation in TensorFlow.NET is not yet fully optimized. The front end of Keras.NET is Python, while TensorFlow.NET calls the C API directly, which usually makes TensorFlow.NET faster. However, the LSTM layer in Python is specially optimized for GPU, and TensorFlow.NET does not have that optimization yet.
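
Until the LSTM kernels are optimized, one possible workaround (a sketch only, reusing the same TensorFlow.NET calls shown above; the actual memory savings depend on your GPU) is to reduce the size of the largest allocations by lowering `batch_size` and/or the LSTM `units`:

```csharp
// Sketch: same fit call as above, with a smaller batch size to reduce
// peak activation memory. Halving units (e.g. 1024 -> 512) in the model
// definition would also shrink the [1024, 4096] weight matmul that the
// OOM message points at.
model.fit(x_train, y_train,
    batch_size: 16,        // smaller batches allocate less GPU memory at once
    epochs: 1,
    validation_data: (x_train, y_train),
    callbacks: callbackList);
```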