Open avipreshel opened 8 months ago
Same here
I'm having the same problem.
I have the same problem but in a slightly different situation where I get it for batch sizes greater than 1, but epoch counts greater than 1 do not trigger it.
Hi all, I ran into the same problem when implementing a custom loss function. What confuses me is that, of the two loss functions below, one works and the other throws a null exception in GetDataType().
Custom loss based on MSE (this works):
public class CustomLoss : ILossFunc
{
    public string Reduction => "auto";
    public string Name => "custom_loss";

    public Tensor Call(Tensor y_true, Tensor y_pred, Tensor sample_weight = null)
    {
        var mse_loss = tf.reduce_mean(tf.square(y_pred - y_true), axis: -1);
        return mse_loss;
    }
}
My custom loss function, which converts y_true and y_pred to float arrays, does some calculations to compute the loss, and converts the result back to a Tensor (this is where the error arises):
public class CustomLoss : ILossFunc
{
    public string Reduction => "auto";
    public string Name => "custom_loss";

    public Tensor Call(Tensor y_true, Tensor y_pred, Tensor sample_weight = null)
    {
        // Extract the first element of the tensor's shape (the batch size).
        int batch_size = y_true.shape.as_int_list()[0];
        // Convert the tensors to 1D arrays.
        var array_true = y_true.ToArray<float>();
        var array_pred = y_pred.ToArray<float>();
        float[] loss = new float[batch_size];
        // Perform some calculations here to compute the loss based on array_true and array_pred.
        //.........
        var loss_tf = tf.convert_to_tensor(loss, dtype: TF_DataType.TF_FLOAT, shape: new Shape(batch_size));
        return loss_tf;
    }
}
The returned tensors mse_loss and loss_tf appear to match in every respect I checked, including type and dimensions. Yet the latter leads to a null exception in GetDataType().
I've spent hours on this without finding a solution. Any help would be appreciated. Thank you.
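One possible explanation, offered as a hedged observation rather than a confirmed diagnosis: `ToArray<float>()` copies the values out of the tensor, and a tensor rebuilt from a plain `float[]` carries no recorded dependency on `y_pred`, so no gradient can be traced back to the model weights. A minimal sketch of the alternative, assuming the calculation can be expressed with tensor operations only. The class name `TensorOnlyLoss` is hypothetical, and the mean absolute error is only a stand-in for the elided calculation:

```csharp
using Tensorflow;
using Tensorflow.Keras.Losses;
using static Tensorflow.Binding;

// Hypothetical example: keeps every step of the loss as a tensor operation
// so that the result stays connected to y_pred.
public class TensorOnlyLoss : ILossFunc
{
    public string Reduction => "auto";
    public string Name => "tensor_only_loss";

    public Tensor Call(Tensor y_true, Tensor y_pred, Tensor sample_weight = null)
    {
        // Stand-in calculation (mean absolute error per sample); the real
        // calculation from the post above is not shown in the thread.
        var abs_error = tf.abs(y_pred - y_true);
        return tf.reduce_mean(abs_error, axis: -1);
    }
}
```

If the real calculation cannot be written with tensor operations, this sketch does not apply as is.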
It seems to be a problem introduced in the latest version. I'm sorry, but I don't have enough time to dig into it right now. GetDataType is related to the native APIs. If you want to debug it, please clone the repo and reference it as a project dependency instead of the NuGet package, then run in debug mode.
Hi @AsakusaRinne, I followed your instructions and got the Call Stack below.
Tensorflow.Binding.dll!Tensorflow.Binding.GetDataType(object data) Line 513 C#
Tensorflow.Binding.dll!Tensorflow.ops.convert_to_tensor(object value, Tensorflow.TF_DataType dtype, string name, bool as_ref, Tensorflow.TF_DataType preferred_dtype, Tensorflow.Contexts.Context ctx) Line 128 C#
Tensorflow.Binding.dll!Tensorflow.tensorflow.convert_to_tensor(object value, Tensorflow.TF_DataType dtype, string name, Tensorflow.TF_DataType preferred_dtype) Line 24 C#
Tensorflow.Binding.dll!Tensorflow.Eager.EagerRunner.AddInputToOp(object inputs, bool add_type_attr, Tensorflow.OpDef.Types.ArgDef input_arg, System.Collections.Generic.List<object> flattened_attrs, System.Collections.Generic.List<Tensorflow.Tensor> flattened_inputs, Tensorflow.Eager.SafeEagerOpHandle op, Tensorflow.Status status) Line 211 C#
Tensorflow.Binding.dll!Tensorflow.Eager.EagerRunner.TFE_FastPathExecute(Tensorflow.FastPathOpExecInfo op_exec_info) Line 126 C#
Tensorflow.Binding.dll!Tensorflow.Contexts.Context.ExecEagerAction(string OpType, string Name, Tensorflow.ExecuteOpArgs args) Line 56 C#
Tensorflow.Binding.dll!Tensorflow.Contexts.Context.ExecuteOp(string opType, string name, Tensorflow.ExecuteOpArgs args) Line 102 C#
Tensorflow.Binding.dll!Tensorflow.gen_training_ops.resource_apply_adam(Tensorflow.Tensor var, Tensorflow.Tensor m, Tensorflow.Tensor v, Tensorflow.Tensor beta1_power, Tensorflow.Tensor beta2_power, Tensorflow.Tensor lr, Tensorflow.Tensor beta1, Tensorflow.Tensor beta2, Tensorflow.Tensor epsilon, Tensorflow.Tensor grad, bool use_locking, bool use_nesterov, string name) Line 27 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.Adam._resource_apply_dense(Tensorflow.IVariableV1 var, Tensorflow.Tensor grad, System.Collections.Generic.Dictionary<Tensorflow.Keras.Optimizers.DeviceDType, System.Collections.Generic.Dictionary<string, Tensorflow.Tensor>> apply_state) Line 75 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.OptimizerV2.apply_grad_to_update_var(Tensorflow.IVariableV1 var, Tensorflow.Tensor grad, System.Collections.Generic.Dictionary<Tensorflow.Keras.Optimizers.DeviceDType, System.Collections.Generic.Dictionary<string, Tensorflow.Tensor>> apply_state) Line 119 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.OptimizerV2._distributed_apply.AnonymousMethod__1(Tensorflow.ops.NameScope <p0>) Line 142 C#
Tensorflow.Binding.dll!Tensorflow.Binding.tf_with<Tensorflow.ops.NameScope>(Tensorflow.ops.NameScope py, System.Action<Tensorflow.ops.NameScope> action) Line 199 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.OptimizerV2._distributed_apply.AnonymousMethod__0(Tensorflow.ops.NameScope <p0>) Line 140 C#
Tensorflow.Binding.dll!Tensorflow.Binding.tf_with<Tensorflow.ops.NameScope>(Tensorflow.ops.NameScope py, System.Action<Tensorflow.ops.NameScope> action) Line 199 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.OptimizerV2._distributed_apply(System.Collections.Generic.IEnumerable<(Tensorflow.Tensor, Tensorflow.IVariableV1)> grads_and_vars, string name, System.Collections.Generic.Dictionary<Tensorflow.Keras.Optimizers.DeviceDType, System.Collections.Generic.Dictionary<string, Tensorflow.Tensor>> _apply_state) Line 136 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.OptimizerV2.apply_gradients.AnonymousMethod__1(Tensorflow.ops.NameScope <p0>) Line 74 C#
Tensorflow.Binding.dll!Tensorflow.Binding.tf_with<Tensorflow.ops.NameScope, Tensorflow.Operation>(Tensorflow.ops.NameScope py, System.Func<Tensorflow.ops.NameScope, Tensorflow.Operation> action) Line 207 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Optimizers.OptimizerV2.apply_gradients(System.Collections.Generic.IEnumerable<(Tensorflow.Tensor, Tensorflow.IVariableV1)> grads_and_vars, string name, bool experimental_aggregate_gradients) Line 63 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model._minimize(Tensorflow.Gradients.GradientTape tape, Tensorflow.Keras.Engine.IOptimizer optimizer, Tensorflow.Tensor loss, System.Collections.Generic.List<Tensorflow.IVariableV1> trainable_variables) Line 107 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model.train_step(Tensorflow.Keras.Engine.DataAdapters.DataHandler data_handler, Tensorflow.Tensors x, Tensorflow.Tensors y) Line 57 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model.train_step_function(Tensorflow.Keras.Engine.DataAdapters.DataHandler data_handler, Tensorflow.OwnedIterator iterator) Line 16 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model.FitInternal(Tensorflow.Keras.Engine.DataAdapters.DataHandler data_handler, int epochs, int verbose, System.Collections.Generic.List<Tensorflow.Keras.Engine.ICallback> callbackList, Tensorflow.Util.ValidationDataPack validation_data, System.Func<Tensorflow.Keras.Engine.DataAdapters.DataHandler, Tensorflow.OwnedIterator, System.Collections.Generic.Dictionary<string, float>> train_step_func) Line 282 C#
Tensorflow.Keras.dll!Tensorflow.Keras.Engine.Model.fit(Tensorflow.NumPy.NDArray x, Tensorflow.NumPy.NDArray y, int batch_size, int epochs, int verbose, System.Collections.Generic.List<Tensorflow.Keras.Engine.ICallback> callbacks, float validation_split, Tensorflow.Util.ValidationDataPack validation_data, int validation_step, bool shuffle, System.Collections.Generic.Dictionary<int, float> class_weight, Tensorflow.NumPy.NDArray sample_weight, int initial_epoch, int max_queue_size, int workers, bool use_multiprocessing) Line 85 C#
The null data type arises in EagerRunner.TFE_FastPathExecute.cs, in this function:
public Tensor[] TFE_FastPathExecute(FastPathOpExecInfo op_exec_info)
From debugging, op_exec_info.arg[i] was null at i = 9 when op_name = "ResourceApplyAdam".
For reference, this is what it looks like when it runs without error:
As I'm quite new to TensorFlow, this is as far as I could trace the problem. Hopefully it gives you some idea of where the issue is.
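To make the i = 9 finding easier to read, the index can be matched against the parameter list of `gen_training_ops.resource_apply_adam` as it appears in the call stack. The sketch below assumes that the op arguments are passed in the same order as the method parameters, which the thread does not confirm. The `AdamArgs` helper is hypothetical and exists only for illustration:

```csharp
using System;

// Hypothetical helper: parameter names copied from the
// resource_apply_adam frame in the call stack, in order.
public static class AdamArgs
{
    public static readonly string[] Names =
    {
        "var", "m", "v", "beta1_power", "beta2_power",
        "lr", "beta1", "beta2", "epsilon", "grad"
    };

    // Returns the parameter name at a zero-based argument index.
    public static string NameAt(int index) => Names[index];
}
```

Under that assumption, `AdamArgs.NameAt(9)` returns `"grad"`. This would mean the null value is the gradient handed to the optimizer, which is consistent with a loss that has no gradient path back to the model weights.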
Description
I have fairly plain code with a custom loss function. The code throws a null exception right at the start.
`using Tensorflow;
using Tensorflow.Keras.Losses;
using Tensorflow.Keras.Metrics;
using Tensorflow.Keras.Optimizers;
using Tensorflow.NumPy;
using Tensorflow.Operations.Initializers;
using static Tensorflow.Binding;
using static Tensorflow.KerasApi;

namespace KerasDotNet
{
    internal class WeightedF1Loss : ILossFunc
    {
        public string Reduction => throw new NotImplementedException();
}`
The exception is thrown from GetDataType() since "data" is null.
Stack trace dump:
Reproduction Steps
Run the code in the snippet as is; it is self-contained (it does not read any files or configuration).
System specs: Windows 10 x64 RTX 2080 Super
Known Workarounds
None
Configuration and Other Information
No response