Tensorflow / Traceback error. #193

jesspeers commented 5 months ago


I'm hoping to use Basenji on an HPC cluster using Slurm, so I have been working through the tutorials to check that my install works correctly and to learn how to run the scripts. (The tutorials are very well explained - thank you for making it so accessible!)

I have successfully run the first tutorial (preprocess) but am having issues with the train_test tutorial.

I submitted the following to a GPU node on our cluster:

python bin/ -o tutorials/models/heart tutorials/models/params_small.json data/heart_l131k_redownload

I tried running it on the data generated by the preprocessing tutorial, and also on the data downloaded at the start of the train_test tutorial, and had the same issue both times.

I got the following error:

2024-04-22 11:57:47.666785: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:AutoGraph could not transform <function SeqDataset.generate_parser.<locals>.parse_proto at 0x7f497a35c700> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: invalid syntax (, line 39)
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function SeqDataset.generate_parser.<locals>.parse_proto at 0x7f49785401f0> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: invalid syntax (, line 39)
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2024-04-22 11:57:48.912123: I tensorflow/compiler/mlir/] None of the MLIR optimization passes are enabled (registered 2)
2024-04-22 11:57:48.924943: I tensorflow/core/platform/profile_utils/] CPU Frequency: 2000000000 Hz
WARNING:tensorflow:AutoGraph could not transform <function shift_sequence at 0x7f4979e6e9d0> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: invalid syntax (, line 25)
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Traceback (most recent call last):
  File "/ei/.project-scratch/0/0c7dc7bf-67f2-4ebb-ae07-2d34c4b403df/basenji/bin/", line 182, in <module>
  File "/ei/.project-scratch/0/0c7dc7bf-67f2-4ebb-ae07-2d34c4b403df/basenji/bin/", line 174, in main
  File "/ei/.project-scratch/0/0c7dc7bf-67f2-4ebb-ae07-2d34c4b403df/basenji/basenji/", line 543, in fit_tape
  File "/opt/software/mamba_basenji/lib/python3.9/site-packages/tensorflow/python/keras/", line 253, in reset_states
    K.batch_set_value([(v, 0) for v in self.variables])
  File "/opt/software/mamba_basenji/lib/python3.9/site-packages/tensorflow/python/util/", line 201, in wrapper
    return target(*args, **kwargs)
  File "/opt/software/mamba_basenji/lib/python3.9/site-packages/tensorflow/python/keras/", line 3706, in batch_set_value
    x.assign(np.asarray(value, dtype=dtype(x)))
  File "/opt/software/mamba_basenji/lib/python3.9/site-packages/tensorflow/python/ops/", line 888, in assign
    raise ValueError(
ValueError: Cannot assign to variable count:0 due to variable shape (3,) and value shape () are incompatible
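
The last frames of the traceback suggest that a metric's reset step tried to write the scalar 0 into a variable of shape (3,), which matches the three outputs of the final dense layer in the model summary. The snippet below is a minimal sketch that reproduces that kind of shape mismatch in isolation and shows a reset that matches the variable's shape. It is not Basenji code: the variable is a hypothetical stand-in whose name and length are taken from the error message, and whether this is the actual root cause in this install is an assumption.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a per-target metric accumulator; the name
# "count" and the length 3 are taken from the error message above.
count = tf.Variable(tf.ones([3]), name="count")

# Resetting with a scalar, as the reset_states frame in the traceback does,
# fails because value shape () is incompatible with variable shape (3,).
try:
    count.assign(np.asarray(0, dtype="float32"))
    scalar_reset_failed = False
except ValueError as err:
    scalar_reset_failed = True
    print("scalar reset failed:", err)

# Resetting with zeros that match the variable's own shape succeeds.
count.assign(tf.zeros_like(count))
print(count.numpy())
```

If this is what is happening, one possible direction (untested against Basenji) would be for the affected metric to reset each of its variables with zeros of that variable's shape rather than a scalar.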

This is the output of the job before it failed:

Model: "model_1"
Layer (type)                    Output Shape         Param #     Connected to                     
sequence (InputLayer)           [(None, 131072, 4)]  0                                            
stochastic_reverse_complement ( ((None, 131072, 4),  0           sequence[0][0]                   
stochastic_shift (StochasticShi (None, 131072, 4)    0           stochastic_reverse_complement[0][
tf.nn.gelu (TFOpLambda)         (None, 131072, 4)    0           stochastic_shift[0][0]           
conv1d (Conv1D)                 (None, 131072, 64)   3840        tf.nn.gelu[0][0]                 
batch_normalization (BatchNorma (None, 131072, 64)   256         conv1d[0][0]                     
max_pooling1d (MaxPooling1D)    (None, 16384, 64)    0           batch_normalization[0][0]        
tf.nn.gelu_1 (TFOpLambda)       (None, 16384, 64)    0           max_pooling1d[0][0]              
conv1d_1 (Conv1D)               (None, 16384, 64)    20480       tf.nn.gelu_1[0][0]               
batch_normalization_1 (BatchNor (None, 16384, 64)    256         conv1d_1[0][0]                   
max_pooling1d_1 (MaxPooling1D)  (None, 4096, 64)     0           batch_normalization_1[0][0]      
tf.nn.gelu_2 (TFOpLambda)       (None, 4096, 64)     0           max_pooling1d_1[0][0]            
conv1d_2 (Conv1D)               (None, 4096, 72)     23040       tf.nn.gelu_2[0][0]               
batch_normalization_2 (BatchNor (None, 4096, 72)     288         conv1d_2[0][0]                   
max_pooling1d_2 (MaxPooling1D)  (None, 1024, 72)     0           batch_normalization_2[0][0]      
tf.nn.gelu_3 (TFOpLambda)       (None, 1024, 72)     0           max_pooling1d_2[0][0]            
conv1d_3 (Conv1D)               (None, 1024, 32)     6912        tf.nn.gelu_3[0][0]               
batch_normalization_3 (BatchNor (None, 1024, 32)     128         conv1d_3[0][0]                   
tf.nn.gelu_4 (TFOpLambda)       (None, 1024, 32)     0           batch_normalization_3[0][0]      
conv1d_4 (Conv1D)               (None, 1024, 72)     2304        tf.nn.gelu_4[0][0]               
batch_normalization_4 (BatchNor (None, 1024, 72)     288         conv1d_4[0][0]                   
dropout (Dropout)               (None, 1024, 72)     0           batch_normalization_4[0][0]      
add (Add)                       (None, 1024, 72)     0           max_pooling1d_2[0][0]            
tf.nn.gelu_5 (TFOpLambda)       (None, 1024, 72)     0           add[0][0]                        
conv1d_5 (Conv1D)               (None, 1024, 32)     6912        tf.nn.gelu_5[0][0]               
batch_normalization_5 (BatchNor (None, 1024, 32)     128         conv1d_5[0][0]                   
tf.nn.gelu_6 (TFOpLambda)       (None, 1024, 32)     0           batch_normalization_5[0][0]      
conv1d_6 (Conv1D)               (None, 1024, 72)     2304        tf.nn.gelu_6[0][0]               
batch_normalization_6 (BatchNor (None, 1024, 72)     288         conv1d_6[0][0]                   
dropout_1 (Dropout)             (None, 1024, 72)     0           batch_normalization_6[0][0]      
add_1 (Add)                     (None, 1024, 72)     0           add[0][0]                        
tf.nn.gelu_7 (TFOpLambda)       (None, 1024, 72)     0           add_1[0][0]                      
conv1d_7 (Conv1D)               (None, 1024, 32)     6912        tf.nn.gelu_7[0][0]               
batch_normalization_7 (BatchNor (None, 1024, 32)     128         conv1d_7[0][0]                   
tf.nn.gelu_8 (TFOpLambda)       (None, 1024, 32)     0           batch_normalization_7[0][0]      
conv1d_8 (Conv1D)               (None, 1024, 72)     2304        tf.nn.gelu_8[0][0]               
batch_normalization_8 (BatchNor (None, 1024, 72)     288         conv1d_8[0][0]                   
dropout_2 (Dropout)             (None, 1024, 72)     0           batch_normalization_8[0][0]      
add_2 (Add)                     (None, 1024, 72)     0           add_1[0][0]                      
tf.nn.gelu_9 (TFOpLambda)       (None, 1024, 72)     0           add_2[0][0]                      
conv1d_9 (Conv1D)               (None, 1024, 32)     6912        tf.nn.gelu_9[0][0]               
batch_normalization_9 (BatchNor (None, 1024, 32)     128         conv1d_9[0][0]                   
tf.nn.gelu_10 (TFOpLambda)      (None, 1024, 32)     0           batch_normalization_9[0][0]      
conv1d_10 (Conv1D)              (None, 1024, 72)     2304        tf.nn.gelu_10[0][0]              
batch_normalization_10 (BatchNo (None, 1024, 72)     288         conv1d_10[0][0]                  
dropout_3 (Dropout)             (None, 1024, 72)     0           batch_normalization_10[0][0]     
add_3 (Add)                     (None, 1024, 72)     0           add_2[0][0]                      
tf.nn.gelu_11 (TFOpLambda)      (None, 1024, 72)     0           add_3[0][0]                      
conv1d_11 (Conv1D)              (None, 1024, 32)     6912        tf.nn.gelu_11[0][0]              
batch_normalization_11 (BatchNo (None, 1024, 32)     128         conv1d_11[0][0]                  
tf.nn.gelu_12 (TFOpLambda)      (None, 1024, 32)     0           batch_normalization_11[0][0]     
conv1d_12 (Conv1D)              (None, 1024, 72)     2304        tf.nn.gelu_12[0][0]              
batch_normalization_12 (BatchNo (None, 1024, 72)     288         conv1d_12[0][0]                  
dropout_4 (Dropout)             (None, 1024, 72)     0           batch_normalization_12[0][0]     
add_4 (Add)                     (None, 1024, 72)     0           add_3[0][0]                      
tf.nn.gelu_13 (TFOpLambda)      (None, 1024, 72)     0           add_4[0][0]                      
conv1d_13 (Conv1D)              (None, 1024, 32)     6912        tf.nn.gelu_13[0][0]              
batch_normalization_13 (BatchNo (None, 1024, 32)     128         conv1d_13[0][0]                  
tf.nn.gelu_14 (TFOpLambda)      (None, 1024, 32)     0           batch_normalization_13[0][0]     
conv1d_14 (Conv1D)              (None, 1024, 72)     2304        tf.nn.gelu_14[0][0]              
batch_normalization_14 (BatchNo (None, 1024, 72)     288         conv1d_14[0][0]                  
dropout_5 (Dropout)             (None, 1024, 72)     0           batch_normalization_14[0][0]     
add_5 (Add)                     (None, 1024, 72)     0           add_4[0][0]                      
tf.nn.gelu_15 (TFOpLambda)      (None, 1024, 72)     0           add_5[0][0]                      
conv1d_15 (Conv1D)              (None, 1024, 64)     4608        tf.nn.gelu_15[0][0]              
batch_normalization_15 (BatchNo (None, 1024, 64)     256         conv1d_15[0][0]                  
dropout_6 (Dropout)             (None, 1024, 64)     0           batch_normalization_15[0][0]     
tf.nn.gelu_16 (TFOpLambda)      (None, 1024, 64)     0           dropout_6[0][0]                  
dense (Dense)                   (None, 1024, 3)      195         tf.nn.gelu_16[0][0]              
switch_reverse (SwitchReverse)  (None, 1024, 3)      0           dense[0][0]                      
Total params: 111,011
Trainable params: 109,235
Non-trainable params: 1,776
model_strides [128]
target_lengths [1024]
target_crops [0]
Checkpoint restored at epoch 4, optimizer iteration 1812.
Successful first step!
Epoch 4 - 570s - train_loss: 0.3677 - train_r: 0.2641 - train_r2: 0.0688 - valid_loss: 0.3551 - valid_r: 0.3145 - valid_r2: 0.0918 - best!

I've spoken to our computing team and they don't think it's an issue with the install. Do you have any insight into what might be causing this error? I am not familiar with TensorFlow, so I'm not sure whether this is an issue with the way I'm trying to run Basenji.

I'd really appreciate any help or guidance! Happy to provide any further info if required.

Many thanks, Jess

davek44 commented 5 months ago

Hi Jess, I'm not exactly sure what's going on there. We've moved on to a new codebase here:, where we're continuing to actively develop and follow better software engineering practices. I'd recommend jumping over and trying your application there. Reach out if you get stuck, and we'll try to help.

jesspeers commented 5 months ago

Thank you! I'll give that a go