IndicoDataSolutions / finetune

Scikit-learn style model finetuning for NLP
https://finetune.indico.io
Mozilla Public License 2.0

AssertionError: Bad argument number for Name: 3, expecting 4 #399

Open emtropyml opened 5 years ago

emtropyml commented 5 years ago

model = MultiLabelClassifier(base_model=DistilBERT, batch_size=2, multi_label_sequences=True, n_epochs=3)
model.fit(trainX, trainY)

WARNING: Entity <bound method Dense.call of <tensorflow.python.layers.core.Dense object at 0x7f4ab843b3c8>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Dense.call of <tensorflow.python.layers.core.Dense object at 0x7f4ab843b3c8>>: AssertionError: Bad argument number for Name: 3, expecting 4
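
For reference, the verbosity bump the warning asks for has to happen before TensorFlow is imported. A minimal sketch (the variable name comes straight from the warning text; this only makes the log more detailed, it does not fix the assertion):

import os

# Must be set before tensorflow is imported anywhere in the process;
# equivalent to `export AUTOGRAPH_VERBOSITY=10` on Linux.
os.environ["AUTOGRAPH_VERBOSITY"] = "10"

import tensorflow as tf  # import only after the variable is set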

benleetownsend commented 5 years ago

Are you trying to run finetune on tensorflow 2.0 by any chance?

emtropyml commented 5 years ago

No, I'm running it on tensorflow 1.14 only.

Also I'm getting the following error at evaluation time:

(Sequences longer than 512 tokens no longer seem to be handled automatically, as they were in the previous version -- just guessing.)

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 mlb = MultiLabelBinarizer()
      2 predictions = model.predict(testX)
----> 3 print(classification_report(mlb.fit_transform(testY), mlb.fit_transform(predictions)))
      4 print(roc_auc_score(mlb.fit_transform(testY), pd.DataFrame(model.predict_proba(testX)).values, average=None))
      5 print(roc_auc_score(mlb.fit_transform(testY), pd.DataFrame(model.predict_proba(testX)).values, average='micro'))

~/.local/lib/python3.5/site-packages/sklearn/metrics/classification.py in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict)
   1850     """
   1851
-> 1852     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1853
   1854     labels_given = True

~/.local/lib/python3.5/site-packages/sklearn/metrics/classification.py in _check_targets(y_true, y_pred)
     69     y_pred : array or indicator matrix
     70     """
---> 71     check_consistent_length(y_true, y_pred)
     72     type_true = type_of_target(y_true)
     73     type_pred = type_of_target(y_pred)

~/.local/lib/python3.5/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    203     if len(uniques) > 1:
    204         raise ValueError("Found input variables with inconsistent numbers of"
--> 205                          " samples: %r" % [int(l) for l in lengths])
    206
    207

ValueError: Found input variables with inconsistent numbers of samples: [1148, 1154]
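
The final ValueError is consistent with the chunking guess above: more predictions (1154) than label rows (1148). A minimal sanity check, reusing the model, testX, and testY names from the snippet:

# With chunk_long_sequences enabled, long documents are split into several
# chunks, so predict() can return more rows than there are input examples.
predictions = model.predict(testX)
assert len(predictions) == len(testY), (
    "%d predictions for %d label rows" % (len(predictions), len(testY))
)
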
benleetownsend commented 5 years ago

Thanks for the bug report, I can reproduce the second issue. For now you can set the kwarg chunk_long_sequences=False and this should reinstate the previous behaviour.

benleetownsend commented 5 years ago

Can you provide a minimum reproducible example for the first issue?

emtropyml commented 5 years ago

Here it is:

model = MultiLabelClassifier(base_model=DistilBERT, batch_size=2, chunk_long_sequences=False, multi_label_sequences=True, n_epochs=3)
model.fit(trainX, trainY)

mlb = MultiLabelBinarizer()
predictions = model.predict(testX)
print(classification_report(mlb.fit_transform(testY), mlb.fit_transform(predictions)))

I get the same error at both training and inference time.

benleetownsend commented 5 years ago

Can you run pip freeze | grep "tensorflow\|finetune" and send me the output?

emtropyml commented 5 years ago

finetune==0.8.3
mesh-tensorflow==0.0.5
tensorflow==1.14.0
tensorflow-datasets==1.0.2
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
tensorflow-hub==0.5.0
tensorflow-metadata==0.13.0
tensorflow-probability==0.7.0rc0
tensorflow-serving-api-gpu==1.13.0

madisonmay commented 5 years ago

Hi @emtropyml -- were you able to make any progress on resolving this issue? Were you calling anything like tf.enable_eager_execution() in the script that ran this code? I'm unable to reproduce this particular issue from the code snippet you've pasted.
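
A quick way to check, since tf.executing_eagerly() is part of the TF 1.x API:

import tensorflow as tf

# True here would mean eager execution was switched on somewhere
# before finetune built its graph.
print(tf.executing_eagerly())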

madisonmay commented 5 years ago

I was actually just able to reproduce this on another machine! Not sure what's causing it yet though -- seems like it may be something deep in tensorflow. Since it's strictly a warning it seems harmless if the code otherwise works, but we'll see if we can track it down!

emtropyml commented 5 years ago

When are you guys planning on releasing Finetune 0.8.4 on PyPI?

madisonmay commented 5 years ago

Thanks for the reminder -- 0.8.4 is now live.

madisonmay commented 5 years ago

I get this when running in a CPU environment on a laptop.

emtropyml commented 4 years ago

I'm getting this even in a GPU environment on GCP.

madisonmay commented 4 years ago

Hrmmm -- curious. From what I can tell your model should still train fine but the logs are certainly frustrating.

rnyak commented 4 years ago

@madisonmay I am also getting an assertion error when I install finetune directly from source inside a TF container, as instructed in the README. This is my finetune code:

import time
start = time.time()
model = Classifier(
    n_epochs=2,
    base_model=GPT2Model,
    val_set=(valX, valY),
    val_size=1500,
    val_interval=100,
    tensorboard_folder='/workspace/tensorboard',
    max_length=512,
    chunk_long_sequences=False,
    keep_best_model=True,
    eval_acc=True,
)
model.fit(trainX, trainY)
print("total training time:", time.time() - start)

I get the assertion error below right after running the command above; training never starts.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/values.py in regroup(device_map, values, wrap_class)
   1465       assert isinstance(v, list)
   1466       assert len(v) == len(v0), ("len(v) == %d, len(v0) == %d, v: %s, v0: %s" %
-> 1467                                  (len(v), len(v0), v, v0))
   1468     return [regroup(device_map, tuple(v[i] for v in values), wrap_class)
   1469             for i in range(len(v0))]
AssertionError: len(v) == 150, len(v0) == 149, v: [(<tf.Tensor 'replica_1/OptimizeLoss/clip_by_global_norm/replica_1/OptimizeLoss/clip_by_global_norm/_1:0' shape=(50771, 768) dtype=float32>, <tf.Variable 'model/featurizer/we:0' shape=(50771, 768) dtype=float32>),...
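
One hedged diagnostic, not something from this thread: the assertion fires inside TF's multi-replica regroup(), so pinning the process to a single GPU before tensorflow or finetune is imported would show whether multi-GPU replication is the trigger.

import os

# Standard CUDA behaviour: expose only one device so the multi-device
# replication path (where regroup() is failing) never runs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"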

I can use the finetune Docker container without any issue, but I need to install finetune inside a TF container and run the code that way. The tf-gpu version is 1.14.0.

What could be the reason behind this, and how can I fix it?

Thanks.