facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

DETR table detection to ONNX #545

Open emigomez opened 2 years ago

emigomez commented 2 years ago

I want to convert the DETR table detection model to ONNX. Two models available on the Hugging Face Hub are "nielsr/detr-table-detection" and "microsoft/table-transformer-detection".

I tried both, and the first one got me closest to a final result. First, I was able to obtain an ONNX model by running:

!python -m transformers.onnx --model=nielsr/detr-table-detection onnx/

Framework not requested. Using torch to export to ONNX.
Some weights of the model checkpoint at nielsr/detr-table-detection were not used when initializing DetrModel: ['bbox_predictor.layers.2.bias', 'bbox_predictor.layers.1.weight', 'bbox_predictor.layers.0.weight', 'bbox_predictor.layers.1.bias', 'bbox_predictor.layers.0.bias', 'model.encoder.layernorm.weight', 'bbox_predictor.layers.2.weight', 'model.encoder.layernorm.bias', 'class_labels_classifier.bias', 'class_labels_classifier.weight']
- This IS expected if you are initializing DetrModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DetrModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Using framework PyTorch: 1.12.1+cu113
/usr/local/lib/python3.7/dist-packages/transformers/models/detr/modeling_detr.py:560: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (batch_size * self.num_heads, target_len, source_len):
/usr/local/lib/python3.7/dist-packages/transformers/models/detr/modeling_detr.py:567: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (batch_size, 1, target_len, source_len):
/usr/local/lib/python3.7/dist-packages/transformers/models/detr/modeling_detr.py:591: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (batch_size * self.num_heads, target_len, self.head_dim):
Validating ONNX model...
    -[✓] ONNX model output names match reference model ({'last_hidden_state'})
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (3, 15, 256) matches (3, 15, 256)
        -[✓] all values close (atol: 1e-05)
All good, model saved at: onnx/model.onnx

After that, I ran inference with the ONNX model. It runs, but I don't know how to interpret the result:

from PIL import Image
from onnxruntime import InferenceSession
from transformers import AutoFeatureExtractor

IMAGE_PATH = "1.png"
SCALE = (800, 800)
# Load the image as RGB and resize it to a fixed size
image = Image.open(IMAGE_PATH).convert("RGB")
image = image.resize(SCALE)
feature_extractor = AutoFeatureExtractor.from_pretrained("nielsr/detr-table-detection")
session = InferenceSession("onnx/detr_td.onnx")
# ONNX Runtime expects NumPy arrays as input
inputs = feature_extractor(image, return_tensors="np")
onnx_outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))

These are some logs regarding the result obtained:

import numpy as np

print("input name", session.get_inputs()[0].name)
print("input shape", session.get_inputs()[0].shape)
print("input type", session.get_inputs()[0].type)
print("output name", session.get_outputs()[0].name)
print("output shape", session.get_outputs()[0].shape)
print("output type", session.get_outputs()[0].type)
output_names = [_.name for _ in session.get_outputs()]
output_shapes = [_.shape for _ in onnx_outputs]
print("\noutput_names: ", output_names)
print("output_shapes: ", output_shapes)
print("output shape: ", np.array(onnx_outputs).shape)
result = onnx_outputs[0]
print("output shape [0]: ", result.shape)
result = result[0]
print("output shape [0][0]: ", result.shape)
print(result[1])
----------------------------------------------------------------------------
input name pixel_values
input shape ['batch', 'num_channels', 'height', 'width']
input type tensor(float)
output name last_hidden_state
output shape ['batch', 'sequence', 'Addlast_hidden_state_dim_2']
output type tensor(float)
output_names:  ['last_hidden_state', 'key_value_states']
output_shapes:  [(1, 15, 256)]
output shape:  (1, 1, 15, 256)
output shape [0]:  (1, 15, 256)
output shape [0][0]:  (15, 256)

Does anyone know how to interpret this result? It should be the bounding box of each detected table, if there are any.

Thanks in advance!
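
A note on reading this output: the export log above reports that the bbox_predictor and class_labels_classifier weights were not used, so the exported graph appears to be the bare DetrModel. Its last_hidden_state holds one 256-dimensional embedding per object query (15 here), not bounding boxes. Boxes come from the detection heads, which transformers.onnx should include when the export is run with --feature=object-detection (this flag value is recalled from the transformers.onnx feature list and is worth checking against the installed version). Assuming such an export yields outputs named logits and pred_boxes in the usual DETR layout (the last class is "no object", boxes are normalized center-x, center-y, width, height), a minimal NumPy sketch of the decoding could look like this. The function name decode_detections and the threshold value are illustrative, not library API.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def decode_detections(logits, pred_boxes, image_size, threshold=0.7):
    """Turn DETR-style head outputs for ONE image into pixel boxes.

    logits:     (num_queries, num_classes + 1), last column = "no object" (assumed layout)
    pred_boxes: (num_queries, 4), normalized (cx, cy, w, h) (assumed layout)
    image_size: (width, height) of the image the boxes should refer to
    """
    probs = softmax(logits)[:, :-1]      # drop the trailing "no object" class
    scores = probs.max(axis=-1)          # best real-class score per query
    labels = probs.argmax(axis=-1)       # index of that class
    keep = scores > threshold            # keep only confident queries

    cx, cy, w, h = pred_boxes[keep].T
    width, height = image_size
    boxes = np.stack(
        [
            (cx - w / 2) * width,        # x_min
            (cy - h / 2) * height,       # y_min
            (cx + w / 2) * width,        # x_max
            (cy + h / 2) * height,       # y_max
        ],
        axis=-1,
    )
    return scores[keep], labels[keep], boxes


# Dummy values standing in for one image's outputs (not real model results).
dummy_logits = np.array([[5.0, 0.0], [0.0, 5.0]])
dummy_boxes = np.array([[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]])
scores, labels, boxes = decode_detections(dummy_logits, dummy_boxes, (800, 800))
print(scores, labels, boxes)
```

With a real session, the inputs would be the first batch element of each output, for example logits[0] and pred_boxes[0], and image_size would be the size of the original image rather than the resized one if boxes are wanted in original coordinates.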

NielsRogge commented 1 year ago

Hi,

Please use the official "microsoft/table-transformer-detection" checkpoint instead of the legacy "nielsr/detr-table-detection".

The command to run to export DETR to ONNX is very easy now thanks to 🤗 Optimum, see this guide: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model.

jprorikon commented 1 year ago

Hi, I followed the guide you mentioned, but Optimum returns an error: KeyError: "table-transformer is not supported yet. Only {'mbart', 'marian', 'groupvit', 'xlm', 'unispeech-sat', 'electra', 't5', 'resnet', 'nystromformer', 'donut-swin', 'segformer', 'gpt-neox', 'mt5', 'deberta-v2', 'hubert', 'splinter', 'layoutlm', 'wav2vec2', 'data2vec-audio', 'bart', 'sew-d', 'mpnet', 'perceiver', 'flaubert', 'bert', 'gptj', 'roberta', 'clip', 'levit', 'sew', 'bloom', 'squeezebert', 'gpt2', 'mobilevit', 'albert', 'mobilebert', 'convnext', 'poolformer', 'detr', 'distilbert', 'camembert', 'codegen', 'deberta', 'convbert', 'mobilenet-v2', 'opt', 'data2vec-text', 'roformer', 'whisper', 'beit', 'xlm-roberta', 'blenderbot-small', 'blenderbot', 'swin', 'regnet', 'yolos', 'llama', 'vit', 'data2vec-vision', 'audio-spectrogram-transformer', 'm2m-100', 'wav2vec2-conformer', 'deit', 'imagegpt', 'speech-to-text', 'longt5', 'mobilenet-v1', 'gpt-neo', 'layoutlmv3', 'unispeech', 'ibert', 'pegasus', 'wavlm', 'vision-encoder-decoder'} are supported. If you want to support table-transformer please propose a PR or open up an issue."

version info

How can I deal with it? Thanks

jprorikon commented 1 year ago

OK, I solved it by editing the config.json file. Changing model_type from "table-transformer" to "detr" makes Optimum work for me.
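
For readers who want to script this workaround, here is a minimal sketch of the config.json edit described above. It assumes the checkpoint has been downloaded to a local directory; the helper name patch_model_type and the temporary directory used in the demo are hypothetical. Since Table Transformer and DETR are registered as separate architectures in transformers, it seems prudent to compare the ONNX outputs against the PyTorch model after exporting this way.

```python
import json
import tempfile
from pathlib import Path


def patch_model_type(config_path, new_type="detr"):
    """Rewrite the "model_type" field of a Hugging Face config.json in place.

    Returns the previous value so the change can be reverted.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_type = config.get("model_type")
    config["model_type"] = new_type
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return old_type


# Demo on a throwaway file standing in for a locally downloaded checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.json"
    cfg.write_text(json.dumps({"model_type": "table-transformer"}), encoding="utf-8")
    previous = patch_model_type(cfg)
    patched = json.loads(cfg.read_text(encoding="utf-8"))["model_type"]
    print(previous, "->", patched)
```

The export command would then be pointed at the local directory containing the patched config.json instead of the Hub model id.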

nissansz commented 1 year ago

Could you provide the exported models for table detection and structure recognition? And how are the ONNX models above used?

nissansz commented 1 year ago

nielsr/detr-table-detection

I cannot run Optimum from the Windows 10 command line. Are there any converted ONNX models for table detection and recognition available for download? How are these ONNX models used?

jprorikon commented 1 year ago

In the end, my application uses OpenCV to extract table grids. Another candidate I considered is PaddleOCR; Paddle2ONNX can easily convert its models to ONNX.

I'm sorry, but I developed my application on Ubuntu, so I don't know how the models behave on Windows 10.

nissansz commented 1 year ago

Thank you. I find PaddleOCR hard to train. It even fails on very clear tables.

NielsRogge commented 1 year ago

Hi,

It seems that DETR is already supported by Optimum, but Table Transformer isn't yet. So one would need to open an issue on the Optimum repo for that.

nissansz commented 1 year ago

I don't know how to convert to ONNX; the CLI is invalid after pip install optimum. Could you provide an ONNX model and sample code for detection and recognition with ONNX models?

aiorga-sherpas commented 8 months ago

Using the Optimum CLI:

optimum-cli export onnx --model microsoft/table-transformer-detection onnx_model

the warnings described by @emigomez are still present:

//___/.venv/lib/python3.11/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (batch_size * self.num_heads, target_len, source_len):
//___/.venv/lib/python3.11/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:565: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (batch_size, 1, target_len, source_len):
/___/lib/python3.11/site-packages/transformers/models/table_transformer/modeling_table_transformer.py:589: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

Is this the expected behaviour? @NielsRogge

aiorga-sherpas commented 8 months ago

Also, when running the generated model, the following error is thrown:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid input name: pixel_mask

pixel_mask is an optional argument for the Table Transformer, but it should be accepted by the ONNX model.
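
One possible way around the "Invalid input name" error, sketched under the assumption that the exported graph only declares pixel_values as an input, is to drop any preprocessor outputs the session does not accept before calling run. The helper filter_feed is a hypothetical name; in real use the accepted names would come from session.get_inputs(), which is part of the ONNX Runtime Python API. Note that this only sidesteps the error and does not make the model use the mask.

```python
def filter_feed(feed, accepted_names):
    """Keep only the entries of `feed` whose key is a declared model input."""
    accepted = set(accepted_names)
    return {name: value for name, value in feed.items() if name in accepted}


# Placeholder values standing in for the NumPy arrays from the image processor.
example_feed = {"pixel_values": "array A", "pixel_mask": "array B"}
filtered = filter_feed(example_feed, ["pixel_values"])
print(filtered)

# With a real session (not executed here):
#   accepted = [i.name for i in session.get_inputs()]
#   outputs = session.run(None, filter_feed(dict(inputs), accepted))
```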