Filimoa / open-parse

Improved file parsing for LLMs
https://filimoa.github.io/open-parse/
MIT License

Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same #17

Closed fjw1049 closed 4 months ago

fjw1049 commented 5 months ago

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Filimoa commented 4 months ago

Could you print your version of openparse + torch + torchvision, your python version, your OS. Also a full stack trace.
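(A quick snippet for collecting that information, assuming openparse exposes a __version__ attribute the way torch and torchvision do:)

import platform
import sys

import openparse
import torch
import torchvision

print("openparse:", openparse.__version__)  # assumes a __version__ attribute exists
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("python:", sys.version)
print("os:", platform.platform())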

jinmang2 commented 4 months ago
import openparse

PDF_FILEPATH = "sample.pdf"  # placeholder path to the PDF being parsed

# in case of `table-transformers`
tableformer_parser = openparse.DocumentParser(
    table_args={"parsing_algorithm": "table-transformers"}
)
tableformer_parsed_doc = tableformer_parser.parse(PDF_FILEPATH)

# in case of `unitable`
unitable_parser = openparse.DocumentParser(
    table_args={
        "parsing_algorithm": "unitable",
        "min_table_confidence": 0.8,
    },
)
unitable_parsed_doc = unitable_parser.parse(PDF_FILEPATH)

In the examples above, which use the table-transformers and unitable algorithms within the open-parse library, table_args is specified as an input argument, which routes execution through certain conditional branches in the code. Specifically, this refers to the logic found here: https://github.com/Filimoa/open-parse/blob/e466e8bc950125f5438c72de25c1c202c95087a4/src/openparse/doc_parser.py#L104-L107

Within openparse/tables/, the func::ingest function dispatches to a specific _ingest_with_{PARSING_ALGORITHM} function based on the requested parsing algorithm, roughly as sketched below.
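A hedged sketch of that dispatch, with stubbed-out backends (the real function and field names may differ slightly from the source):

from typing import Any

def _ingest_with_table_transformers(doc: Any, args: Any) -> list:
    ...  # stub standing in for the real table-transformers backend

def _ingest_with_unitable(doc: Any, args: Any) -> list:
    ...  # stub standing in for the real unitable backend

def ingest(doc: Any, args: Any) -> list:
    # Route to the backend matching the requested parsing algorithm
    if args.parsing_algorithm == "table-transformers":
        return _ingest_with_table_transformers(doc, args)
    elif args.parsing_algorithm == "unitable":
        return _ingest_with_unitable(doc, args)
    raise ValueError(f"Unsupported parsing_algorithm: {args.parsing_algorithm}")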

The errors mentioned above occur in both the table-transformers and unitable paths, specifically inside the func::find_table_bboxes and func::get_table_content functions.

Both functions are part of the openparse.tables.table_transformers.ml module, which handles the 🤗 transformers models and their devices as follows:

https://github.com/Filimoa/open-parse/blob/e466e8bc950125f5438c72de25c1c202c95087a4/src/openparse/tables/table_transformers/ml.py#L40-L42

https://github.com/Filimoa/open-parse/blob/e466e8bc950125f5438c72de25c1c202c95087a4/src/openparse/tables/table_transformers/ml.py#L60-L68

When torch.cuda.is_available() returns True, the model is loaded onto the GPU. However, the inputs to the following functions are forcibly moved to the CPU (see the sketch after these links):

https://github.com/Filimoa/open-parse/blob/e466e8bc950125f5438c72de25c1c202c95087a4/src/openparse/tables/table_transformers/ml.py#L186-L191

https://github.com/Filimoa/open-parse/blob/e466e8bc950125f5438c72de25c1c202c95087a4/src/openparse/tables/table_transformers/ml.py#L323-L339
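Put together, the mismatch looks roughly like this (a paraphrased, minimal reproduction; the nn.Linear and the random tensor stand in for the real model and preprocessed inputs):

import torch
import torch.nn as nn

# Module level (cf. ml.py L40-42): the model follows the available hardware
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 2).to(device)  # stand-in for the 🤗 table-transformers model

# Inside func::find_table_bboxes / func::get_table_content (cf. ml.py L186-191
# and L323-339): the inputs are pinned to the CPU regardless of `device`
pixel_values = torch.randn(1, 8).to("cpu")  # stand-in for the preprocessed image tensor

# On a CUDA machine this raises:
# RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
outputs = model(pixel_values)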

To address this inconsistency, I have submitted PR #18. It would be great if it could be merged to resolve the issue.
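Continuing the sketch above, the general shape of the fix is to move the inputs to whatever device the model lives on rather than hard-coding the CPU (see PR #18 for the actual diff):

# Before: pixel_values = pixel_values.to("cpu")
# After: follow the model's own device instead of hard-coding one
pixel_values = pixel_values.to(next(model.parameters()).device)
outputs = model(pixel_values)  # devices now match on both CPU and GPU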

jinmang2 commented 4 months ago

In the future, it would also be good to let callers choose whether to use CUDA when constructing the parser.

Filimoa commented 4 months ago

@jinmang2 Thanks for taking the time to troubleshoot this!

Great feedback! You can now call

openparse.config.set_device("cpu")

to set the device globally.
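For example (the parser arguments and PDF path below are placeholders):

import openparse

openparse.config.set_device("cpu")  # force all models onto the CPU

parser = openparse.DocumentParser(
    table_args={"parsing_algorithm": "unitable", "min_table_confidence": 0.8},
)
parsed_doc = parser.parse("sample.pdf")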

Filimoa commented 4 months ago

Closing, fixed in v0.5.2