Psarpei / Multi-Type-TD-TSR

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
MIT License
252 stars, 51 forks

Issue with running table detection and structure recognition #8

Closed: Kehindeajayi01 closed this issue 2 years ago

Kehindeajayi01 commented 2 years ago

Hi, thanks for the great work. I am trying to run your tdtsr.py script on my custom images, but I am running into an error with the config file:

[Screenshot of the config file error]

Any idea how to resolve this would be appreciated. Thanks!

Kehindeajayi01 commented 2 years ago

I have managed to resolve the issue above. However, after running the tdtsr.py script, there were a bunch of warnings, and none of the output described in the README file was generated. Do I need to do any preprocessing on my table images before running the tdtsr.py module?

[Screenshot of the warnings]

salman-moh commented 1 year ago

@Kehindeajayi01 how did you solve this error? I'm trying to get a command line for tdtsr running on Colab. What application are you working on? I'm applying this to medical documents to extract tables with their contents and end up with a pandas DataFrame.

Kehindeajayi01 commented 1 year ago

@salman-moh: I ran the code on Google Colab. This method mainly generates XML files describing the table structure, not the text content.
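For anyone wondering what to do with the XML files, here is a minimal sketch that reads table cells from one using Python's standard `xml.etree.ElementTree`. It assumes an ICDAR 2019 cTDaR-style layout (`<table>` elements containing `<cell>` elements with `start-row`/`start-col`/`end-row`/`end-col` attributes and a `<Coords points="x,y x,y ...">` child). I have not confirmed that tdtsr.py writes exactly this schema, so check your own output and adjust the tag and attribute names. `parse_table_cells` and `table_shape` are hypothetical helpers, not functions from this repository.

```python
import xml.etree.ElementTree as ET


def parse_table_cells(xml_text):
    """Return one list of cell dicts per <table> element.

    ASSUMPTION: cTDaR-style schema, e.g.
    <document><table><cell start-row="0" start-col="0" end-row="0" end-col="1">
      <Coords points="0,0 200,0 200,50 0,50"/></cell>...</table></document>
    """
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("table"):
        cells = []
        for cell in table.findall("cell"):
            coords = cell.find("Coords")
            bbox = None
            if coords is not None:
                # "x,y x,y ..." -> list of (x, y) integer pairs
                points = [
                    tuple(int(float(v)) for v in p.split(","))
                    for p in coords.get("points").split()
                ]
                xs = [x for x, _ in points]
                ys = [y for _, y in points]
                # Axis-aligned box as (left, top, right, bottom)
                bbox = (min(xs), min(ys), max(xs), max(ys))
            cells.append({
                "start_row": int(cell.get("start-row")),
                "start_col": int(cell.get("start-col")),
                "end_row": int(cell.get("end-row")),
                "end_col": int(cell.get("end-col")),
                "bbox": bbox,
            })
        tables.append(cells)
    return tables


def table_shape(cells):
    """(rows, cols) of the grid implied by the cells' end indices."""
    if not cells:
        return (0, 0)
    return (max(c["end_row"] for c in cells) + 1,
            max(c["end_col"] for c in cells) + 1)
```

Each cell's `bbox` can then be used to crop the original image before running OCR on it.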

temiwale88 commented 9 months ago

Thanks @Kehindeajayi01. What did you do with the XML files? Also, how are you handling text extraction? (I'm sure there are better algorithms now.) Thanks!

Dipankar1997161 commented 8 months ago

In their Colab notebook, they use Tesseract OCR for text extraction and then export the result to CSV, but I can't find that function in this repository.

Did you?
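The OCR-then-CSV step described above can be sketched roughly as follows. This is not the code from the authors' Colab notebook; it is a minimal, hypothetical illustration that assumes you already have each cell's grid position and pixel bounding box (for example, parsed from the XML output). The OCR engine is passed in as a callable, so the sketch does not depend on Tesseract being installed. `cells_to_rows` and `rows_to_csv` are illustrative names only.

```python
import csv
import io


def cells_to_rows(cells, read_text):
    """Build a row-major grid of strings by OCR-ing each cell's box.

    cells: iterable of (row, col, (left, top, right, bottom)) tuples.
    read_text: callable mapping a bbox to the text inside it (e.g. Tesseract).
    Spanning cells are written at their (row, col) only; the other
    positions they cover are left as empty strings.
    """
    cells = list(cells)
    if not cells:
        return []
    n_rows = max(r for r, _, _ in cells) + 1
    n_cols = max(c for _, c, _ in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, bbox in cells:
        # OCR output usually carries trailing newlines; strip them
        grid[row][col] = read_text(bbox).strip()
    return grid


def rows_to_csv(rows):
    """Serialize the grid with the standard csv module (handles quoting)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

To use Tesseract, pass something like `lambda box: pytesseract.image_to_string(page.crop(box))`, where `page` is a Pillow `Image` of the document; `Image.crop` takes the same `(left, top, right, bottom)` box. If you want a pandas DataFrame instead of CSV, `pandas.DataFrame(rows[1:], columns=rows[0])` works when the first row is a header.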