JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0
23.93k stars 3.13k forks

OCR table #200

Closed by kollols 4 years ago

kollols commented 4 years ago

I was trying to OCR an image that contains a table, but I didn't get the output in the form of a table. Can we OCR a table-like document and keep its structure?

The output I got:

['EPRTR', 'The European Pollutant Release and Transfer Register', 'E-PRTR pollutants and their thresholds', 'facility has to report data under E-PRTR if it fulfils the following criteria:', 'the facility falls under at least one of the 65EPRTR economic activities.', 'The', 'activities are also reported using a statistical classification of economic activities', '(NACE rev2)', 'the facility has a capacity exceeding at least one of the E-PRTR capacity', 'thresholds', 'the facility releases pollutants or transfers waste off-site which exceed specific', 'thresholds set out in Article5 of the E-PRTR Regulation', 'These thresholds for', 'releases of pollutants are specified for each media', 'air, water and land', 'in Annex', 'I[ of the E-PRTR Regulation.', 'In the following tables you will find the 91 E-PRTR pollutants and their thresholds broken', 'down by the 7 groups used in all the searches of the E-PRTR website.', 'Greenhouse gases', 'THRESHOLD FOR RELEASES', 'to air', 'to water', 'to land', 'kg/year', 'kg/year', 'kg/year', 'Carbon dioxide (CO2)', '1o0 million', 'Hydro-fluorocarbons (HFCs', '1o0', 'Methane', 'CH4)', '00 000', 'Nitrous oxide (N2O)', '10 000', 'Perfluorocarbons', 'PFCs', '1o0', 'Sulphur hexafluoride (SF6', '50', 'Other gases', 'THRESHOLD FOR RELEASES', 'to land', 'to air', 'to water', 'kg/year', 'kg/year', 'kg/year', 'Ammonia (NH3', '0 0o0', 'Carbon monoxide (CO', '500 000', 'Chlorine and inorganic compounds', '10 000', '(as HCI)', 'Chlorofluorocarbons', 'CFCs', 'Flourine and inorganic compounds', '5 000', '(as HF)', 'Halons', 'Hydrochlorofluorocarbons (HCFCs)', '1', 'Hydrogen Cyanide (HCN)', '200', 'Nitrogen oxides (NOx/NO2)', 'o0 000', 'Non-methane volatile organic', 'o0 000', 'compounds (NMVOC)', 'Sulphur oxides', 'SOx', '$02)', '150 000', 'Heavy metals', 'THRESHOLD FOR RELEASES', 'to land', 'to air', 'to water', 'kg/year', 'kg/year', 'kg/year', 'As)', 'Arsenic and compounds (as', '20', '5', '5', 'Cadmium and compounds', 'as Cd)', '0', '5', '5', 
'Chromium and compounds (as Cr)', 'o0', '50', '50', 'Copper and compounds (as Cu)', '1o0', '50', '50', 'Lead and compounds (as Pb', '200', '20', '20', 'Mercury and compounds', 'as Hg)', '10', '1', '1', 'Nickel and compounds', 'as Ni)', '20', '20', '50', 'Zinc and compounds (as Zn)', '200', '1o0', '1o0']

(Attached image: eu-0011)

akshowhini commented 4 years ago

@kollols The library's primary objective is to detect and recognize the characters, not the layout. You can use the bounding boxes that the library returns with each detected text item and reconstruct the table from their positions.
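As a rough illustration of the bounding-box idea above, here is a minimal sketch that groups detections into rows by vertical position and then orders each row left to right. It assumes the input has the shape `readtext` returns by default, a list of `(bbox, text, confidence)` tuples where `bbox` is four `[x, y]` corner points. The function name `group_into_rows` and the `row_tolerance` parameter are made up for this example and are not part of EasyOCR; the tolerance will likely need tuning per document.

```python
from statistics import median


def group_into_rows(results, row_tolerance=0.6):
    """Group OCR detections into table-like rows using their bounding boxes.

    `results` is assumed to be a list of (bbox, text, confidence) tuples,
    where bbox is four [x, y] corner points.
    `row_tolerance` (hypothetical knob) is the fraction of the median box
    height within which two boxes count as being on the same row.
    """
    if not results:
        return []

    cells = []
    for bbox, text, _conf in results:
        xs = [point[0] for point in bbox]
        ys = [point[1] for point in bbox]
        cells.append({
            "text": text,
            "x": min(xs),                          # left edge, used for column order
            "y_center": (min(ys) + max(ys)) / 2,   # vertical centre, used for row grouping
            "height": max(ys) - min(ys),
        })

    threshold = row_tolerance * median(c["height"] for c in cells)

    # Walk the boxes top to bottom; start a new row whenever the vertical
    # centre moves further than the threshold from the current row's mean.
    cells.sort(key=lambda c: c["y_center"])
    rows = [[cells[0]]]
    for cell in cells[1:]:
        current = rows[-1]
        row_center = sum(c["y_center"] for c in current) / len(current)
        if abs(cell["y_center"] - row_center) <= threshold:
            current.append(cell)
        else:
            rows.append([cell])

    # Within each row, order cells left to right.
    return [[c["text"] for c in sorted(row, key=lambda c: c["x"])] for row in rows]
```

With EasyOCR installed, the input could come from something like `easyocr.Reader(['en']).readtext('table.png')` (the file name is a placeholder). This only recovers rows; empty cells are not detected, so assigning values to the correct column would need an extra step, such as comparing each box's x position against the column header positions.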

kollols commented 4 years ago

@akshowhini Thanks for clarifying.