AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Apache License 2.0
artificial-intelligence computer-vision document document-analysis document-intelligence document-recognition document-understanding documentai end-to-end-ocr multimodal multimodal-deep-learning ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language vision-language-model vision-language-transformer

Advanced Literate Machinery

Introduction

The ultimate goal of our research is to build a system with high-level intelligence, i.e., one that possesses the abilities to read, think and create, and is so advanced that it could one day even surpass human intelligence. We call this kind of system Advanced Literate Machinery (ALM).

As a first step, we currently focus on teaching machines to read from images and documents. In the years to come, we will explore the possibilities of endowing machines with the intellectual capabilities of thinking and creating, with the aim of catching up with and surpassing GPT-4 and GPT-4V.

This project is maintained by the 读光 OCR Team (读光-Du Guang means “Reading The Light”) in the Tongyi Lab, Alibaba Group.


Visit our 读光-Du Guang Portal and DocMaster to experience online demos for OCR and Document Understanding.

Recent Updates

2024.4 Release

2024.3 Release

2023.9 Release

2023.6 Release

2023.4 Release

2023.2 Release

2022.9 Release