Open NME-rahul opened 7 months ago
ISSUE 1 is for everyone
FIND the necessary information about the project:
1. List the website where the dataset is available.
2. Find the best possible architecture for OCR.
- Find best OCR model on hugging face or any website
- Alternatively, create and train CRNN architecture if not found.
3. Find the best possible architecture for knowledge representation.
Logic-based Representation:
- First-order logic (FOL)
- Description Logics (DL)
- Rule-based systems
Semantic Networks:
- Represent knowledge as a network of interconnected nodes, where nodes represent concepts or entities, and edges represent relationships between them.
- Tools: GraphDB
Ontologies:
- Formal specifications of a shared conceptualization of a domain. Ontologies define classes, properties, and relationships between entities using a standardized language.
- Tools: Protégé
Frames and Scripts:
- Frames represent knowledge as structured records or templates consisting of slots (attributes) and fillers (values).
- Scripts represent stereotypical sequences of events or actions in a particular domain.
Knowledge Graphs:
- Graph-based representation of knowledge, where entities are represented as nodes, and relationships between entities are represented as edges.
- Tools: Neo4j, Amazon Neptune, and Apache TinkerPop, RDFLib Python library
Probabilistic Models:
- Represent uncertainty and probabilistic relationships between variables in a domain.
- Examples: Bayesian networks, Markov random fields, and other probabilistic graphical models.
Frame-based Systems:
- Organize knowledge into frames, which are structured representations containing slots for properties and values.
- Used in areas like natural language understanding, expert systems, and robotics.
Temporal Representation:
- Represent knowledge that evolves over time, such as events, processes, or changes in states.
- Temporal logics and temporal databases are used to represent and reason about temporal knowledge.
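Several of the approaches above (semantic networks, knowledge graphs) boil down to storing and querying subject–predicate–object triples. A minimal stdlib-only sketch of that idea (the entities and relations below are made-up illustrations, not project data):

```python
# Minimal triple store: a knowledge graph as a set of (subject, predicate, object) tuples.
triples = {
    ("Python", "is_a", "ProgrammingLanguage"),
    ("CRNN", "is_a", "NeuralNetwork"),
    ("CRNN", "used_for", "OCR"),
    ("BERT", "used_for", "TextSimilarity"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

print(sorted(query(subject="CRNN")))       # both facts about CRNN
print(len(query(predicate="used_for")))    # count of used_for edges
```

Libraries such as RDFLib or Neo4j provide the same pattern-matching idea at scale, plus standards (RDF, SPARQL) and persistence.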
NOTE: Comment in the issue whatever research papers, tech, and approaches you found, in a structured way!
ELSE: further steps will be discussed in a meet or in the next issue 🤩
Semantic Text Similarity: word-embedding models (Word2Vec, GloVe, FastText) and Transformer-based models (e.g., BERT, RoBERTa, GPT) encode words or sentences into high-dimensional vectors, capturing their semantic relationships.
Datasets for OCR and the transformer:
- IAM handwriting word database: https://www.kaggle.com/datasets/nibinv23/iam-handwriting-word-database/ (reference implementation: https://github.com/sushant097/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow/tree/master)
- IAM handwritten forms dataset: https://www.kaggle.com/datasets/naderabdalghani/iam-handwritten-forms-dataset/
- BERT transformer model (text classification): https://www.kaggle.com/code/salehbinsuwaylih/bert-text-classification
Transformers that can be used:
BERT (Bidirectional Encoder Representations from Transformers): BERT is a widely-used transformer model that has shown strong performance in various NLP tasks. While it's primarily designed for contextualized word embeddings, it can also be fine-tuned for text recognition tasks.
Transformer-based OCR: This model is specifically designed for Optical Character Recognition (OCR) tasks. It uses a transformer architecture to process images and extract text information. It's efficient and effective for recognizing text in images.
Tesseract with LSTM: Tesseract is an open-source OCR engine that has been enhanced with LSTM (Long Short-Term Memory) networks. It's widely used for OCR tasks and can recognize text from images with reasonable accuracy.
LayoutLM: LayoutLM is a transformer-based model designed for document image understanding tasks, including OCR. It considers the layout and spatial information of text in addition to textual content, making it suitable for recognizing text in documents and images.
ViT (Vision Transformer): While initially designed for computer vision tasks, ViT can also be adapted for text recognition tasks. It processes images in a patch-wise manner using transformer layers, making it a potential candidate for handwritten text recognition.
Orca 2: Orca 2 is built for research purposes only and provides single-turn responses in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. (for maths)
Standard OCR with high accuracy: https://www.kaggle.com/code/gpiosenka/test-set-f1-score-99-efficientnetb3
COLAB link: https://colab.research.google.com/drive/1at6nyjlByJjf-7naHAKY9DyOko32IQX4#scrollTo=Sbqq3d2-PVZ9
https://huggingface.co/microsoft/trocr-base-handwritten
https://paperswithcode.com/task/handwritten-text-recognition
https://paperswithcode.com/paper/trocr-transformer-based-optical-character
https://www.kaggle.com/datasets/preatcher/standard-ocr-dataset/code
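When comparing the OCR models and datasets linked above, character error rate (CER = edit distance ÷ reference length) is the usual evaluation metric. A minimal pure-Python sketch:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if chars match)
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed to fix the hypothesis, per reference character."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

print(cer("handwriting", "handwritting"))  # one extra 't' -> 1/11 ≈ 0.0909
```

Lower is better; a CER of 0.0 means the OCR output matches the ground truth exactly.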
Run TrOCR ipynb @harshitkumardaga
ISSUE 1.2
- Ask mentor for data for OCR: @harshitkumardaga @Bhushit-S
- Research for Grading system: @NME-rahul @Neelamsethia
- labeled IAMDataset: @Neelamsethia
- Data(OOPS) for Transformer: @NME-rahul
@Neelamsethia @harshitkumardaga @Bhushit-S
@harshitkumardaga @Bhushit-S, as we know, the Tesseract model does not work on images with handwritten symbols and characters; to make this possible, try image pre-processing and perform the operations given below.
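The specific operations aren't listed in this comment; as an illustration only, a typical pre-processing pass before Tesseract (grayscale conversion plus fixed-threshold binarization, assuming Pillow is available) might look like:

```python
from PIL import Image

def preprocess(path: str, threshold: int = 150) -> Image.Image:
    """Grayscale then binarize: dark ink -> black (0), paper -> white (255)."""
    img = Image.open(path).convert("L")  # 8-bit grayscale
    return img.point(lambda px: 255 if px > threshold else 0)

# Toy demo on an in-memory 4x1 image instead of a file on disk:
gray = Image.new("L", (4, 1))
gray.putdata([10, 120, 160, 240])
binary = gray.point(lambda px: 255 if px > 150 else 0)
print(list(binary.getdata()))  # [0, 0, 255, 255]
```

Other common steps worth trying are deskewing, denoising, and adaptive thresholding (e.g., Otsu's method in OpenCV), since a fixed threshold is fragile under uneven lighting.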
Due Date: 15/03/2024
Approaches to find text similarities:
- Word2Vec (skip-grams)
- Word embeddings
- Cosine similarity
- Hamming distance
- Pretrained language models (BERT, the encoder part of the transformer; DistilBERT)
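Of the approaches above, cosine similarity and Hamming distance are simple enough to sketch directly. The 3-d "embeddings" below are hand-made toy vectors for illustration, not output of a real model:

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hamming_distance(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))

# Toy vectors: semantically similar words should point in similar directions.
king, queen, banana = [0.9, 0.8, 0.1], [0.85, 0.82, 0.12], [0.1, 0.05, 0.9]
print(round(cosine_similarity(king, queen), 3))   # close to 1.0
print(round(cosine_similarity(king, banana), 3))  # much lower
print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```

With real embeddings the vectors would come from Word2Vec/GloVe or a BERT-style encoder; the comparison step stays exactly the same.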
@Neelamsethia @harshitkumardaga @Bhushit-S
https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing
$$Score = \frac{SA}{OA}$$
$$ScaledScore = Scale * Score$$
E.g.
Short answer: the original answer contains 9 vectors in total and 5 vectors in the student's answer match the original; for an 8-mark question, the student gets similarity $\frac{5}{9} \approx 0.56$ and scaled score $8 \times \frac{5}{9} \approx 4.44$.
Long answer: the original answer contains 56 vectors in total and 43 vectors in the student's answer match the original; for a 15-mark question, the student gets similarity $\frac{43}{56} \approx 0.77$ and scaled score $15 \times \frac{43}{56} \approx 11.52$.
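The scoring rule above is just the match ratio scaled by the question's marks; in code:

```python
def scaled_score(matched: int, total: int, marks: float) -> float:
    """Similarity = matched vectors / total vectors, scaled by the question's marks."""
    return marks * (matched / total)

print(round(scaled_score(5, 9, 8), 2))     # short-answer example: 4.44
print(round(scaled_score(43, 56, 15), 2))  # long-answer example: 11.52
```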
https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing
@Neelamsethia
Please provide us with the /content/stop_words.txt used in the colaboratory https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing @NME-rahul
1. Clone and push to the GitHub repository
Fork the Repository: Go to the repository https://github.com/NME-rahul/AI-AGS on GitHub and click the "Fork" button in the upper right corner. This creates a copy of the repository in your GitHub account.
Set your GitHub credentials:
git config --global user.name "your-name"
git config --global user.email "your-email"
Clone the forked repository to your local machine (this also initializes the .git directory):
git clone https://github.com/<your-username>/AI-AGS.git
Navigate to the repository:
cd AI-AGS
Create a new branch (use any branch name):
git checkout -b my-branch
Stage your changes (`.` stages all files in your current working directory; to stage a specific file, give its name instead of the dot):
git add .
Commit (this commits on your local machine, not to the remote repository):
git commit -m "describe your change"
Push to origin (your GitHub repository):
git push origin my-branch
NOTE: If you get an authentication error here, install Git Credential Manager to resolve it.
Add the upstream URL to sync with the master branch:
git remote add upstream https://github.com/NME-rahul/AI-AGS.git
git pull upstream master
Push into the master branch (after syncing, push your branch again):
git push origin my-branch
Create a pull request for your pushed changes (on GitHub, open a pull request from your branch into the upstream master).
NOTE: If you don't want to follow the above process, simply send the code file in the WhatsApp group; the above process is only there for seamless collaboration 😜😜