Open NME-rahul opened 7 months ago
ISSUE 1 is for everyone
FIND the necessary information about the project:
1. List the website where the dataset is available.
2. Find the best possible architecture for OCR.
- Find best OCR model on hugging face or any website
- Alternatively, create and train CRNN architecture if not found.
3. Find the best possible architecture for knowledge representation.
Logic-based Representation:
- First-order logic (FOL)
- Description Logics (DL)
- Rule-based systems
Semantic Networks:
- Represent knowledge as a network of interconnected nodes, where nodes represent concepts or entities, and edges represent relationships between them.
- Tools: GraphDB
Ontologies:
- Formal specifications of a shared conceptualization of a domain. Ontologies define classes, properties, and relationships between entities using a standardized language.
- Tools: Protégé
Frames and Scripts:
- Frames represent knowledge as structured records or templates consisting of slots (attributes) and fillers (values).
- Scripts represent stereotypical sequences of events or actions in a particular domain.
Knowledge Graphs:
- Graph-based representation of knowledge, where entities are represented as nodes, and relationships between entities are represented as edges.
- Tools: Neo4j, Amazon Neptune, and Apache TinkerPop, RDFLib Python library
Probabilistic Models:
- Represent uncertainty and probabilistic relationships between variables in a domain.
- Examples: Bayesian networks, Markov random fields, and other probabilistic graphical models.
Frame-based Systems:
- Organize knowledge into frames, which are structured representations containing slots for properties and values.
- Used in areas like natural language understanding, expert systems, and robotics.
Temporal Representation:
- Represent knowledge that evolves over time, such as events, processes, or changes in states.
- Temporal logics and temporal databases are used to represent and reason about temporal knowledge.
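Several of the approaches above (semantic networks, knowledge graphs) boil down to storing and querying subject–predicate–object triples. A minimal stdlib-only sketch of that idea (the entities and relations below are made-up illustrations, not project data):

```python
# Minimal triple store: a knowledge graph as a set of (subject, predicate, object) tuples.
triples = {
    ("Python", "is_a", "ProgrammingLanguage"),
    ("CRNN", "is_a", "NeuralNetwork"),
    ("CRNN", "used_for", "OCR"),
    ("BERT", "used_for", "TextSimilarity"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

print(sorted(query(subject="CRNN")))       # both facts about CRNN
print(len(query(predicate="used_for")))    # count of used_for edges
```

Libraries such as RDFLib or Neo4j provide the same pattern-matching idea at scale, plus standards (RDF, SPARQL) and persistence.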
NOTE: Comment in the issue whatever research papers, tech, and approaches you found, in a structured way!
ELSE: further steps will be discussed in a meet or in the next issue 🤩
Semantic Text Similarity: word-embedding models (Word2Vec, GloVe, FastText) and Transformer-based models (e.g., BERT, RoBERTa, GPT) encode words or sentences into high-dimensional vectors, capturing their semantic relationships.
Datasets for OCR and the transformer:
- IAM handwriting word database: https://www.kaggle.com/datasets/nibinv23/iam-handwriting-word-database/ (reference implementation: https://github.com/sushant097/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow/tree/master)
- IAM handwritten forms dataset: https://www.kaggle.com/datasets/naderabdalghani/iam-handwritten-forms-dataset/
- BERT transformer model (text classification): https://www.kaggle.com/code/salehbinsuwaylih/bert-text-classification
Transformers that can be used:
BERT (Bidirectional Encoder Representations from Transformers): BERT is a widely-used transformer model that has shown strong performance in various NLP tasks. While it's primarily designed for contextualized word embeddings, it can also be fine-tuned for text recognition tasks.
Transformer-based OCR: This model is specifically designed for Optical Character Recognition (OCR) tasks. It uses a transformer architecture to process images and extract text information. It's efficient and effective for recognizing text in images.
Tesseract with LSTM: Tesseract is an open-source OCR engine that has been enhanced with LSTM (Long Short-Term Memory) networks. It's widely used for OCR tasks and can recognize text from images with reasonable accuracy.
LayoutLM: LayoutLM is a transformer-based model designed for document image understanding tasks, including OCR. It considers the layout and spatial information of text in addition to textual content, making it suitable for recognizing text in documents and images.
ViT (Vision Transformer): While initially designed for computer vision tasks, ViT can also be adapted for text recognition tasks. It processes images in a patch-wise manner using transformer layers, making it a potential candidate for handwritten text recognition.
Orca 2: Orca 2 is built for research purposes only and provides single-turn responses in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. (for maths)
Standard OCR with high accuracy: https://www.kaggle.com/code/gpiosenka/test-set-f1-score-99-efficientnetb3
COLAB link: https://colab.research.google.com/drive/1at6nyjlByJjf-7naHAKY9DyOko32IQX4#scrollTo=Sbqq3d2-PVZ9
https://huggingface.co/microsoft/trocr-base-handwritten
https://paperswithcode.com/task/handwritten-text-recognition
https://paperswithcode.com/paper/trocr-transformer-based-optical-character
https://www.kaggle.com/datasets/preatcher/standard-ocr-dataset/code
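When comparing the OCR models and datasets linked above, character error rate (CER = edit distance ÷ reference length) is the usual evaluation metric. A minimal pure-Python sketch:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if chars match)
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed to fix the hypothesis, per reference character."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

print(cer("handwriting", "handwritting"))  # one extra 't' -> 1/11 ≈ 0.0909
```

Lower is better; a CER of 0.0 means the OCR output matches the ground truth exactly.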
Run TrOCR ipynb @harshitkumardaga
ISSUE 1.2
- Ask mentor for data for OCR: @harshitkumardaga @Bhushit-S
- Research for Grading system: @NME-rahul @Neelamsethia
- labeled IAMDataset: @Neelamsethia
- Data(OOPS) for Transformer: @NME-rahul
@Neelamsethia @harshitkumardaga @Bhushit-S
@harshitkumardaga @Bhushit-S, as we know, the Tesseract model does not work on images with handwritten symbols and characters; to make this possible, try image pre-processing and perform the operations given below.
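The specific operations aren't listed in this comment; as an illustration only, a typical pre-processing pass before Tesseract (grayscale conversion plus fixed-threshold binarization, assuming Pillow is available) might look like:

```python
from PIL import Image

def preprocess(path: str, threshold: int = 150) -> Image.Image:
    """Grayscale then binarize: dark ink -> black (0), paper -> white (255)."""
    img = Image.open(path).convert("L")  # 8-bit grayscale
    return img.point(lambda px: 255 if px > threshold else 0)

# Toy demo on an in-memory 4x1 image instead of a file on disk:
gray = Image.new("L", (4, 1))
gray.putdata([10, 120, 160, 240])
binary = gray.point(lambda px: 255 if px > 150 else 0)
print(list(binary.getdata()))  # [0, 0, 255, 255]
```

Other common steps worth trying are deskewing, denoising, and adaptive thresholding (e.g., Otsu's method in OpenCV), since a fixed threshold is fragile under uneven lighting.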
Due Date: 15/03/2024
Approaches to find text similarities:
- Word2Vec (skip-grams)
- Word embeddings
- Cosine similarity
- Hamming distance
- Pretrained language models (BERT, the encoder part of the transformer; DistilBERT)
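Of the approaches above, cosine similarity and Hamming distance are simple enough to sketch directly. The 3-d "embeddings" below are hand-made toy vectors for illustration, not output of a real model:

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hamming_distance(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))

# Toy vectors: semantically similar words should point in similar directions.
king, queen, banana = [0.9, 0.8, 0.1], [0.85, 0.82, 0.12], [0.1, 0.05, 0.9]
print(round(cosine_similarity(king, queen), 3))   # close to 1.0
print(round(cosine_similarity(king, banana), 3))  # much lower
print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```

With real embeddings the vectors would come from Word2Vec/GloVe or a BERT-style encoder; the comparison step stays exactly the same.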
@Neelamsethia @harshitkumardaga @Bhushit-S
https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing
$$Score = \frac{SA}{OA}$$
$$ScaledScore = Scale * Score$$
E.g.
Short answer: the original answer contains 9 vectors in total and 5 vectors in the student's answer match the original; for an 8-mark question, the student gets similarity $\frac{5}{9} \approx 0.56$ and scaled score $8 \times \frac{5}{9} \approx 4.44$.
Long answer: the original answer contains 56 vectors in total and 43 vectors in the student's answer match the original; for a 15-mark question, the student gets similarity $\frac{43}{56} \approx 0.77$ and scaled score $15 \times \frac{43}{56} \approx 11.52$.
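The scoring rule above is just the match ratio scaled by the question's marks; in code:

```python
def scaled_score(matched: int, total: int, marks: float) -> float:
    """Similarity = matched vectors / total vectors, scaled by the question's marks."""
    return marks * (matched / total)

print(round(scaled_score(5, 9, 8), 2))     # short-answer example: 4.44
print(round(scaled_score(43, 56, 15), 2))  # long-answer example: 11.52
```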
https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing
@Neelamsethia
Please provide us with the /content/stop_words.txt used in the colaboratory https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing @NME-rahul
1. Clone and push to the GitHub repository
Fork the Repository: Go to the repository https://github.com/NME-rahul/AI-AGS on GitHub and click the "Fork" button in the upper right corner. This creates a copy of the repository in your GitHub account.
Set your GitHub credentials:
git config --global user.name "your-name"
git config --global user.email "your-email"
Clone the forked repository to your local machine (this also initializes the .git directory):
git clone https://github.com/<your-username>/AI-AGS.git
Navigate to the repository:
cd AI-AGS
Create a new branch (use any branch name):
git checkout -b my-branch
Stage your changes (`.` stages all files in your current working directory; to stage a specific file, give its name instead of the dot):
git add .
Commit (this commits on your local machine, not to the remote repository):
git commit -m "describe your change"
Push to origin (your GitHub repository):
git push origin my-branch
NOTE: If you get an authentication error here, install Git Credential Manager to resolve it.
Add the upstream URL to sync with the master branch:
git remote add upstream https://github.com/NME-rahul/AI-AGS.git
git pull upstream master
Push into the master branch (after syncing, push your branch again):
git push origin my-branch
Create a pull request for your pushed changes (on GitHub, open a pull request from your branch into the upstream master).
NOTE: If you don't want to follow the above process, simply send the code file in the WhatsApp group; the above process is only there for seamless collaboration 😜😜