NME-rahul / AI-AGS


Clone and push in github repository & Collect necessary information #1

Open NME-rahul opened 7 months ago

NME-rahul commented 7 months ago

1. Clone and push in github repository

  1. Fork the Repository: Go to the repository https://github.com/NME-rahul/AI-AGS on GitHub and click on the "Fork" button in the upper right corner. This creates a copy of the repository in your GitHub account.

  2. Set your GitHub credentials

    git config user.email <your-github-mail-id>

    git config user.name <your-github-user-name>
  3. Initialize the .git directory (not needed if you clone in the next step, since cloning creates it for you)

    git init
  4. Clone the repository in your local machine

    git clone https://github.com/your-username/AI-AGS.git
  5. Navigate to repository

    cd AI-AGS

  6. Create a new branch(use any branch name)

    git checkout -b <branch_name>
  7. Stage your changes for commit; the dot (.) adds all files in your current working directory, and to commit a specific file, use its name instead of the dot.

    git add .
  8. Commit the changes (this commits to your local machine, not to the remote repo)

    git commit -m "Description of the changes made"
  9. Push to origin(your) github repository

    git push origin <branch_name>

NOTE: If you get an authentication error here, install Git Credential Manager to resolve it.

  1. Add the upstream URL to sync with the master branch

    git remote add upstream https://github.com/NME-rahul/AI-AGS.git
  2. Push to the upstream repository

    git push upstream <branch_name>
  3. Create a pull request for your pushed changes
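The local-only parts of the steps above (init, credentials, branch, stage, commit) can be sketched as one script; the branch name and file name are placeholders, and the network steps (fork, clone, push, pull request) are omitted:

```shell
#!/bin/sh
# Sketch of the local Git steps above, run in a throwaway directory.
# "feature-branch" and notes.txt are placeholders; clone/push need network access.
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q                                   # step 3: initialize the .git directory
git config user.email "you@example.com"       # step 2: per-repo credentials
git config user.name  "Your Name"
git checkout -q -b feature-branch             # step 6: create a new branch
echo "demo" > notes.txt
git add .                                     # step 7: stage all files
git commit -q -m "Description of the changes made"  # step 8: local commit only
git log --oneline                             # the commit should appear here
```

After this, `git push origin feature-branch` (step 9) publishes the branch so a pull request can be opened from it.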

TO GET DETAILED PROCESS FOLLOW: https://colab.research.google.com/drive/1ueRzYNn9r_eCivR-w4Lcj2v3pFFM_29U?usp=sharing

NOTE: If you don't want to follow the above process, simply send the code file in the WhatsApp group; the above process is only for seamless collaboration 😜😜

NME-rahul commented 7 months ago

ISSUE 1 is for everyone

FIND necessary information about the project

1. List the website where the dataset is available.

2. Find the best possible architecture for OCR.

  1. Find best OCR model on hugging face or any website
  2. Alternatively, create and train a CRNN architecture if none is found.

3. Find the best possible architecture for knowledge representation.

  1. Logic-based Representation:

    • First-order logic (FOL)
    • Description Logics (DL)
    • Rule-based systems
  2. Semantic Networks:

    • Represent knowledge as a network of interconnected nodes, where nodes represent concepts or entities, and edges represent relationships between them.
    • Tools: GraphDB
  3. Ontologies:

    • Formal specifications of a shared conceptualization of a domain. Ontologies define classes, properties, and relationships between entities using a standardized language.
    • Tools: Protégé
  4. Frames and Scripts:

    • Frames represent knowledge as structured records or templates consisting of slots (attributes) and fillers (values).
    • Scripts represent stereotypical sequences of events or actions in a particular domain.
  5. Knowledge Graphs:

    • Graph-based representation of knowledge, where entities are represented as nodes, and relationships between entities are represented as edges.
    • Tools: Neo4j, Amazon Neptune, and Apache TinkerPop, RDFLib Python library
  6. Probabilistic Models:

    • Represent uncertainty and probabilistic relationships between variables in a domain.
    • Examples: Bayesian networks, Markov random fields, and probabilistic graphical models.
  7. Frame-based Systems:

    • Organize knowledge into frames, which are structured representations containing slots for properties and values.
    • Used in areas like natural language understanding, expert systems, and robotics.
  8. Temporal Representation:

    • Represent knowledge that evolves over time, such as events, processes, or changes in states.
    • Temporal logics and temporal databases are used to represent and reason about temporal knowledge.
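To make option 5 (Knowledge Graphs) concrete before picking a tool like Neo4j or RDFLib, a graph can be sketched with no external libraries as a set of (subject, predicate, object) triples; the entities and relations below are made-up illustrations, not project data:

```python
# Minimal knowledge-graph sketch: triples plus a naive lookup query.
# Entities and relations are illustrative placeholders only.

triples = {
    ("Newton", "discovered", "gravity"),
    ("gravity", "is_a", "force"),
    ("force", "measured_in", "newtons"),
}

def objects(subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("Newton", "discovered"))  # {'gravity'}
```

A real tool adds indexing, persistence, and a query language (e.g. Cypher or SPARQL) on top of exactly this triple structure.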

NOTE: Comment on the issue with whatever research papers, tech, and approaches you find, in a structured way!

ELSE the steps will be discussed in the meet or in the next issue 🤩

Semantic Text Similarity: word-embedding models (Word2Vec, GloVe, FastText) and Transformer-based models (e.g., BERT, RoBERTa, GPT) encode words or sentences into high-dimensional vectors, capturing their semantic relationships.
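As a minimal sketch of the idea, cosine similarity between two such vectors measures semantic closeness; the 3-dimensional toy vectors below stand in for real embeddings, which typically have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" (placeholders for Word2Vec/BERT vectors).
cat    = [1.0, 0.9, 0.1]
kitten = [0.9, 1.0, 0.2]
car    = [0.1, 0.2, 1.0]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

With real embeddings the vectors come from a trained model, but the comparison step is exactly this.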

Bhushit-S commented 7 months ago

OCR Colab Link: https://colab.research.google.com/drive/1rd-xmSl8McVpjcvFT3Z7cz2XLVUTvqkf?usp=sharing

NME-rahul commented 7 months ago

ISSUE 1.1

  1. OCR: @harshitkumardaga
  2. Data for OCR & Transformer: @Neelamsethia
  3. Transformer: @Bhushit-S
  4. Create & Train OCR model ARCHITECTURE(additionally): @NME-rahul

Neelamsethia commented 7 months ago

Datasets for OCR and the transformer:

  • https://www.kaggle.com/datasets/nibinv23/iam-handwriting-word-database/ (https://github.com/sushant097/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow/tree/master)
  • https://www.kaggle.com/datasets/naderabdalghani/iam-handwritten-forms-dataset/

This is the BERT transformer model: https://www.kaggle.com/code/salehbinsuwaylih/bert-text-classification

Bhushit-S commented 7 months ago

Transformers, That can be used:

  1. BERT (Bidirectional Encoder Representations from Transformers): BERT is a widely-used transformer model that has shown strong performance in various NLP tasks. While it's primarily designed for contextualized word embeddings, it can also be fine-tuned for text recognition tasks.

  2. Transformer based OCR: This model is specifically designed for Optical Character Recognition (OCR) tasks. It uses a transformer architecture to process images and extract text information. It's efficient and effective for recognizing text in images.

  3. Tesseract with LSTM: Tesseract is an open-source OCR engine that has been enhanced with LSTM (Long Short-Term Memory) networks. It's widely used for OCR tasks and can recognize text from images with reasonable accuracy.

  4. LayoutLM: LayoutLM is a transformer-based model designed for document image understanding tasks, including OCR. It considers the layout and spatial information of text in addition to textual content, making it suitable for recognizing text in documents and images.

  5. ViT (Vision Transformer): While initially designed for computer vision tasks, ViT can also be adapted for text recognition tasks. It processes images in a patch-wise manner using transformer layers, making it a potential candidate for handwritten text recognition.

  6. Orca 2: Orca 2 is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization (useful for maths).

Standard OCR with high accuracy: https://www.kaggle.com/code/gpiosenka/test-set-f1-score-99-efficientnetb3

COLAB link: https://colab.research.google.com/drive/1at6nyjlByJjf-7naHAKY9DyOko32IQX4#scrollTo=Sbqq3d2-PVZ9

https://programminghistorian.org/en/lessons/ocr-with-google-vision-and-tesseract#:~:text=Versatility%3A%20The%20tool%20performs%20well,OCR%20for%20handwritten%20documents%2Fimages

https://huggingface.co/microsoft/trocr-base-handwritten

https://paperswithcode.com/task/handwritten-text-recognition

https://paperswithcode.com/paper/trocr-transformer-based-optical-character

https://www.kaggle.com/datasets/preatcher/standard-ocr-dataset/code

NME-rahul commented 6 months ago

ISSUE 1.2

NME-rahul commented 6 months ago

Run TrOCR ipynb @harshitkumardaga

NME-rahul commented 6 months ago

ISSUE 1.2

  • Ask mentor for data for OCR: @harshitkumardaga @Bhushit-S
  • Research for Grading system: @NME-rahul @Neelamsethia
  • labeled IAMDataset: @Neelamsethia
  • Data(OOPS) for Transformer: @NME-rahul

Sample data for AGS in JSON format

NME-rahul commented 6 months ago

Approaches to find text similarities

  1. Word2Vec (skip-grams)
  2. Word embeddings
  3. Cosine similarity (measure)
  4. Hamming distance (measure)
  5. Pretrained language models (BERT (the encoder part of the transformer), DistilBERT)

@Neelamsethia @harshitkumardaga @Bhushit-S
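For item 4 in the list above, a minimal Hamming-distance sketch; note it assumes equal-length inputs, so for text it is usually applied to hashes or binary feature vectors rather than raw sentences:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("karolin", "kathrin"))    # 3
print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```

Because of the equal-length restriction, cosine similarity over embeddings is usually the better fit for comparing free-form answers.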

NME-rahul commented 6 months ago

@harshitkumardaga @Bhushit-S , as we know, the Tesseract model does not work on images with handwritten symbols and characters. To make this possible, try image pre-processing and perform the operations given below,

Refer:

Due Date: 15/03/2024

NME-rahul commented 6 months ago

Approaches to find text similarities

  1. Word2Vec (skip-grams)
  2. Word embeddings
  3. Cosine similarity
  4. Hamming distance
  5. Pretrained language models (BERT (the encoder part of the transformer), DistilBERT)

@Neelamsethia @harshitkumardaga @Bhushit-S

https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing

NME-rahul commented 6 months ago

Problems in grading that the model does not solve

Marking scheme if a word2vec/embeddings model is used

| Question Type | Marks |
|---|---|
| Short | 2 |
| Moderate | 8 |
| Long | 15 |

$$Score = \frac{SA}{OA}$$

$$ScaledScore = Scale * Score$$

Eg.


  1. If anyone writes only the terminologies and keywords of the answer, he/she will still get marks: word2vec, embeddings
  2. Grading will be done on different scales; some questions carry 2 marks, 8 marks, or 15 marks, and these require different ranges of answers: BERT, word2vec, embeddings
  3. The same question can be answered in multiple ways, i.e. with different keywords and terminologies: word2vec, embeddings
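The two formulas above can be sketched in code. Here SA and OA are assumed to be similarity scores for the student answer and the original answer respectively (my reading; the thread does not define them), and Scale is the question's maximum marks from the table:

```python
def scaled_score(sa, oa, scale):
    """Score = SA / OA, then ScaledScore = Scale * Score.

    Capping at full marks is an assumption; the thread does not say
    what should happen when SA exceeds OA.
    """
    score = sa / oa
    return min(scale, scale * score)

# Assumed similarity values for a short (2-mark) and a moderate (8-mark) question.
print(scaled_score(0.5, 1.0, 2))   # 1.0
print(scaled_score(0.75, 1.0, 8))  # 6.0
```

Plugging in per-question scales of 2, 8, and 15 reproduces the marking scheme in the table above.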

https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing

@Neelamsethia

harshitkumardaga commented 6 months ago

Please provide us with the /content/stop_words.txt used in the colaboratory https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing @NME-rahul

NME-rahul commented 6 months ago

> Please provide us with the /content/stop_words.txt used in the colaboratory https://colab.research.google.com/drive/12hL3Fi533bvRNBnZIFhDIcfYpnEcObW8?usp=sharing @NME-rahul

stop_words.txt