Vignana-Jyothi / kp-gen-ai

MIT License

[Theory] Google Colab #12

Open head-iie-vnr opened 2 days ago

head-iie-vnr commented 2 days ago

Google Colab (short for "Colaboratory") is a free cloud-based platform provided by Google that allows users to write and execute Python code in a Jupyter notebook environment. It is particularly popular among data scientists, machine learning practitioners, and educators for several reasons:

  1. Cloud-Based: Since it is hosted on the cloud, you do not need to install any software on your local machine. You can access your notebooks from any device with an internet connection.

  2. Free Access to GPUs and TPUs: Google Colab offers free access to powerful hardware accelerators like GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units), which significantly speed up the execution of complex computations, especially those involved in deep learning and large-scale data processing.

  3. Jupyter Notebook Interface: The interface is similar to Jupyter Notebooks, which means you can write and run Python code in cells, visualize data with plots, and integrate Markdown for documentation.

  4. Integration with Google Drive: You can easily save and manage your notebooks in your Google Drive, enabling seamless collaboration and sharing.

  5. Pre-Installed Libraries: Google Colab comes pre-installed with many popular Python libraries for data science and machine learning, such as TensorFlow, Keras, PyTorch, Pandas, NumPy, and many more.

  6. Collaboration: Multiple users can work on the same notebook simultaneously, making it an excellent tool for collaborative projects and teaching.

  7. Easy to Share: You can share your notebooks with others via a simple link, and they can view or even edit the notebook depending on the permissions you set.

Google Colab is widely used for tasks such as prototyping machine learning models, conducting exploratory data analysis, and teaching programming and data science concepts.
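Since the attached hardware can change between sessions, it is often useful to confirm from inside a notebook whether a GPU runtime is active. A minimal sketch using only the standard library (assuming, as in Colab GPU runtimes, that the NVIDIA driver tools are on the PATH):

```python
import shutil

def gpu_runtime_attached() -> bool:
    """Return True if the nvidia-smi tool is on PATH, which in a
    Colab notebook indicates a GPU runtime is attached."""
    return shutil.which("nvidia-smi") is not None

print("GPU runtime:", gpu_runtime_attached())
```

In a Colab CPU runtime this prints `GPU runtime: False`; switching the runtime type to GPU (Runtime > Change runtime type) makes it print `True`.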

head-iie-vnr commented 2 days ago

Google Colab has certain quota limits to ensure fair usage and resource availability for all users. These limits can vary, especially for free users, and they may change over time. Here are some of the key quota limits:

Free Tier:

  1. Runtime Duration:

    • The maximum continuous usage of a single session is typically 12 hours. After this, the session will disconnect automatically.
    • Idle sessions (sessions where there is no code execution or user interaction) can be disconnected after 30 minutes of inactivity.
  2. Daily Usage Limits:

    • The free tier has usage limits on the total amount of time you can use GPUs and TPUs. Exact limits are not publicly specified but can vary based on resource availability and usage patterns.
  3. Hardware Specifications:

    • The type of GPU provided can vary (e.g., NVIDIA K80s, T4s, P4s, and P100s). The specific GPU available to you can change dynamically based on demand.
    • The free tier generally provides 12 GB of RAM, but this can sometimes increase to 25 GB if the notebook requires more memory.
  4. Disk Space:

    • Free users typically get around 100 GB of disk space, but this is temporary storage that is reset when the session ends or is recycled.
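Because the scratch disk is ephemeral and its size can vary, it is worth checking how much space is actually free before downloading large datasets. A small sketch using the standard library:

```python
import shutil

def free_disk_gb(path: str = "/") -> float:
    """Free space on the filesystem containing `path`, in gigabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9

print(f"Free disk space: {free_disk_gb():.1f} GB")
```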

Colab Pro and Colab Pro+:

Google offers paid plans called Colab Pro and Colab Pro+ that provide additional resources and higher limits:

  1. Colab Pro:

    • Priority access to faster GPUs such as the NVIDIA T4 and P100.
    • Longer maximum runtime (up to 24 hours).
    • Increased usage limits for GPUs.
    • More RAM (up to 32 GB).
  2. Colab Pro+:

    • Access to the best available GPUs, including NVIDIA V100s.
    • Even longer runtimes and more generous usage limits.
    • Even more RAM (up to 52 GB).

General Considerations:

It's important to be aware that these limits are subject to change, and it's a good idea to check the Google Colab FAQ or the Google Colab Pro page for the latest information.

head-iie-vnr commented 2 days ago

Working with Large Language Models (LLMs) in Python involves using various libraries that facilitate model implementation, data handling, and evaluation. Here are some of the most popular libraries:

1. Transformers

2. TensorFlow

3. PyTorch

4. Datasets

5. Numpy

6. Pandas

7. Matplotlib and Seaborn

8. scikit-learn

9. NLTK and spaCy

These libraries provide a comprehensive toolkit for working with large language models and performing a wide range of machine learning and data processing tasks.
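Not every environment ships all of these libraries, so a quick availability check can save debugging time. A sketch using `importlib.util.find_spec` from the standard library (note the import names, e.g. `sklearn` for scikit-learn, differ from the pip package names):

```python
from importlib.util import find_spec

# Package names as they are imported, not as installed via pip
libraries = ["transformers", "torch", "datasets", "numpy",
             "pandas", "matplotlib", "sklearn", "nltk", "spacy"]

for name in libraries:
    status = "available" if find_spec(name) is not None else "missing"
    print(f"{name:12s} {status}")
```

Any library reported as missing can then be installed in a notebook cell with `!pip install <package>`.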

head-iie-vnr commented 2 days ago

https://colab.research.google.com/drive/1wrdxxv1aczuFdjRpHREKxo8dx1LPfM3a#scrollTo=OoGw2U0696Ip

```shell
!pip install transformers torch
```

```python
# Import necessary libraries
from transformers import pipeline

# Initialize the sentiment analysis pipeline with a specific model
senti_analysis = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Define the list of texts
texts = [
    "I do not like to eat cake, but I like the smell",
    "I like cake smell, but I do not like to eat it",
    "I like the cake",
    "I do not like the smell"
]

# Perform sentiment analysis on the texts
results = senti_analysis(texts)

# Print the results
for text, result in zip(texts, results):
    print(f"Text: {text}\nSentiment: {result['label']}, Score: {result['score']:.4f}\n")
```

By specifying both the task and the model, you ensure that the pipeline is configured correctly to provide reliable sentiment analysis results for your input texts.

head-iie-vnr commented 2 days ago

The `pipeline` function used to initialize sentiment analysis

Parameters:

  1. "sentiment-analysis"
  2. model="distilbert-base-uncased-finetuned-sst-2-english"

Detailed Explanation:

  1. "sentiment-analysis"

    • Description: This is the name of the task you want to perform. In this case, it specifies that you want to use the pipeline for sentiment analysis.
    • Function: The pipeline function supports various tasks such as "text-generation", "text-classification", "question-answering", etc. By specifying "sentiment-analysis", you are telling the pipeline to use a model that can classify the sentiment of a given text as positive or negative (some models also support a neutral class).
  2. model="distilbert-base-uncased-finetuned-sst-2-english"

    • Description: This parameter specifies the pre-trained model to use for the sentiment analysis task.
    • Function:
      • Model Name: "distilbert-base-uncased-finetuned-sst-2-english" is a specific model hosted on Hugging Face's Model Hub. It is a fine-tuned version of the DistilBERT model, which is a smaller and faster variant of BERT (Bidirectional Encoder Representations from Transformers).
      • Pre-trained Model: The model has been pre-trained on a large dataset and fine-tuned specifically for the SST-2 (Stanford Sentiment Treebank) dataset, which makes it well-suited for sentiment analysis tasks in English.
      • Usage: By specifying this model, you ensure that the pipeline uses the appropriate pre-trained weights and architecture for the sentiment analysis task, providing more accurate results than using a generic or unspecified model.

Example:

The specified model and task work together to analyze the sentiment of input texts. Here is how the pipeline utilizes these parameters:
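A minimal sketch of how these two arguments fit together, using the same task and model named above:

```python
from transformers import pipeline

# The task name selects the pipeline type; the model name selects the
# pre-trained weights. Omitting `model` makes transformers fall back to
# a default checkpoint for the task (and print a warning).
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Each prediction is a dict with a label and a confidence score
result = classifier("I like the cake")[0]
print(result["label"], round(result["score"], 4))
```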

head-iie-vnr commented 2 days ago

Output generated:

Text: I do not like to eat cake, but I like the smell
Sentiment: POSITIVE, Score: 0.9992

Text: I like cake smell, but I do not like to eat it
Sentiment: POSITIVE, Score: 0.5582

Text: I like the cake
Sentiment: POSITIVE, Score: 0.9997

Text: I do not like the smell
Sentiment: NEGATIVE, Score: 0.9969

head-iie-vnr commented 2 days ago

Understanding the model's details

https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english

The above page shows more details about the model, such as its architecture, training data, and evaluation results.

Discover models by task type

https://huggingface.co/models?pipeline_tag=text-generation&sort=trending

The sentiment-analysis task falls under the 'Text Classification' task type.