LucknowAI / Lucknow-LLM

Collecting data for Building Lucknow's first LLM

Lucknow LLM

Promptify is released under the Apache 2.0 license.

Installation

With pip

Install lucknowllm from the GitHub repository using pip:

pip3 install git+https://github.com/LucknowAI/Lucknow-LLM.git
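
To confirm that the install worked, you can check whether the package is importable. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is ours and is not part of lucknowllm.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate a top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "lucknowllm" is the import name used in the examples in this README.
    print("lucknowllm installed:", is_installed("lucknowllm"))
```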

Quick tour of LucknowLLM Framework

How to get prompt

from lucknowllm import construct_prompt
construct_prompt('raw_to_structured', "Hello world, This is input sentence")
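
The internals of construct_prompt are not shown in this README. As a rough illustration of the idea (a task name selects a template, and the input text is filled into it), here is a hypothetical stand-in. TEMPLATES, build_prompt, and the template wording are all assumptions and not the library's actual code.

```python
# Hypothetical stand-in for illustration only; NOT the lucknowllm implementation.
TEMPLATES = {
    "raw_to_structured": (
        "Convert the following raw text into structured JSON.\n\n"
        "Text:\n{text}\n"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Look up the template for `task` and insert `text` into it."""
    try:
        template = TEMPLATES[task]
    except KeyError:
        raise ValueError(f"Unknown task: {task!r}") from None
    return template.format(text=text)


if __name__ == "__main__":
    print(build_prompt("raw_to_structured", "Hello world, This is input sentence"))
```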

How to convert raw data into structured data

from lucknowllm import construct_prompt, GeminiModel

# Pass your own Gemini API key here
Gemini = GeminiModel(api_key="", model_name="gemini-1.0-pro")

llm_input    = construct_prompt('raw_to_structured', "Lucknow is the capital and the largest city of the Indian state of Uttar Pradesh and it is the administrative headquarters of the eponymous district and division.")
model_output = Gemini.generate_content(llm_input)
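
If the model is asked for structured output, the reply usually needs to be parsed before it can be stored. The sketch below assumes that model_output is a plain string containing JSON, possibly wrapped in a Markdown code fence; this README does not state the return type of generate_content, so treat that as an assumption. The helper parse_model_json is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_model_json(raw_output: str):
    """Parse JSON from model text; return None if it is not valid JSON.

    Assumes the model reply is a string (not confirmed by the README).
    """
    text = raw_output.strip()
    fenced = _FENCED.match(text)
    if fenced:
        # Drop the surrounding fence and optional "json" language tag.
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Returning None instead of raising lets a batch job skip bad replies and continue; raise instead if you would rather fail fast.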

How to segment long paragraphs into smaller ones for model input

from lucknowllm import split_into_segments

sentence = "Lucknow is the capital and the largest city of the Indian state of Uttar Pradesh and it is the administrative headquarters of the eponymous district and division."
split_into_segments(sentence)
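
How split_into_segments chooses its boundaries is not documented here. For readers who want to see one common approach, the following is a minimal sketch that packs whole sentences into segments up to a character limit. The function split_text and the max_chars parameter are hypothetical and may differ from what the library does.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into segments of at most `max_chars`.

    A single sentence longer than `max_chars` is kept whole, so a segment
    can exceed the limit in that case.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```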

How to load unstructured data

from lucknowllm import UnstructuredDataLoader
loader = UnstructuredDataLoader()

# if you want to load all files in one folder
loader.get_data('Cultural_Festival_of_Lucknow')

# if you want to load a specific file
loader.get_data('Cultural_Festival_of_Lucknow', 'Lucknow_Mahotsav.txt')
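
The return type of get_data is not described in this README. If you want to work with a local folder of text files in the same "whole folder or one file" style, a minimal standard-library sketch could look like this. The function read_text_files is hypothetical and is not part of lucknowllm.

```python
from pathlib import Path


def read_text_files(folder, filename=None) -> dict[str, str]:
    """Read one .txt file, or every .txt file in `folder`, keyed by file name."""
    folder = Path(folder)
    # With a filename, read only that file; otherwise read all .txt files.
    paths = [folder / filename] if filename else sorted(folder.glob("*.txt"))
    return {path.name: path.read_text(encoding="utf-8") for path in paths}
```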

How to load structured data

from lucknowllm import StructuredDataLoader
loader = StructuredDataLoader()

# if you want to load all files in one folder
loader.get_data('Arts_and_Crafts')

# if you want to load a specific file
loader.get_data('Arts_and_Crafts', 'Arts_and_Craft.json')
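
Before contributing structured data, it can help to check that every JSON file in a folder actually parses. The sketch below is one way to do that with the standard library; load_json_folder is a hypothetical helper, and it makes no assumption about the schema of the files.

```python
import json
from pathlib import Path


def load_json_folder(folder):
    """Parse every .json file in `folder`; return (records, errors).

    `records` maps file name to parsed content, and `errors` maps file
    name to a short message for files that are not valid JSON.
    """
    records, errors = {}, {}
    for path in sorted(Path(folder).glob("*.json")):
        try:
            records[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors[path.name] = f"line {exc.lineno}: {exc.msg}"
    return records, errors
```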

Contributing to LucknowLLM

You can contribute in the following ways:

Collect Unstructured Data

Choose a topic from the list below and search for content related to Lucknow on the internet, books, newspapers, or wherever you can find relevant information. If the topic is not listed, you can create a new topic and contribute.

Contribute to the LucknowLLM Framework

You can help build the LucknowLLM framework by working on the data preprocessing pipeline or by automating data collection with Selenium, website scrapers, and similar tools. Contribute your scripts along with tutorials for them in the tutorial folder, so that others can understand and use them easily.
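
As a starting point for a scraper contribution, here is a minimal sketch of the text-extraction step: turning a downloaded HTML page into plain text. It uses only the standard library and does no network access; fetching pages (with Selenium or another tool) is left out. The names TextExtractor and html_to_text are ours, not part of the framework.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Please check a website's terms of use and robots.txt before scraping it.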

Review Dataset Quality

You can contribute as a reviewer to ensure the quality of the dataset. Go through the existing datasets in lucknowllm/data/Unstructured_data and check their quality. If you find biased, aggressive, or sensitive content, including religious or political bias, you can remove it. Reviewing is a valuable contribution to maintaining the quality of the datasets.
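
Reviewing is a manual task, but a small script can help you decide which files to read first. The sketch below flags files that contain any term from a watch list that you supply. flag_for_review and the watch list are hypothetical, and a match only means "look at this file"; it is not a judgment about the content.

```python
import re


def flag_for_review(texts: dict[str, str], watch_terms: list[str]) -> dict[str, list[str]]:
    """Map each file name to the watch terms found in its text.

    Matching is case-insensitive and on whole words. Files with no
    matches are left out of the result.
    """
    flagged = {}
    for name, text in texts.items():
        hits = [
            term
            for term in watch_terms
            if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
        ]
        if hits:
            flagged[name] = hits
    return flagged
```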

Contribute to Documentation

You can contribute to the documentation of the LucknowLLM framework. We could host a documentation website, such as lucknowllm.readthedocs.io, where users can learn how to use the framework. Check out https://about.readthedocs.com/?ref=readthedocs.com for more information.

Improve the Lucknow AI Website

You can also work on improving the content of the Lucknow AI website. Check out https://github.com/LucknowAI/lucknowai.github.io for more details.

How to Start? I'm New to GitHub

Collecting data for Building Lucknow's first LLM

We are planning to collect these categories of data (not limited to):

Websites to scrape

  • Wikipedia Lucknow
  • https://uptourism.gov.in/

Hall-Of-Fame | Top Contributors

Made with contrib.rocks.