https://pypi.org/project/jan-scraper/

jan-scraper

jan-scraper: interact with Jan.ai by sending messages and retrieving the response

⚠️DISCLAIMER: This version is still a beta, built for small, customizable, end-user projects. The API-scraping implementation brings us closer to optimized scaling for large LLM applications in daily life, but there is still a long way to go... Stay tuned!

🎉jan-scraper for conversation: jan-scraper is now also optimized to use Jan as an interface for holding conversations with several text-generation and text2text-generation HuggingFace models, in 89 different languages, over your own PDFs.

⚠️Being a new implementation, the conversator module may still be unstable, throw errors and have some bugs. Moreover, it only supports one PDF at a time, so if you have more, make sure to concatenate them into a single file.

Overview

jan-scraper is a Python package that provides a convenient interface to interact with Jan.ai. Jan.ai is an open-source desktop app designed to run large language models (LLMs) locally, ensuring an offline and privacy-focused environment. With jan-scraper, you can easily send messages to Jan and retrieve responses, making it a versatile tool for leveraging Jan's capabilities programmatically.

Installation

python3 -m pip install jan-scraper

Requirements

Functions

scraper.get_directory_info(path)

Get the last modified time of a folder.
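Retrieving a folder's last-modified time maps directly onto the standard library; a minimal sketch of what such a helper could look like (the actual package implementation and return format may differ):

```python
import os
import time

def get_directory_info(path):
    """Return the last-modified time of a folder as a human-readable string."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    return time.ctime(mtime)        # e.g. 'Mon Feb  3 10:15:00 2024'
```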

scraper.define_assistant(json_file_path, new_instructions, model, name="Jan", description="A default assistant that can use all downloaded models")

Update the assistant's configuration in a JSON file.

scraper.parse_jsonl_file(file_path)

Parse a JSON Lines file and return a list of JSON objects.
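A JSON Lines file holds one JSON object per line; a hedged sketch of such a parser (illustrating the format, not necessarily the package's exact implementation):

```python
import json

def parse_jsonl_file(file_path):
    """Parse a JSON Lines file: one JSON object per line, blank lines skipped."""
    objects = []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore empty lines
                objects.append(json.loads(line))
    return objects
```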

scraper.get_package_location()

Get the location of the installed jan-scraper package.

scraper.scrape_jan(text, app, jan_threads_path, model, new_instructions="You are a helpful assistant", name="Jan", description="A default assistant that can use all downloaded models", set_new_thread=True)

Scrape data using the jan-scraper package.

scraper.activate_jan_api(app)

Automate the activation of the Jan application through a series of GUI interactions using the pyautogui library.

scraper.convert_stream_to_jsonl(stream)

Convert a text stream from Jan API containing JSON lines into a JSON Lines (.jsonl) file.

This function reads the provided text stream file, removes unnecessary lines, and writes the cleaned content into a new JSON Lines file. The resulting file can be used for further processing and analysis of Jan API responses.
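Assuming the captured stream uses server-sent-events framing (`data: {...}` lines terminated by `data: [DONE]`, which is common for streaming LLM APIs), a cleaning step could look like the sketch below. It takes explicit input/output paths for clarity, unlike the single-argument package function:

```python
import json

def convert_stream_to_jsonl(stream_path, jsonl_path):
    """Strip SSE framing ('data: ' prefixes, '[DONE]' sentinel, blank lines)
    from a captured API stream and write the JSON payloads as a .jsonl file."""
    with open(stream_path, "r", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line or line == "data: [DONE]":
                continue                       # drop framing noise
            if line.startswith("data: "):
                line = line[len("data: "):]    # keep only the JSON payload
            json.loads(line)                   # sanity-check it parses
            dst.write(line + "\n")
```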

scraper.mine_content_from_jsonl(jsonlfile)

Extract relevant content from a JSON Lines (.jsonl) file obtained from Jan API responses.

This function parses the JSON Lines file, extracts the desired content from the API response, and returns it as a string. The extracted content is typically relevant information obtained from scraping the Jan API, which can be further processed or displayed as needed.
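If the chunks follow the OpenAI-style streaming shape (`choices[0].delta.content`, an assumption about Jan's response format), extraction reduces to concatenating the deltas; a sketch:

```python
import json

def mine_content_from_jsonl(jsonlfile):
    """Concatenate the text content from OpenAI-style streaming chunks
    stored one-per-line in a .jsonl file."""
    pieces = []
    with open(jsonlfile, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            chunk = json.loads(line)
            delta = chunk.get("choices", [{}])[0].get("delta", {})
            if "content" in delta:
                pieces.append(delta["content"])
    return "".join(pieces)
```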

scraper.scrape_jan_through_api

This function uses the previously defined activate_jan_api function and interacts with the Jan application's API to obtain responses to user inputs.

You can initialize the model you want to use and activate the Jan API in your app as follows:

  1. Settings > Models > Your-favourite-model > ... > Start Model
  2. Local API server > Choose model to start > Your-favourite-model > Start server
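Once the model and local server are started, Jan exposes an OpenAI-compatible chat endpoint. A sketch of the request payload such a call would send (the default port 1337 and the exact endpoint are assumptions; verify them in your Jan Local API server settings):

```python
import json

def build_chat_payload(model, text, new_instructions):
    """Build an OpenAI-style chat-completions payload for Jan's local API."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": new_instructions},
            {"role": "user", "content": text},
        ],
    }

payload = build_chat_payload("tinyllama-1.1b", "Hi there!", "You are a helpful assistant")
# POST this to http://localhost:1337/v1/chat/completions (assumed Jan default;
# check Settings > Local API server) with e.g. urllib.request or requests.
print(json.dumps(payload, indent=2))
```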

From version 0.0.4b0, the auto parameter is deprecated. You can nevertheless call scraper.activate_jan_api to speed up API activation.

formatter.convert_code_to_curl_json

Convert a Python code string to a format suitable for inclusion in a JSON string within a curl command.

Parameters

Returns

Description This function takes a Python code string as input and escapes backslashes and double quotes within the code to prepare it for inclusion in a JSON string within a curl command. It also replaces newline characters with '\n' to ensure proper formatting in the JSON representation.
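The escaping order matters here: backslashes must be handled before quotes, or the quote escapes would themselves be double-escaped. A sketch of the transformation described above:

```python
def convert_code_to_curl_json(code):
    """Escape a Python code string so it can be embedded in a JSON string
    inside a curl command."""
    escaped = code.replace("\\", "\\\\")    # escape backslashes first
    escaped = escaped.replace('"', '\\"')   # then escape double quotes
    escaped = escaped.replace("\n", "\\n")  # encode newlines literally
    return escaped
```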

conversator.generate_id()

Generate a random 26-character alphanumeric ID.

Returns

Description This function generates a random alphanumeric ID with a length of 26 characters. It includes a mix of digits and uppercase letters, making it suitable for unique identifiers.
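A 26-character ID drawn from digits and uppercase letters can be generated with the standard library alone; a minimal sketch:

```python
import random
import string

def generate_id():
    """Generate a random 26-character ID from digits and uppercase letters."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(26))
```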


conversator.create_a_persistent_db(pdfpath)

Create a persistent database from a PDF file.

Parameters

Description This function initiates the creation of a persistent database from a PDF file. It involves loading the PDF, splitting documents into smaller chunks, using HuggingFace embeddings to transform text into numerical vectors, and storing the processed data in a Chroma vector store. The time taken for the operation is printed to the standard error output.

A cache for the embeddings used by your language model will be created in the same directory as your PDF, in a folder named documenttitle_cache (for a PDF at "/Users/User/mydata/chat.pdf", the embeddings cache will be "/Users/User/mydata/chat_cache").

A local vector store will be created in the same directory as the provided PDF, in a folder named documenttitle_localDB (for a PDF at "/Users/User/mydata/chat.pdf", the vector store will be "/Users/User/mydata/chat_localDB").
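Both side-car folders are derived from the PDF path by the naming scheme above; a sketch of that derivation (a hypothetical helper, not a function exported by the package):

```python
import os

def derive_db_paths(pdfpath):
    """Derive the embeddings-cache and vector-store folder paths that sit
    next to the PDF, following the <title>_cache / <title>_localDB naming."""
    base, _ = os.path.splitext(pdfpath)  # strip the .pdf extension
    return base + "_cache", base + "_localDB"

cache_dir, db_dir = derive_db_paths("/Users/User/mydata/chat.pdf")
# cache_dir == "/Users/User/mydata/chat_cache"
# db_dir   == "/Users/User/mydata/chat_localDB"
```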

conversator.jan_chatting(jan_app_path, jan_data_folder, thread_id, hfmodel, model_task, persistent_db_dir, embeddings_cache, pdfpath)

Implement a chat system using the Jan app, Hugging Face models, and a persistent database.

Parameters

Raises

Description This function facilitates interaction with the Jan app, utilizes Hugging Face models, and manages a persistent database. It launches Jan, reads and processes chat messages from a JSON file, queries a conversational retrieval chain, translates responses, and updates the chat thread. The function is designed to handle interruptions with a graceful exit.

models_source.longest_in_list(l)

Find and return the longest element in a list.

Parameters

Returns

Description This function takes a list of elements as input and identifies the longest element within it. The result is the element with the maximum length.
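Selecting the maximum-length element is a one-liner with `max` and a `len` key; a sketch:

```python
def longest_in_list(l):
    """Return the element of maximum length; ties go to the first one found."""
    return max(l, key=len)
```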

models_source.choose_right_model(model_name, model_task)

Choose the right Hugging Face model based on the provided model name and task.

Parameters

Returns

Raises

Description This function selects the appropriate Hugging Face model by analyzing the model name and task. It supports two tasks: "text2text-generation" and "text-generation." Depending on the task, it matches keywords in the model name and returns the most suitable model. If multiple matches are found, it chooses the one with the longest keyword.
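The longest-keyword tie-breaking can be sketched as follows. The `keyword_map` argument is a hypothetical stand-in for the package's internal keyword-to-model tables, which are not shown here:

```python
def choose_right_model(model_name, model_task, keyword_map):
    """Pick the model mapped to the longest keyword found in model_name.
    keyword_map is a {keyword: model_id} table for one task family."""
    if model_task not in ("text-generation", "text2text-generation"):
        raise ValueError(f"Unsupported task: {model_task}")
    matches = [kw for kw in keyword_map if kw in model_name.lower()]
    if not matches:
        raise ValueError(f"No supported model matches {model_name!r}")
    best = max(matches, key=len)  # longest keyword wins ties
    return keyword_map[best]

# With a toy table, "flan-t5-large" beats the shorter match "flan":
table = {"flan": "google/flan-t5-base", "flan-t5-large": "google/flan-t5-large"}
```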

models_source.supported_causalLM_models()

Print a list of supported causal language models.

Description This function prints a list of supported causal language models.

models_source.supported_seq2seqLM_models()

Print a list of supported sequence-to-sequence language models.

Description This function prints a list of supported sequence-to-sequence language models.

anylang.supported_languages()

Print a list of supported languages.

Description This function prints a list of supported languages based on the keys in the LANGNAMES dictionary.

anylang.TranslateFunctions

A class for translating text between languages using Google Translate.

Attributes

Methods

Raises

Description

The TranslateFunctions class encapsulates functionality for translating text between languages using Google Translate. It initializes with a text and a destination language, and automatically detects the source language (or defaults to "auto"). The translatef method performs the translation, and the class raises a warning if the provided language is not recognized for auto-detection.

Usage Example

translator = TranslateFunctions("Hello, world!", destination="es")
translation = translator.translatef()

anylang.TranslateFunctions.__init__(text, destination)

Initialize the TranslateFunctions object.

Parameters

Raises

Description This method initializes a TranslateFunctions object with a given text and destination language. It attempts to detect the source language; if unsuccessful, it defaults to "auto" and raises a warning.

anylang.TranslateFunctions.translatef()

Translate the text to the target language.

Returns

Description This method utilizes Google Translate to translate the stored text to the specified destination language. The translated text is returned as a string.

Usage Example

translator = TranslateFunctions("Hello, world!", destination="es")
translation = translator.translatef()
print(translation)  # Output: ¡Hola Mundo!

Example

import jan_scraper.scraper

# Define your messages, app path, and other necessary parameters
text = "Hi there, can you present yourself?"
app_path = "/path/to/jan-app"
threads_path = "/path/to/jan-threads"
model = "your-preferred-model"
instructions = "You are an Italian XVII century poet"
name = "Guglielmo Scuotipera"

# Scrape Jan.ai and retrieve the response
response = jan_scraper.scraper.scrape_jan(text=text, app=app_path, jan_threads_path=threads_path, model=model, new_instructions=instructions, name=name)

# Process the response as needed
print("Jan's Response:", response)

# Wanna speed up Jan opening and API activation? Try the following code!
jan_scraper.scraper.activate_jan_api(app_path)

# 1. Open Jan
# 2. Settings > Models > Your-favourite-model > ... > Start Model
# 3. Local API server > Choose model to start > Your-favourite-model > Start server
# 4. Scrape Jan API with the following function
response = jan_scraper.scraper.scrape_jan_through_api(model="tinyllama-1.1b", text="How is it to be ruling on such a big Empire?", name="Carolus Magnus", new_instructions="You are an emperor from the Middle Ages")

print("Jan's Response:", response)

# Do you want to use your own HF model with your own pdf? Do something like this!
from jan_scraper.conversator import create_a_persistent_db, jan_chatting

create_a_persistent_db("mydata/chat.pdf")  # Creates a local vector store at mydata/chat_localDB and a local embeddings cache at mydata/chat_cache
jan_chatting(jan_app_path="Jan.exe", jan_data_folder="Users/User/jan", thread_id="jan_1706919400", hfmodel="google/flan-t5-base", model_task="text2text-generation", persistent_db_dir="mydata/chat_localDB", embeddings_cache="mydata/chat_cache", pdfpath="mydata/chat.pdf")

Find more elaborate use cases in user_case_noAPI.py and user_case_API.py. Also make sure not to miss the Discord bot use cases!🐍

License

This project is licensed under the AGPL-v3.0 License - see the LICENSE file for details.

Acknowledgments