ChatGPT Scraper

A Selenium-based ChatGPT interaction automation tool. This script initializes a browser session, interacts with ChatGPT using predefined prompts, and facilitates automated conversations with ChatGPT. Ideal for fetching responses and conducting tests or demonstrations.

Features

Prerequisites

Before you begin, ensure you have met the following requirements:

Installation

Clone the Repository

git clone --recurse-submodules https://github.com/daily-coding-problem/chatgpt-scraper.git
cd chatgpt-scraper

Setup Python Environment

Use the following commands to set up the Python environment if you do not want to use Docker:

python -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install --no-root

Setup Docker

If you would like to use Docker, ensure Docker and Docker Compose are installed on your machine. If not, follow the installation guides for Docker and Docker Compose.

Build Docker Images

docker compose build

Configuration

Environment Variables

Create a .env file in the project root containing the content from .env.example. Modify the values as needed.
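After copying .env.example, it can help to confirm that no variable was dropped along the way. The following is a minimal sketch, not part of the project: the helper names `env_keys` and `missing_keys` are hypothetical, and the parser only understands simple `KEY=VALUE` lines (comments and blank lines are skipped).

```python
from pathlib import Path


def env_keys(path):
    """Return the set of variable names defined in a simple KEY=VALUE file."""
    keys = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Tolerate an optional leading "export " prefix.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_keys(example=".env.example", actual=".env"):
    """List the variables present in the example file but absent from .env."""
    return sorted(env_keys(example) - env_keys(actual))
```

Calling `missing_keys()` from the project root returns an empty list when every variable from .env.example also appears in .env.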

Configuring TEST_ACCOUNTS

The TEST_ACCOUNTS environment variable passes the credentials for test accounts to the ChatGPT scraper. The credentials must be formatted as a base64-encoded JSON structure. Base64 only encodes the data and does not encrypt it, so treat the value as a secret.

You can use the Accounts Serializer tool to generate this JSON structure and encode it.

Steps to Configure TEST_ACCOUNTS:

  1. Clone the Accounts Serializer Repository

    git clone https://github.com/daily-coding-problem/accounts-serializer.git
    cd accounts-serializer
  2. Install Dependencies

    Ensure you have Python 3.11 or higher and Poetry installed. Then run:

    poetry install
  3. Generate the JSON Structure

    Run the accounts_serializer.py script with your account details:

    poetry run python accounts_serializer.py \
       --emails test@company.com user@anothercompany.com \
       --passwords password123 userpassword456 \
       --providers basic google \
       --secrets google:google-secret-abc chatgpt:chatgpt-secret-xyz github:github-secret-123 aws:aws-secret-789

    This command will output a JSON structure like the following:

    {
       "test@company.com": {
           "provider": "basic",
           "password": "password123",
           "secret": {
               "google": "google-secret-abc",
               "chatgpt": "chatgpt-secret-xyz"
           }
       },
       "user@anothercompany.com": {
           "provider": "google",
           "password": "userpassword456",
           "secret": {
               "github": "github-secret-123",
               "aws": "aws-secret-789"
           }
       }
    }
  4. Base64 Encode the JSON Structure

    Use a tool or script to base64 encode the JSON structure:

    echo -n '{"test@company.com": {"provider": "basic", "password": "password123", "secret": {"google": "google-secret-abc", "chatgpt": "chatgpt-secret-xyz"}}, "user@anothercompany.com": {"provider": "google", "password": "userpassword456", "secret": {"github": "github-secret-123", "aws": "aws-secret-789"}}}' | base64
  5. Set the TEST_ACCOUNTS Environment Variable

    Copy the base64-encoded string and set it as the value of the TEST_ACCOUNTS environment variable in your .env file or directly in your shell environment.

    export TEST_ACCOUNTS="base64_encoded_json_structure"

    Now, TEST_ACCOUNTS is configured and ready to be used by the ChatGPT scraper.
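The encoding step can also be done in Python instead of the shell. This is a minimal sketch using only the standard library; the function names `encode_accounts` and `decode_accounts` are hypothetical and are not taken from the Accounts Serializer or the scraper.

```python
import base64
import json

# Example data mirroring the JSON structure shown in step 3.
accounts = {
    "test@company.com": {
        "provider": "basic",
        "password": "password123",
        "secret": {
            "google": "google-secret-abc",
            "chatgpt": "chatgpt-secret-xyz",
        },
    },
}


def encode_accounts(accounts):
    """Serialize the accounts to JSON and base64-encode the result."""
    raw = json.dumps(accounts).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_accounts(encoded):
    """Reverse encode_accounts: base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(encoded))


encoded = encode_accounts(accounts)
# The round trip must return exactly the structure that went in.
assert decode_accounts(encoded) == accounts
print(encoded)
```

The printed string is what goes into TEST_ACCOUNTS. Decoding it again is a quick way to check that a value pasted into .env is still intact.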

Target a Specific ChatGPT Account

If you want to target a specific account, you can set the CHATGPT_ACCOUNT environment variable with the email of the account you want to use.

   export CHATGPT_ACCOUNT="some-email@company.com"

The email should be one of the emails in the TEST_ACCOUNTS JSON structure.
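The lookup described above might work roughly as follows. This is a sketch under stated assumptions, not the scraper's actual code: `select_account` is a hypothetical name, and falling back to the first account when CHATGPT_ACCOUNT is unset is an assumption made for illustration.

```python
import base64
import json
import os


def select_account(env=None):
    """Pick the account named by CHATGPT_ACCOUNT from TEST_ACCOUNTS."""
    env = os.environ if env is None else env
    accounts = json.loads(base64.b64decode(env["TEST_ACCOUNTS"]))

    email = env.get("CHATGPT_ACCOUNT")
    if email is None:
        # Assumption: with no target set, use the first account listed.
        email = next(iter(accounts))
    if email not in accounts:
        raise ValueError(f"{email!r} is not one of the TEST_ACCOUNTS emails")
    return email, accounts[email]
```

Passing a plain dictionary as `env` makes the function easy to try without touching the real environment.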

Use Temporary Chat Mode

If you want to use the Temporary Chat mode, set the TEMPORARY_CHAT environment variable to true.

   export TEMPORARY_CHAT="true"

When set to true, the scraper switches on Temporary Chat in ChatGPT's interface, so the conversation is not saved to the chat history.

Configure Headless Mode

You can set the CHATGPT_HEADLESS environment variable to true to run the scraper in headless mode.

   export CHATGPT_HEADLESS="true"

If set to true, the scraper will run in headless mode, which means the browser will not be visible during the scraping process.
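Boolean settings such as TEMPORARY_CHAT and CHATGPT_HEADLESS could be read and applied along these lines. The sketch makes several assumptions: the helper names are hypothetical, the comparison with "true" is treated as case-insensitive, and Chrome is used as the browser (the README only says Selenium). The `--headless=new` argument is Chrome's own switch for its current headless mode.

```python
import os


def env_flag(name, env=None, default=False):
    """Return True when the variable is set to "true" (case-insensitive)."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"


def build_chrome_options(headless):
    """Create Chrome options, hiding the browser window when headless is set."""
    # Imported here so env_flag can be used without Selenium installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        options.add_argument("--headless=new")
    return options
```

A caller would pass `env_flag("CHATGPT_HEADLESS")` to `build_chrome_options` and hand the result to `selenium.webdriver.Chrome(options=...)`.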

Set System Prompt

You can set the CHATGPT_SYSTEM_PROMPT environment variable to a custom system prompt that will be used in the conversation with ChatGPT.

   export CHATGPT_SYSTEM_PROMPT="Hello, I am a system prompt."

Set User Prompts

You can set the CHATGPT_USER_PROMPTS environment variable to a list of user prompts that will be used in the conversation with ChatGPT.

   export CHATGPT_USER_PROMPTS="How are you doing today?" "What is your favorite color?"

If you neither set the CHATGPT_USER_PROMPTS environment variable nor pass --user-prompts with a valid value, the scraper reports an error and exits.
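The "complain and exit" behavior can be illustrated with a small argparse sketch. Several things here are assumptions rather than documented behavior: the function name `resolve_user_prompts` is hypothetical, the variable name is taken from the export line above, the command-line flag is given precedence over the environment, and the environment value is assumed to hold one prompt per line. How the real scraper splits the variable into multiple prompts is not stated in this README.

```python
import argparse
import os
import sys


def resolve_user_prompts(argv, env=None):
    """Return the user prompts from --user-prompts or the environment."""
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser()
    parser.add_argument("--user-prompts", nargs="+", default=None)
    args = parser.parse_args(argv)

    if args.user_prompts:
        prompts = args.user_prompts
    else:
        # Assumption: one prompt per line in the environment variable.
        prompts = env.get("CHATGPT_USER_PROMPTS", "").splitlines()

    # Drop empty entries so a blank value counts as "not set".
    prompts = [p.strip() for p in prompts if p.strip()]
    if not prompts:
        sys.exit("No user prompts: set CHATGPT_USER_PROMPTS or pass --user-prompts")
    return prompts
```

With `nargs="+"`, each quoted argument after `--user-prompts` becomes one prompt, for example `--user-prompts "How are you doing today?" "What is your favorite color?"`.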

Configure Log Level

You can set the log level for the scraper by setting the LOG_LEVEL environment variable. The default log level is INFO.

   export LOG_LEVEL="DEBUG"

The available log levels are DEBUG, INFO, WARNING, ERROR, and CRITICAL.
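The five level names map directly onto the constants in Python's standard `logging` module. The following sketch shows one way to apply LOG_LEVEL. The function names are hypothetical, and rejecting unknown values with an error (rather than silently falling back to INFO) is an assumption.

```python
import logging
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(env=None):
    """Translate LOG_LEVEL into a logging constant, defaulting to INFO."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", "INFO").upper()
    if name not in VALID_LEVELS:
        raise ValueError(f"Unsupported LOG_LEVEL: {name!r}")
    return getattr(logging, name)


def configure_logging(env=None):
    """Configure the root logger once at startup."""
    logging.basicConfig(level=resolve_log_level(env))
```

After `configure_logging()` runs with `LOG_LEVEL="DEBUG"`, calls such as `logging.getLogger(__name__).debug(...)` are emitted. With the default of INFO they are suppressed.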

Usage

Run the scraper with Docker:

docker compose run chatgpt-scraper

Or without Docker:

poetry run python main.py

License

This project is licensed under the MIT License - see the LICENSE file for details.