6677-ai / tap4-ai-crawler

The crawler opened source by tap4.ai
https://tap4.ai
MIT License
166 stars 119 forks source link
aitoolkit aitools crawler crawler-engine crawler-python

Tap4 AI Crawler

Tap4 AI Crawler is an open source web crawler built by tap4.ai, that will convert the website into the website summarize info with LLM. Includes powerful scraping, crawling and data extraction capabilities, web page screenshots. With Tap4 AI Crawler, you can not only easily update the ai tool detail for your AI Tools Directory but also summary of the website.

This project is based on Python, very lightweight, easy to maintain, suitable for individual developers interested in AI tools directories, and also for learners interested in Python. We welcome everyone to fork and star.

English | 简体中文

Support Tap4 AI in Product Hunt

AI Tools Directory by Tap4 AI - Open-source AI navigation & discovery with multi-language | Product Hunt

Features

tai4-ai

Follow and Support Links

Please follow our Twitter: https://x.com/tap4ai

If you find the project helpful, please consider buying me a coffee:

Buy Me A Coffee

If you are interested in the project, please add my WeChat: helloleo2023, note: "tap4 ai open source", or scan the QR code: tap4-ai-wx

Quick Start

[
  {
    "AllowedOrigins": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "POST",
      "PUT",
      "DELETE",
      "HEAD"
    ],
    "AllowedHeaders": [
      "*"
    ]
  }
]

(2)Deploying in Zeabur based on code mode

Deploying the fork github repository in Zeabur, and configuring environment variables in Zeabur or manually modifying the .env file in the code repository. The environment variables are as follows:

Runs on local

Install

Setup

(1) Clone this project

git clone https://github.com/6677-ai/tap4-ai-crawler.git

(2) Apply for llama3 key on Groq

Groq key apply

(3) Apply for S3 object storage information

(4) Set environment variables

## LLM Configuration: Large model related configuration
GROQ_API_KEY=gsk_********

## Object Storage Configuration: Storage related configuration
S3_ENDPOINT_URL=https://*****.r2.cloudflarestorage.com
S3_BUCKET_NAME=tap4ai
S3_ACCESS_KEY_ID=****
S3_SECRET_ACCESS_KEY=****
S3_CUSTOM_DOMAIN=****
AUTH_SECRET=****

(5) Run locally

Install Python dependencies

pip install -r requirements.txt

Run

python main_api.py

After running, a RestAPI will be exposed, access URL suffix: /site/crawl

How to request the API

Use curl to verify the API with POST request. Request params:

curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer xxxxx" -d '{"url": "https://tap4.ai", "tags": [ "selected tags: ai-detector","chatbot","text-writing","image","code-it"]}' http://127.0.0.1:8040/site/crawl

Response Params:

{
    "code": 200,
    "data": {
        "description": "Tap4 AI Directory is a tool provides free AI Tools Directory. Get your favorite AI tools with Tap4 AI Directory, Tap4 AI Directory aims to collect all the AI tools and provide the best for users.",
        "detail": "### What is Tap4 AI?\n\nTap4 AI is an AI-driven platform that provides access to a vast array of AI technologies for various needs, including ChatGPT, GPT-4o for text generation and image understanding, Dalle3 for image creation, and document analysis.\n\n### How to Use Tap4 AI\n\nEvery user can utilize GPT-4o for free up to 20 times a day on tap4.ai. Subscribing to the platform grants additional benefits and extended access beyond the free usage limits.\n\n### Features of Tap4 AI\n\n#### Can I Generate Images Using Tap4 AI?\n\nYes, with Dalle3's text-to-image generation capability, users can create images, sharing credits with GPT-4o for a seamless creative experience.\n\n#### How Many GPTs are Available on Tap4 AI?\n\nTap4.ai offers nearly 200,000 GPT models for a wide variety of applications in work, study, and everyday life. You can freely use these GPTs without the need for a ChatGPT Plus subscription.\n\n#### How Can I Maximize My Use of Tap4 AI's AI Services?\n\nBy leveraging the daily free uses of GPT-4o document reading, and Dalle's image generation, users can explore a vast range of AI-powered tools to support various tasks.\n\n#### Will My Information Be Used for Your Training Data?\n\nWe highly value user privacy, and your data will not be used for any training purposes. If needed, you can delete your account at any time, and all your data will be removed as well.\n\n#### When Would I Need a Tap4 AI Subscription?\n\nIf the 20 free GPT-4o conversations per day do not meet your needs and you heavily rely on GPT-4o, we invite you to subscribe to our affordable products.",
        "languages": [],
        "screenshot_data": "https://demo.tap4.cn/tools/2024/6/15/tap4-ai-1718447471.png",
        "screenshot_thumbnail_data": "https://demo.tap4.cn/tools/2024/6/15/tap4-ai-thumbnail-1718447477.png",
        "tags": ["code-it","text-writing"],
        "title": "Get your best AI Tools | Tap4 AI Directory",
        "url": "https://tap4.ai"
    },
    "msg": "success"
}

FAQ

Link products

TAP4-AI-Directory

The Collection for the AI tools all over the world. | Collect free ChatGPT mirrors, alternatives, prompts, other AI tools, etc. For more, please visit: Tap4 AI

How to get your first users for startup at the website list

Here is the website list for submitting your product to get users. Please visit StartUp Your Product List

Free Stable Diffusion 3 Online Tool

Free Stable Diffusion 3 Online

Free Tiny Png Tool

Free Type Png Tool

Free GPT2 Output Detector

Free GPT2 Output Detector

The Tattoo AI Generator and Design

Tattoo AI Design is a tattoo AI generator and design tool for tattoo fans. If you are interested, visit Tattoo AI Design