BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License

Expose the service as a REST API #43

Closed databill86 closed 6 months ago

databill86 commented 7 months ago

As a follow-up to pull request #38:

I was wondering if it's possible to expose the service as an API. It would be a lot easier and simpler to run it locally, without the need to publish the gpt crawler. It would be perfect if it's containerized! I'm no expert in JS. I tried to implement an Express.js server with the help of ChatGPT, but I ran into a lot of exceptions and errors, so I gave up ^^

This is my attempt:

// file: app/src/api.ts

import express from 'express';
import cors from 'cors';
import fileUpload from 'express-fileupload';
import { readFile } from 'fs/promises';
import { startCrawling } from './main';

// Create a new express application instance
const app = express();
const port = 3000; // You may want to make the port configurable

// Enable CORS, JSON bodies, and multipart file uploads
app.use(cors());
app.use(express.json());
app.use(fileUpload());

// Define a POST route to accept config and run the crawler
app.post('/crawl', async (req, res) => {
    // express-fileupload yields a single file or an array per field,
    // so reject both a missing field and multiple files
    const upload = req.files?.config;
    if (!upload || Array.isArray(upload)) {
        return res.status(400).json({ message: 'Exactly one config file is required.' });
    }

    // Parse the configuration file sent as form-data
    let config;
    try {
        config = JSON.parse(upload.data.toString('utf-8'));
    } catch {
        return res.status(400).json({ message: 'Config file is not valid JSON.' });
    }

    try {
        await startCrawling(config);

        // Read the output file after crawling and send it in the response
        const outputFileContent = await readFile(config.outputFileName, 'utf-8');
        res.contentType('application/json');
        return res.send(outputFileContent);
    } catch (error) {
        // Error objects serialize to {} in JSON, so send the message text instead
        const detail = error instanceof Error ? error.message : String(error);
        return res.status(500).json({ message: 'Error occurred during crawling', error: detail });
    }
});

// Start the Express server
app.listen(port, () => {
    console.log(`API server listening at http://localhost:${port}`);
});

export default app;
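
For anyone who wants to try the route above, here is a rough client sketch. It assumes Node 18+ (global `fetch`, `FormData`, and `Blob`) and a server at `http://localhost:3000`, as in the attempt. The names `CrawlConfig`, `buildCrawlForm`, and `submitCrawl` are hypothetical, and the only config field taken from the attempt is `outputFileName`; anything else you put in the config is up to the crawler.

```typescript
// Hypothetical client for the POST /crawl route sketched in this comment.
// Only `outputFileName` comes from the attempt; other fields are passed through as-is.
type CrawlConfig = { outputFileName: string } & Record<string, unknown>;

// Package the config as multipart form-data with a single "config" file field,
// which is the field name the route reads from req.files.
function buildCrawlForm(config: CrawlConfig): FormData {
  const form = new FormData();
  const file = new Blob([JSON.stringify(config)], { type: 'application/json' });
  form.append('config', file, 'config.json');
  return form;
}

// Send the form to the server and return the parsed crawl output.
async function submitCrawl(baseUrl: string, config: CrawlConfig): Promise<unknown> {
  const response = await fetch(`${baseUrl}/crawl`, {
    method: 'POST',
    body: buildCrawlForm(config),
  });
  if (!response.ok) {
    throw new Error(`Crawl request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `submitCrawl('http://localhost:3000', { outputFileName: 'output.json' })` would then resolve with whatever JSON the crawler wrote to the output file, assuming the server side works as intended.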

marcelovicentegc commented 7 months ago

Hey, @databill86! That's a great idea. By the way, this could even evolve into a website in the future, enabling people with less expertise to use this kind of service, and even turn into an actual paid product. #38 does make implementing such an API easier, as it already abstracts the core methods so they can be extended to different clients (API, CLI, etc.).

databill86 commented 7 months ago

Thank you! That's really cool! I can't wait to see this feature

adityak74 commented 7 months ago

I can work on this one.

databill86 commented 7 months ago

Hello @adityak74,

I appreciate your interest in this feature. I wanted to check in on the status of #52 and ask when you anticipate it being completed.

Thank you.

adityak74 commented 7 months ago

Hi @databill86, I am going to finish this up this week and hope to complete it by Sunday.

github-actions[bot] commented 6 months ago

:tada: This issue has been resolved in version 1.2.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: