BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License

Json too large for GPT #113

Open nexuslux opened 6 months ago

nexuslux commented 6 months ago

Hi All!

I realize this should largely be about the actual 'crawling' of the sites, but since that part was such a breeze with this tool, I now find that the crawled text far exceeds what ChatGPT can handle.

Does anyone have recommendations on how to split the JSON files so that they fit evenly within the limits set by ChatGPT? I've tried both in GPT and in Assistants. In both cases, my JSON includes too much text.

leicheng42 commented 6 months ago

The README says to use the maxFileSize or maxTokens parameter to control the size of the JSON file: https://github.com/BuilderIO/gpt-crawler?tab=readme-ov-file#create-a-custom-gpt

nexuslux commented 6 months ago

Totally missed that part when doing the setup!

"if you get an error about the file being too large, you can try to split it into multiple files and upload them separately using the option maxFileSize in the config.ts file or also use tokenization to reduce the size of the file with the option maxTokens in the config.ts file"

Thanks for mentioning it! I'll have another look at that.
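
For anyone else who missed it, here is a rough sketch of what those two options might look like in a config object. Only maxFileSize and maxTokens come from the README text quoted above. The CrawlerOptions type is a hypothetical stand-in so the snippet is self-contained, the other field names and all values are illustrative assumptions, and the units (megabytes for maxFileSize, tokens per output file for maxTokens) are my assumption and should be checked against the README.

```typescript
// Hypothetical stand-in for the crawler's config type, declared locally so
// this sketch compiles on its own. Check the real type in the repository.
type CrawlerOptions = {
  url: string;              // assumed: start URL of the crawl
  match: string;            // assumed: glob pattern for links to follow
  maxPagesToCrawl: number;  // assumed: upper bound on crawled pages
  outputFileName: string;   // assumed: name of the generated JSON file
  maxFileSize?: number;     // from the README quote; unit assumed to be megabytes
  maxTokens?: number;       // from the README quote; assumed token cap per output file
};

// Illustrative values only, not recommended settings.
const exampleConfig: CrawlerOptions = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  maxFileSize: 1,
  maxTokens: 2000000,
};

console.log(exampleConfig);
```

In the real project these values would go into config.ts, as the README quote says.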

ctrlbrk42 commented 6 months ago

What is the current limit of ChatGPT (size and number of files)?

I need to do this on about 5,000 files of C# code.

leicheng42 commented 6 months ago

"What is the current limit of ChatGPT (size and number of files)? I need to do this on about 5,000 files of C# code."

Doesn't seem feasible!

How many files can I upload at once per GPT?

Up to 20 files per GPT for the lifetime of that GPT. Keep in mind there are file size restrictions and usage caps per user/org.

What are those file upload size restrictions?

From: https://help.openai.com/en/articles/8555545-file-uploads-faq
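
If a JSON file has already been generated and is too large, one option is to split it after the fact. Below is a minimal sketch of a greedy splitter that packs records into chunks under a byte budget. The Page shape (title, url, html) is an assumption about the output format, and splitBySize and jsonBytes are hypothetical helper names, not part of gpt-crawler.

```typescript
// Assumed shape of one crawled record; adjust to match the actual output file.
type Page = { title: string; url: string; html: string };

// Size in bytes of a value once serialized as UTF-8 JSON.
function jsonBytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Greedily pack pages, in order, into chunks whose serialized size stays
// at or under maxBytes. A single page larger than maxBytes still gets its
// own chunk, so oversized pages must be handled separately.
function splitBySize(pages: Page[], maxBytes: number): Page[][] {
  const chunks: Page[][] = [];
  let current: Page[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const page of pages) {
    const size = jsonBytes(page) + 1; // +1 for a separating comma (slight overestimate)
    if (current.length > 0 && currentBytes + size > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 2;
    }
    current.push(page);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk can then be written to its own file with JSON.stringify. If the number of chunks comes out above the 20 files per GPT quoted above, the data will not fit however it is split, which is the situation with 5,000 source files.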

antorio commented 5 months ago

Hmm, since the limitation doesn't apply to spreadsheets, can we just convert the JSON to Excel? Would it be the same?

julian-passebecq commented 4 months ago

So what are your thoughts?

nexuslux commented 4 months ago

Use RAG and LangChain instead of ChatGPT or Assistants.