Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
https://agpt.co
MIT License

Data_Ingestion.py Still used? #4024

Closed remriel closed 1 year ago

remriel commented 1 year ago

Which Operating System are you using?

Windows

Which version of Auto-GPT are you using?

Latest Release

GPT-3 or GPT-4?

GPT-3.5

Steps to reproduce 🕹

The documentation states this command will pre-seed everything in the workspace, but I find it is ingesting the entire autogpt folder, not just the workspace.

python data_ingestion.py --dir . --init

Current behavior 😯

The command ingests the entire autogpt folder, not just the workspace.

Expected behavior 🤔

Pre-seed only workspace folder

khongminhtn commented 1 year ago

Hello, this happens because your --dir argument is ".", which tells the script to ingest the current directory (in this case, the root of the repository). To ingest only the workspace, you need to pass the path to the workspace folder.

The correct command line is: python data_ingestion.py --dir ./autogpt/auto_gpt_workspace --init

Test it yourself and if satisfied, please close this issue.
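
To illustrate why "." behaves differently depending on what it is resolved against, here is a minimal sketch. It is not the code in data_ingestion.py; the helper names (resolve_ingest_dir, is_inside) and the /tmp/Auto-GPT layout are hypothetical and only mirror the paths mentioned in this thread.

```python
from pathlib import Path


def resolve_ingest_dir(dir_arg: str, base: Path) -> Path:
    """Resolve a --dir style argument against an explicit base directory.

    Hypothetical helper: the real script may resolve paths differently.
    """
    return (base / dir_arg).resolve()


def is_inside(path: Path, root: Path) -> bool:
    """Return True if path is root itself or lies somewhere beneath it."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


# Assumed layout for illustration only.
repo_root = Path("/tmp/Auto-GPT")
workspace = repo_root / "autogpt" / "auto_gpt_workspace"

# "." resolved against the directory the command is run from (repo root here).
from_cwd = resolve_ingest_dir(".", repo_root)

# "." resolved against the workspace, which is what the documentation describes.
from_workspace = resolve_ingest_dir(".", workspace)

print(from_cwd, is_inside(from_cwd, workspace))
print(from_workspace, is_inside(from_workspace, workspace))
```

If the script resolves --dir against the current working directory, as the behavior reported here suggests, then "." covers the whole repository, and only an explicit workspace path limits ingestion to the workspace.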

remriel commented 1 year ago

Right, so the documentation is out of date. It says

"The DIR path is relative to the auto_gpt_workspace directory, so python data_ingestion.py --dir . --init will ingest everything in auto_gpt_workspace directory"

suparious commented 1 year ago

Specifically

https://github.com/Significant-Gravitas/Auto-GPT/blob/master/docs/configuration/memory.md

In the example above, the script initializes the memory, ingests all files within the Auto-Gpt/autogpt/auto_gpt_workspace/DataFolder directory into memory with an overlap between chunks of 100 and a maximum length of each chunk of 2000.

Note that you can also use the --file argument to ingest a single file into memory and that data_ingestion.py will only ingest files within the /auto_gpt_workspace directory.

The DIR path is relative to the auto_gpt_workspace directory, so python data_ingestion.py --dir . --init will ingest everything in auto_gpt_workspace directory.

I was happy to learn the docs were incorrect, as I use this script to ingest from various parts of my system.
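
For anyone wondering what the "overlap between chunks of 100 and a maximum length of each chunk of 2000" in the quoted docs means in practice, here is a rough sketch. It assumes simple character-based splitting, and the function name split_text is hypothetical; the actual ingestion code may split on sentences or tokens instead.

```python
from typing import List


def split_text(text: str, max_length: int = 2000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most max_length characters.

    Consecutive chunks share `overlap` characters. This is an
    illustrative, character-based sketch, not the project's implementation.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be >= 0 and smaller than max_length")

    step = max_length - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_length])
        if start + max_length >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(4000))
chunks = split_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, so content that straddles a chunk boundary still appears whole in at least one chunk.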

github-actions[bot] commented 1 year ago

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] commented 1 year ago

This issue was closed automatically because it has been stale for 10 days with no activity.