dylanintech / flo


Removing sensitive data from files before sending them to GPT #2


nikwen commented 1 year ago

Code often contains sensitive data such as API keys and credentials stored in .env files.

I'd love for these to be removed from files before submitting them to the OpenAI API.

Super interesting project by the way. Keep it up!

dylanintech commented 1 year ago

thanks for opening this issue @nikwen :)

what do you think the simplest user flow would be here? like, should there be some flag (e.g. --ignore) that excludes entire files from being checked (ex: .env files, or all the files listed in .gitignore)?

another option would be sanitizing the input that the agent absorbs, but this would involve making direct changes to the LangChain agent abstractions.
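For the --ignore / .gitignore route, a minimal sketch of the filtering step could look like the following, assuming a Node/TypeScript setup and the `ignore` npm package (`filterFiles` is a hypothetical helper, not flo's actual API):

```typescript
// Sketch only: skip files matched by .gitignore (plus .env files) before the
// agent ever reads them. Assumes the `ignore` npm package.
import fs from "fs";
import path from "path";
import ignore from "ignore";

function loadIgnoreFilter(repoRoot: string) {
  const ig = ignore();
  const gitignorePath = path.join(repoRoot, ".gitignore");
  if (fs.existsSync(gitignorePath)) {
    ig.add(fs.readFileSync(gitignorePath, "utf8"));
  }
  // Always skip env files, even when the project has no .gitignore yet.
  ig.add([".env", ".env.*"]);
  return ig;
}

// Paths must be relative to repoRoot, as the `ignore` package expects.
function filterFiles(repoRoot: string, relativePaths: string[]): string[] {
  const ig = loadIgnoreFilter(repoRoot);
  return relativePaths.filter((p) => !ig.ignores(p));
}
```

The agent would then only read the paths that survive `filterFiles`.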

nikwen commented 1 year ago

Sorry, I missed the email for your comment!

My ideas were:

- Excluding all files matched by the project's .gitignore.
- Always excluding well-known sensitive files such as .env, even when there is no .gitignore.

The latter is important because in the very early stage of a project, many folks might not yet have a Git repo.

Furthermore, it's not uncommon to paste API keys into code files during development if you just want to test something quickly.

Note regarding .env files: Sometimes, they also have names like .env.development.
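For the sanitizing direction dylanintech mentioned, a redaction pass over file contents could look roughly like this; the patterns are illustrative assumptions, nowhere near an exhaustive secret detector (a real implementation might reuse rules from a dedicated secret scanner):

```typescript
// Sketch only: redact likely secrets from file contents before they are sent
// to the OpenAI API. These patterns are illustrative, not exhaustive.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g, // OpenAI-style API keys
  /AKIA[0-9A-Z]{16}/g, // AWS access key IDs
  /(?:api[_-]?key|token|secret)\s*[=:]\s*["']?[^\s"']+/gi, // generic assignments
];

// Replaces each whole match, so `API_KEY = "abc"` becomes `[REDACTED]`.
function redactSecrets(source: string): string {
  return SECRET_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED]"),
    source
  );
}
```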

marnec-ad commented 1 year ago

I'd suggest removing files found in .gitignore AND .gitattributes (for devs using git-crypt)
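For context, git-crypt marks its encrypted files via attribute lines like `secrets/** filter=git-crypt diff=git-crypt` in .gitattributes. A sketch of collecting those patterns (so they can be fed into the same ignore filter as the .gitignore rules above) could look like:

```typescript
// Sketch only: collect the path patterns that .gitattributes routes through
// git-crypt, e.g. a line like `secrets/** filter=git-crypt diff=git-crypt`.
import fs from "fs";

function gitCryptPatterns(gitattributesPath: string): string[] {
  if (!fs.existsSync(gitattributesPath)) return [];
  return fs
    .readFileSync(gitattributesPath, "utf8")
    .split("\n")
    .filter((line) => /\bfilter=git-crypt\b/.test(line))
    .map((line) => line.trim().split(/\s+/)[0]);
}
```

Note that .gitattributes glob syntax is close to, but not identical to, .gitignore syntax, so a production version would want to account for the differences.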