allyourbot / hostedgpt

An open version of ChatGPT you can host anywhere or run locally.
MIT License
192 stars 75 forks source link

Add DB encryption for email + OpenAI and Anthropic API keys #274

Closed stephan-buckmaster closed 2 weeks ago

stephan-buckmaster commented 1 month ago

It seems to me the OpenAI and Anthropic API keys should not be stored in plaintext in the database.

This change got a bit bigger than expected. It uses ActiveRecord's application level encryption, and includes a migration that will encrypt existing keys.

Still to address:

krschacht commented 1 month ago

Thanks so much! I didn’t get a chance to look through this closely yet, but this sounds like a good change to make. Thanks for jumping on it. I also have a lot of Issues tee’d up, organized by version number, and I’ve made a few passes to flag “good first issue”. Just wanted to highlight that. But I appreciate you discovering this and taking it on too! I’ll dig in tomorrow.

krschacht commented 1 month ago

@stephan-buckmaster I’m ramping up on this PR and trying to understand the new setup instructions. There are a couple things I don’t follow yet. And I’ll add the caveat that I’ve never used activerecord database encryption before but I follow the basics of it.

  1. Why do we need a database.yml.sample. Why not just make any changes that need to be made directly into the app’s database.yml and keep it in the repo? I’m a little hesitant to add it to gitignore.

  2. I think we shouldn’t add db username & password into this and instead we just focus on encrypted columns. Do you think this gets us much else? I think that anyone running the app locally already has to handle any unauthorized access to their machine, which is a bigger issue then us encouraging them to change the auth of their database. And for people running it on Render (or Heroku or most other hosts) the DB will get a password in that environment.

  3. I just read up on credentials:edit and it makes sense, but it sounds like we need to instruct people to run db:encryption:init first in order to have it generate custom keys and then the user runs credentials:edit and copies those values in? I haven’t tried this yet so that may not be quite right, but I just skimmed the rails guide. My instinct is that we don’t add a dummy credentials.yml file to the repo and instead just explain those two steps in the README but I could be missing something.

The one thing I haven’t fully made sense of is that Render is getting a RAILS_MASTER_KEY set in the environment, so I can’t fully remember how that gets set up. We need to be sure that this new setup won’t break that.

krschacht commented 1 month ago

This merge conflict should be really easy to resolve, just a single like in the schema

stephan-buckmaster commented 1 month ago

@krschacht About database.yml -- It looks like a topic that is not quite settled, which I wasn't aware of. I feel all settings could be put in the ENV, essentially making this file redundant. What if I run my postgres server on a different port, or use a socket? If the file is in git, then I have to fiddle around before pulling updates? Let me know, I can revert to the original way ...

Edit: Or do you mean how the samples (database,yml.sample+credentials.yml.sample) are set up? They are samples, so can create the files as preferred

krschacht commented 4 weeks ago

@stephan-buckmaster My question was about adding the database.yml.sample to the repo. It’s easier if you propose changes to the existing database.yml file and then we can have a discussion about them. With a new file it’s harder to compare changes. I don’t actually think we should add a .sample file to the repo, we should just make decisions and update the existing .yml files.

On the question of whether to have a database.yml file or not, let’s keep it since it’s the rails default. But notably the url, port, username, and password can all be defined within the DATABASE_URL. But there are still configuration options which aren’t supported by DATABASE_URL so it’s worth having database.yml for that. But because of DATABASE_URL you shouldn’t need to make changes to the database.yml file in your dev environment.

When you get a chance, let me know about the credentials stuff (my number 3 above).

stephan-buckmaster commented 4 weeks ago

Made changes as requested. I hope the README changes addresses your Q3 (you are right).

How to inform existing users of the required step to execute rails credentials:edit when they pull in these changes? Start a CHANGELOG?

krschacht commented 3 weeks ago

I’ve pulled this branch locally and been trying it out but I’m confused as to how this will work on Render.

it looks like our render.yml file has a generateKey true which generates the secret_key_base automatically. I checked my render deploy and I see this in my ENV but it sounds like this is only used to encrypt session keys.

But now that we are adding a master key I’m unsure how this works in production. We don’t want to add the master.key to the repo so does Render support auto generating this just like secret key base? But if so, it would clearly be different than what we generate locally when we create our credentials file.

So I’m confused as to how we are going to make this work in production/Render. Do you know? It not can you do some research and propose a plan? I ideally want to keep the one-click Render deploy. If we are not able to make this work with the render.yml file then I’d want to make encryption optional so people can still deploy in one click and then optionally follow a guide to enable encryption in production.

thanks for your help. I don’t want to be a blocker on you making progress on this so I’m trying to share what I understand so far. But I also want to make sure we merge in something that is elegant and doesn’t reduce the ease of deploying this app.

stephan-buckmaster commented 3 weeks ago

I have never used Render. I will have to try it out ...

stephan-buckmaster commented 3 weeks ago

@krschacht No luck with Render so far. When I push the button placed in the README I get to fill out a payment form. I didn't want to go that way. So I followed the "Create Static Site" route that is mentioned in the Render doc (https://docs.render.com/deploy-rails). The render.yaml file is recognized. I enter public under Publish Direcotry, and bin/render-build.sh under Build Command The build fails as it cannot connect to a postgres database.

Is adding a credit card for payment really needed?

olimart commented 3 weeks ago

I also had to enter a credit card because no free plan for Redis. (or at least the free plan isn't selected with the deployment button) Render only provides trial for PG but not sure it impacts signing up with a cc.

stephan-buckmaster commented 3 weeks ago

I don't really have a use for the Render service, so I'm reluctant to make payments in order to contribute to the development of this project. The Render documentation does mention support for secret files, at https://docs.render.com/configure-environment-variables#secret-files So it can probably be done. It does seem difficult to communicate with existing users

krschacht commented 3 weeks ago

@stephan-buckmaster You shouldn't have to pay any money. The reason I picked Render as the first host to support is because you can host on there for free. I just corrected the mistake in the render.yml file that was not specifying the free redis plan. I corrected that now in main and I'm just about to test it, but you should be able to update your fork, delete whatever you did in render, and try again. It'll deploy with all free resources.

It may be the case that render still asks you to enter a credit card but it should not bill you because every service in the render.yml file is now pointing to a free version of that service.

But all that said, if you don't want to tackle this aspect of encrypted DB then that's okay. We can close out this PR and wait until someone is ready to work through all the implications of encrypting those columns. There are lots of other ways that I'd still love help improving HostedGPT: allyourbot/hostedgpt/milestone/6

In my mind, the app is getting really good but we're still not even at a solid v1.0 yet. It's coming along!

stephan-buckmaster commented 3 weeks ago

Ok I got your hotfix in my repo. When I "push the button," no payment form appears, so that worked.

An issue is highlighted:

A render.yaml file was found, but there was an issue. services[0].type changing service type not supported

(I imagine this is the same in this repo, not tested)

krschacht commented 3 weeks ago

I think that's because you didn't delete all the existing render services. Try that. Delete all 3 (or is it 4) services that show up on your render Dashboard and also on blueprints and Env, don't miss those. Then click to create a new app from scratch.

Render does not allow changes to existing services through the render.yml file.

On Wed, Apr 24, 2024 at 10:30 AM Stephan Wehner @.***> wrote:

Ok I got your hotfix in my repo. When I "push the button," no payment form appears, so that worked.

An issue is highlighted:

A render.yaml file was found, but there was an issue. services[0].type changing service type not supported

(I imagine this is the same in this repo, not tested)

— Reply to this email directly, view it on GitHub https://github.com/allyourbot/hostedgpt/pull/274#issuecomment-2075221792, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAIR5M6IMPMJSBQLXZVWB3Y67FX3AVCNFSM6AAAAABGJNAHLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZVGIZDCNZZGI . You are receiving this because you were mentioned.Message ID: @.***>

stephan-buckmaster commented 3 weeks ago

Yes, that worked, thanks @krschacht . The app deployed on Render from my repo. I managed to sign up. When I entered an open-ai key, it ran into an error "ActiveRecord::Encryption::Errors::Configuration (Missing Active Record encryption credential: active_record_encryption.primary_key" So I will try to figure that out.

stephan-buckmaster commented 3 weeks ago

Ok, some progress! The commits of today make it quite transparent. Render is configured to generate encryption keys and those apps will use those, while regular users can use config/credentials.yml.enc, as preferred. I trust the Render service will "generateKeys" only once, and use the original in all future deployments.

krschacht commented 3 weeks ago

@stephan-buckmaster Looking good. Few more thoughts:

(1) Yes, I trust that Render will generateKeys only once. If that assumption was false then we'd be having all sorts of trouble with the one place we're currently using it (secret_key_base). The bigger thing which I wasn't sure of: does generateKeys generate the right kind of random string? I checked my Render account and my SECRET_KEY_BASE is 44 characters long. When I look at the rails guide documentation the example keys are 32 characters: https://guides.rubyonrails.org/active_record_encryption.html#setup

But I suspect it'll work, maybe it just looks at the first 32 characters of the random string. We can find out once we do an actual deploy.

(2) It's a little annoying that running db:encryption:init does not immediately copy those values to credentials. It's not a huge deal but having to manually copy those will be a source of setup errors that people make. What happens if someone fails to do this locally? They'll launch the rails app with various attributes being declared as encrypts but without encryption being properly configured. Will the app still run and simply not encrypt those values, or will it fail with some exception? I'm guessing it will still run because we've set support_unencrypted_data to true. If it does still run, let's assume that some people will miss this step in the README and start up the app. Can you add something to the book-up sequence of this rails app where it outputs "WARNING: Attributes not encrypted please do ...." or something like that? What I mean is: someone will type "bin/dev" to startup the server, and if it just works then let's make sure they at least see a repeat of the instructions every time they start the up the rails app.

(3) Can you go ahead and encrypt user.email in this PR as well? I notice there are some special instructions for downcasing email columns, which we should do for email since we want that to be case insensitive. I feel like this is also a slightly "sensitive" piece of information so we might as well get it done at the same time we're handling the others.

(Check out the lint error in the failed "check" and also pull in the latest main and the failed test error should go away.)

stephan-buckmaster commented 3 weeks ago

Good point about the secret keys. The value of generateValue is 256-bits base64-encoded, https://docs.render.com/blueprint-spec . So I should convert it.

About your point 2 -- I could abort in the initializer that I added, with an informative message, if there is no configured keys. How does that sound.

Encrypting email sounds good, and straightforward. It may not even need an update to the migration.

krschacht commented 3 weeks ago

On point 2, I don’t think you need to abort if things actually work. It’s kind of nice that the encryption is optional — if you forget to enable it you’ll get a warning until you do, but you aren’t blocked from using the app. Although, if someone uses the app for a bit and then enables encryption, will their old records get encrypted? I don’t think they will but I think things will still work. Basically, some records will be encrypted and some won’t but activerecord will just be smart because of that flag you’ve set?

On Wed, Apr 24, 2024 at 8:32 PM Stephan Wehner @.***> wrote:

Good point about the secret keys. The value of generateValue is 256-bits base64-encoded, https://docs.render.com/blueprint-spec . So I should convert it.

About your point 2 -- I could abort in the initializer that I added, with an informative message, if there is no configured keys. How does that sound.

Encrypting email sounds good, and straightforward. It may not even need an update to the migration.

— Reply to this email directly, view it on GitHub https://github.com/allyourbot/hostedgpt/pull/274#issuecomment-2076151780, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAIR5M5TUCRHBZDBHWMH4TY7BMKPAVCNFSM6AAAAABGJNAHLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZWGE2TCNZYGA . You are receiving this because you were mentioned.Message ID: @.***>

stephan-buckmaster commented 3 weeks ago

If you want logic to have people use the app without encryption, and then turn it on later, at that point, the existing data would need to be encrypted. With the change of this pull-request, the migration will take care of that. If you want to post-pone that until encryption is enabeld, using the database migration would not be appropriate.

On the other hand, if you want to automatically create the new entries in credentials.yml.enc, I can't find any mention of such a thing. It looks like that is not how the Rails developers have designed this (likely for good reasons).

I feel supporting both encrypted and non-encrypted values for these columns is stretching things, both for "authors" and "readers." So I'll think about that for a bit.

stephan-buckmaster commented 3 weeks ago

I realized the email is used when logging in. But querying for the record to validate the password against by the plaintext of the email address wouldn't work, when the encrytped value is stored.

One would store a hash together with the email, and query by that hash.

Can you confirm to go this extra mile @krschacht?

krschacht commented 3 weeks ago

@stephan-buckmaster i don’t quite understand the problem. Are you saying that once a record is encrypted, you can no longer do a find_by() on that record?

stephan-buckmaster commented 3 weeks ago

Yes, sorry, Active Record's encryption support does provide for find_by to work even when the values are encrypted, and the query parameter is not. I should have that ready tomorrow.

stephan-buckmaster commented 3 weeks ago

New commits today. So yes, it becomes transparent. When logging in, the database will execute a search against the encrypted email to find the persons record.

In terms of supporting both, an app-wide configuration option for unencrypted vs. encrypted storage: In a way, someone who wants encryption can merge in these changes. They will likely be good for a while, since there isn't that much to it. So maybe just skip this pull-request, or postpone.

About the encryption keys on Render, the values taken from ACTIVE_RECORD_ENCRYPTION_PRIMARY_KEY etc... still have to check the question of length (or Base64 encoding vs. 32 alphanumeric bytes).

krschacht commented 2 weeks ago

@stephan-buckmaster Thanks for taking on the email encryption. I'm eager to get this merged into main as it's always a pain to have long lived branched. I did a full round of thorough testing to help feel confident it's ready to merge. In the process, I streamlined things a bit more. You'll see I pushed a few commits. Things are not working really well in dev. Nice job getting this all setup.

The notable change I made was to make encryption setup happen automatically. Anyone currently using the app can simply pull it down, kill their app, and re-run bin/dev or docker compose up --build and encryption will initialize (if needed) and start up again. I hooked into db:prepare and scripted the credentials stuff. I tried to test it pretty thoroughly both as a brand new setup and an existing user pulling the latest and I believe it's handling things well in all cases.

I also made a change to CI to ensure the DB encryption gets setup properly for the test environment.

I believe all that remains now is confirming this works well on Render.

krschacht commented 2 weeks ago

Oh, shoot, actually it looks like my small change to rubyonrails.yml did not fix the CI test script. I don't know if you can click details to see, but it's reporting an exception:

ActiveSupport::EncryptedFile::InvalidKeyLengthError: Encryption key must be exactly 32 characters.

I don't have time to investigate this right now, but we have two open issues, and maybe they're even related? :) I don't know why the encryption key would be the wrong length in CI.

stephan-buckmaster commented 2 weeks ago

Things are fine on Render where I have an instance running. (https://hostedgpt-bra7.onrender.com/) @krschacht About the test failures, invoking this task fails for me the same way: rake db:prepare RAILS_ENV=test It wants to write the config file encrypted, Rails.application.credentials.write(config.to_yaml), but there is no RAILS_MASTER_KEY in the environment, nor a file config/master.keyApparently the app hasn't needed to do such things before. So the encryption key is actually nil here.

krschacht commented 2 weeks ago

Oh interesting, i'll take a look at that tomorrow. That sounds like something I broke.

Glad to hear the auto generated values for Render are working! It looks like we're at the home stretch with this one.

stephan-buckmaster commented 2 weeks ago

It may be a possibility to configure env variables related to encryption, for testing in the github system. I don't usually integrate with github, so I'm not sure how this is usually done. For my own invocation of "rake test" there are no issues/challenges.

stephan-buckmaster commented 2 weeks ago

@krschacht : I just added a comment for that commit about your change to the test:prepare tasks

krschacht commented 2 weeks ago

Okay I fixed up the encryption key generation stuff and incorporated your feedback. Thanks for that. I also fixed a random docker test issue that I ran into.

I think this thing is good to go, although it's hard to be 100% sure this won't break someone's existing setup.

I'm going to merge into main now and update my existing render app & do a fresh deploy of a new render app. Hopefully both of those work smoothly. 🤞

krschacht commented 2 weeks ago

@stephan-buckmaster I had a couple small issues. First, a new app failed to deploy to Render because it was attempting to write the master.key file. I added an extra check to never do that in production. And then my existing Render deploy would not upgrade. It turns out existing instances no longer update the render.yml file once you've made a single change to your app which diverges you from it (and I upgraded my render DB). This means some people (like me) won't get the new ENV keys. I did a couple hotfixes to main which detect this situation and raise an exception that explains how you can fix things manually. It's not quite as clean as I like, but we don't have a ton of app users yet and this should be an edge case.

Anyway, thanks for sticking with this PR and helping me to get it landed. It's a good addition, and I am glad I've finally learned how activerecord encryption works!

If you're up for taking on another task, check out https://github.com/allyourbot/hostedgpt/milestone/6

stephan-buckmaster commented 2 weeks ago

Fantastic! You're right, the sooner the better. I'll be around if any users run into trouble. Btw, it would be good to cover this feature in documenting backup requirements. I'll be looking at your list of outstanding tasks.

krschacht commented 2 weeks ago

Say more about documenting backup requirements. I have not considered that. Are there some special things to keep in mind? Oh, maybe you just mean that the keys themselves need to be backed up too, not just the database. Is it that?

stephan-buckmaster commented 2 weeks ago

Yes, so database, plus encryption secrets, plus potentially the master.key file.

How does one backup data stored on Render? How to restore it?