Open maxachis opened 2 months ago
Hmm. Who has authority to write and run github actions? That limits the risk somewhat but could still go wrong.
In this case, the issue is not with the GitHub Actions themselves, but with the tests that GitHub Actions runs. A person could create a test called "print the unencrypted password of every user in the database", and GitHub Actions would run that test. Even if we limited who can write and run GitHub Actions, as long as an action can run something whose level of access we aren't restricting, the hole can be exploited.
OK, I think we should go with option 4: encrypt it! We can use this as a reference: https://docs.pdap.io/activities/data-dictionaries/hidden-properties
@josh-chamberlain Can do!
From there, we can expand to additional hidden properties.
There are likely also ways to lock down GitHub Actions, most prominently by restricting who can make a pull request, or by moving some tests off-site. How far we want to go with that is up for debate.
This is probably a good reference to consult for security hardening.
While I made efforts to avoid exposing sensitive data in the Sandbox database, there are still ways to expose sensitive data in the Stage database.
The hole is through GitHub Actions.
As designed, the Stage environment is meant to be accessed through GitHub Actions, which run tests against the database. If someone wanted to expose sensitive data, they could write tests that print the contents of database rows, run them in GitHub Actions, and then read that data in the GitHub Actions log.
Note that passwords, being encrypted, would still have some protection if this were exploited. API keys, user emails, and information on requests, however, would not. Additionally, because our code already includes the functions needed to decrypt passwords, a person could decrypt them and print them to the log as well.
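To make the hole concrete, here is a minimal sketch. It assumes a hypothetical `users` table with `email` and `api_key` columns, and uses an in-memory SQLite database as a stand-in for Stage; none of these names or values come from our real schema or test suite. The point is only that a check which passes can still dump rows to stdout, and anything a CI job writes to stdout ends up in its log.

```python
import sqlite3


def make_stage_db():
    """Stand-in for the Stage database (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, api_key TEXT)")
    conn.execute(
        "INSERT INTO users VALUES (?, ?)",
        ("user@example.com", "not-a-real-key"),
    )
    conn.commit()
    return conn


def run_leaky_test(conn):
    """Looks like an ordinary passing test, but its real effect is printing rows."""
    rows = conn.execute("SELECT email, api_key FROM users").fetchall()
    for row in rows:
        print(row)  # in a CI run, this line lands in the job log
    assert len(rows) > 0  # the "test" itself passes
    return rows


if __name__ == "__main__":
    run_leaky_test(make_stage_db())
```

Nothing in this sketch depends on who is allowed to edit the workflow file; it only needs the test process to have read access to the table.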
There are a few possible ways to address this:
For a quick and dirty solution, option 1 would work. If we're worried about verisimilitude, we can make our fake data generation more sophisticated and add more data.
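As a rough sketch of what fake data generation could look like, the snippet below builds synthetic user rows using only the Python standard library. The field names (`email`, `api_key`) and the helper names `fake_user` and `fake_users` are assumptions for illustration, not our actual schema or code.

```python
import random
import secrets
import string


def fake_user(rng):
    """Build one synthetic user row; no field is derived from real data."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return {
        "email": f"{name}@example.com",    # example.com is reserved for documentation use
        "api_key": secrets.token_hex(16),  # random 32-character hex string, never issued to anyone
    }


def fake_users(count, seed=0):
    """Generate `count` synthetic users; the seed makes the emails reproducible."""
    rng = random.Random(seed)
    return [fake_user(rng) for _ in range(count)]


if __name__ == "__main__":
    for user in fake_users(3):
        print(user)
```

If rows like these were the only contents of the database the tests can reach, a test that prints them would expose nothing of value. Making the data more realistic would mean adding more fields and more varied values in `fake_user`.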
Option 4 is probably good for the long term, but has a lot of unknowns that would need to be worked out.