department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

[Data-Mining and Troubleshooting Tool]: Internal Production Read-Only Tooling #49233

Open kylesoskin opened 1 year ago

kylesoskin commented 1 year ago

Describe the problem

As an engineer/developer on the claims and appeals team (or as a person on any auxiliary team), I am often required to investigate and troubleshoot very specific issues that are only captured or reproducible in production or with certain "production-like" data.

Because of this, it is often necessary to access production directly to gather information (submission information, error information, etc.).

Because some of this information is PII, it often cannot be logged (although most issues can still be triaged by looking at existing log messages or by adding log messages that capture the error).

There are also data-mining activities that would benefit from this tool. It would be nice to be able to mine useful data out of our stored data without the risk of overloading the production system (due to the size of the queries) and without the risk of impacting or altering production in any way.

Who will benefit

Describe your idea

There are many possible solutions here that vary in size, scope, and features.

One idea that I have seen in the past is to keep a separate copy of production (perhaps 24 hours behind) that sits off to the side and is used specifically for querying, poking at, troubleshooting, etc. This would be an exact snapshot of production that could be cut every day. If someone blows it up with massive queries, it is no big deal. This approach also lets developers interact with the DB the same way they do when they develop (rails console, ruby scripts, etc.). It could also serve as a pre-production testing ground that surfaces issues possibly not surfaced in staging (due to differences in the data or in the size of the db/tables).

Another idea is an authenticated tool with various access controls that allow people to query the DB (and optionally hide PII depending on access level). A tool like this could provide gated access to certain parts of the database to allow people to troubleshoot by looking at DB records. It could allow call center/VA people to investigate individual submissions with the full context of the Veteran's submission, to help move claims along, fix or clear up errors, or gather info. This tool could also be used by researchers or data scientists to gather info and get insights into the data we have.

There could be other solutions as well; these are just two examples.
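As a rough illustration of the "hide PII depending on access level" part of this idea, here is a minimal sketch in plain Ruby. Everything in it is an assumption made for the example: the `PII_FIELDS` list, the `ACCESS_LEVELS` names, and the `redact_record` helper are hypothetical and do not correspond to any existing VA.gov code.

```ruby
# Hypothetical sketch only: field names, access levels, and this helper
# are illustrative assumptions, not existing VA.gov code.
PII_FIELDS = %i[ssn full_name date_of_birth].freeze

# Higher number = more access. Only :privileged may see PII in this sketch.
ACCESS_LEVELS = { engineer: 1, privileged: 2 }.freeze

# Returns a copy of the record with PII fields masked unless the caller
# holds the :privileged level. Unknown levels raise KeyError.
def redact_record(record, access_level)
  level = ACCESS_LEVELS.fetch(access_level)
  return record.dup if level >= ACCESS_LEVELS[:privileged]

  record.each_with_object({}) do |(key, value), out|
    out[key] = PII_FIELDS.include?(key) ? "[REDACTED]" : value
  end
end

submission = { id: 42, status: "errored", ssn: "000-00-0000", full_name: "Jane Doe" }
puts redact_record(submission, :engineer).inspect
```

A real tool would presumably enforce this at the query or view layer rather than on hashes in memory; the sketch only shows the shape of the access check.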

Provide evidence

More people have production access than should, because their jobs include tasks and processes that would currently be impossible (or very difficult) to do if prod access were revoked.

A safer, more protected place to view production data, without the risk of altering it or damaging/impacting any actual prod systems, would be a better alternative.

Platform Mission

Other:

No response

pjhill commented 2 months ago

This is a good idea, and one that VA.gov Platform has tentatively explored in the past. In general, VA's attitude toward moving production data outside of production has been negative. The general attitude is that the benefits of a system that permits developers to test and diagnose do not outweigh the risks of moving production data around.

Instead, the organization is currently inclined to develop an automated system that permits timeboxed and limited access to production data IN production. We will update stakeholders with more information in the future as the solution takes shape.
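Purely as an illustration of what "timeboxed and limited" access could mean, here is a minimal sketch in plain Ruby. The `AccessGrant` class, its fields, and the `submissions` table name are hypothetical assumptions for the example; the thread does not describe the actual design.

```ruby
# Hypothetical sketch only: AccessGrant and its fields are illustrative
# assumptions, not a description of the system VA is building.
class AccessGrant
  attr_reader :user, :tables, :expires_at

  # now: is injectable so the expiry logic can be checked deterministically.
  def initialize(user:, tables:, duration_seconds:, now: Time.now)
    @user = user
    @tables = tables.map(&:to_s).freeze
    @expires_at = now + duration_seconds
  end

  # True only while the grant is unexpired AND the table is in scope.
  def allows?(table, now: Time.now)
    now < expires_at && tables.include?(table.to_s)
  end
end

grant = AccessGrant.new(user: "dev@example.com", tables: [:submissions], duration_seconds: 3600)
puts grant.allows?(:submissions) # in scope and within the one-hour window
puts grant.allows?(:users)       # out of scope, so denied
```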

gopixelsgo commented 2 months ago

Hi @pjhill - is there any update on this one, or is it still in Review? Thanks!

tayism commented 3 weeks ago

Hello @pjhill -- checking the status of this ticket to ensure it's in the appropriate column on the board. Thanks!

gopixelsgo commented 6 days ago

Hi @pjhill - checking to see if anything has come of this item. Is it still in Review?