Privado-Inc / privado

Open Source Static Scanning tool to detect data flows in your code, find data security vulnerabilities & generate accurate Play Store Data Safety Report.
https://docs.privado.ai
GNU Lesser General Public License v3.0
502 stars 57 forks

Storing output result privado.json inside repo itself seems problematic #50

Open pandurangpatil opened 2 years ago

pandurangpatil commented 2 years ago

Is your feature request related to a problem? Please describe.

The approach to results storage is interesting but potentially problematic. At present, a repo’s scan result is stored in [repo]/.privado/privado.json, inside the repo itself. In practice, this means the results will likely be lost when the repo is removed and recloned.

Describe the solution you'd like

I would love to see the results persist without having to copy or move them myself, for example by storing them in ~/.privado/results/. This would let users view historical results easily and might give the Cloud Viewer a “trend” view for repos. It always feels good to see the Risk rating decrease over time, and it’s useful to be able to notice a sudden spike in Risk if one happens.
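A minimal sketch of what this could look like, in Python for illustration only. It is not privado's implementation: the function names `archive_result` and `list_history`, the per-repo folder layout, and the timestamped file names are all assumptions. Only the source path `[repo]/.privado/privado.json` and the idea of a `~/.privado/results/` directory come from the description above.

```python
import shutil
import time
from pathlib import Path
from typing import List


def archive_result(repo_root: Path, results_home: Path) -> Path:
    """Copy <repo_root>/.privado/privado.json into a per-repo history folder."""
    source = repo_root / ".privado" / "privado.json"
    # Hypothetical layout: <results_home>/<repo folder name>/<UTC timestamp>.json
    target_dir = results_home / repo_root.resolve().name
    target_dir.mkdir(parents=True, exist_ok=True)
    # Second-level resolution: two archives within the same second overwrite each other.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    target = target_dir / f"{stamp}.json"
    shutil.copy2(source, target)
    return target


def list_history(repo_root: Path, results_home: Path) -> List[Path]:
    """Return the archived results for a repo, oldest first."""
    return sorted((results_home / repo_root.resolve().name).glob("*.json"))
```

Called as `archive_result(Path("my-repo"), Path.home() / ".privado" / "results")` after a scan, this would leave the in-repo file untouched while keeping a copy that survives a reclone. Keying the history by folder name is the simplest choice, but it breaks if the repo is renamed or moved.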

ojaswa1942 commented 2 years ago

I think we should also consider keeping these results in the repo itself. A few points:

  1. Once there is an option to load results directly, without having to scan, it will make more sense for people to keep them in the repository as a kind of privacy disclosure that anyone can load with privado and, at some point, collaborate on.
  2. If checked in, the results are also tied to git, which gives them some contextual history.

However, the described solution raises some good points. For those:

  1. We can still have a "trend" view for repos in the Cloud Viewer, including "Risk-meters" (example: a diff between the previous and new results).
  2. We can keep the latest two results (privado.json plus privado.old.json, as most configuration updaters do). Keeping more than that might not make sense if it is not tied to a git repository. Devs can still consume the output in their own CI/test systems.
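The "latest two results" idea in point 2 could be sketched roughly as follows, in Python for illustration. The function name `write_result` is hypothetical, and this is an assumption about how the rotation might work, not how privado writes its output today.

```python
import os
from pathlib import Path


def write_result(privado_dir: Path, new_json: str) -> None:
    """Write a new result, keeping the previous one as privado.old.json."""
    privado_dir.mkdir(parents=True, exist_ok=True)
    current = privado_dir / "privado.json"
    previous = privado_dir / "privado.old.json"
    if current.exists():
        # os.replace overwrites any existing privado.old.json,
        # so at most two result files remain after each scan.
        os.replace(current, previous)
    current.write_text(new_json, encoding="utf-8")
```

A diff for a "Risk-meter" would then only need to compare privado.json against privado.old.json in the same directory.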

If we choose to move the results to ~/.privado/results, we additionally need a local database-like mechanism that maintains a scanIdentifier-to-repoIdentifier mapping and handles cases like "rename" and "move" so that the mapping stays correct.
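To make the bookkeeping cost concrete, here is a rough sketch of such an index as a single JSON file, in Python for illustration. Everything here is an assumption: the names `load_index` and `record_scan`, the file format, and above all the premise that a repoIdentifier exists that stays stable across renames and moves (how to derive one is exactly the open question).

```python
import json
from pathlib import Path


def load_index(index_file: Path) -> dict:
    """Load the index, or start with an empty one if the file does not exist."""
    if index_file.exists():
        return json.loads(index_file.read_text(encoding="utf-8"))
    return {}


def record_scan(index_file: Path, repo_id: str, repo_path: Path, scan_id: str) -> dict:
    """Attach scan_id to repo_id and remember where the repo was last seen."""
    index = load_index(index_file)
    entry = index.setdefault(repo_id, {"path": str(repo_path), "scans": []})
    # A rename or move shows up as a known repo_id at a new path:
    # update the path and keep the existing scan history.
    entry["path"] = str(repo_path)
    entry["scans"].append(scan_id)
    index_file.parent.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```

This only works if repo_id does not depend on the path. If the identifier were derived from the folder location, a move would silently start a new history, which is the failure mode the index is meant to prevent.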