Open garrettr opened 7 years ago
Here's one approach that would alleviate some of the sluggishness from storing the full scan results every time a site is scanned:
Scan
model, add a column last_scanned
last_scanned
set to the timestamp of the scan.last_scanned
to set it to the timestamp of the most recent scan. This way you are tracking changes instead of recording every scan.
So FYI I implemented this for the SecureDrop landing page scanner so I could scan all the landing pages once an hour, if at some future point the STN database gets too slow or we decide we want to scan more then we could use the same code here and do a migration
Secure the News was very much designed as an "MVP," along with the mantra of "premature optimization is the root of all evil." As a result, querying Scans in the database is exceptionally slow. To see this slowness in action, load the site from the development environment and profile loading either the homepage or the leaderboard (both trigger expensive database operations).
The only reason why this slowness does not affect the production site is our extensive use of front-end caching (thanks, Cloudflare!).
There are some other issues with our database design as well. For example, we re-scan and store the full set of scan data for each site every 8 hours, even though the results of the scans change infrequently. It seems like it would be much more efficient to somehow only store the deltas of each scan, or only store a full set of scan data when the results change.
Finally, since the majority of the
sites
app's data is serialized into a relational database from the JSON output of our scanning toolpshtt
, I've wondered in the past if using a database that is optimized for storing and querying JSON directly might be a better approach. During the initial development phase I wanted to avoid the complexity of using something other than the default Django ORM, so this idea was never explored in depth.Finally: I am an amateur when it comes to database design, so I welcome any and all ideas here!