NAICNO / Jobanalyzer

Easy to use resource usage report
MIT License

Database v3 #517

Open lars-t-hansen opened 5 months ago

lars-t-hansen commented 5 months ago

This is the successor to #379. Some things to consider:

lars-t-hansen commented 3 months ago

A completely different take on this is that we should jettison the database component of Jobanalyzer and build a new one around a standard data warehouse engine, TBD. This would give us a lot more resilience and probably (on balance) reduce complexity.

There are issues with this move. Currently the analysis logic is based on stream-of-samples processing. There can be a very large volume of samples in a given time window, and I/O won't stop being a problem just because we move to a database system; quite the contrary. To make use of a database system we'd probably want to preprocess data as they come in, partitioning them by job and node so that they can be accessed more easily for the tasks we need. At the same time, there may be utility in keeping the original data streams (or something like them), since we don't know all the uses for them yet. So we'd be storing more data, but more of it would hopefully be in a directly useful form, and the net performance gain would be significant.

Some experimentation and discussion would be warranted. For example, combining the current database with an RDBMS for aggregated data (a job-centric view, with jobs keyed by job, user, etc.) might also be a sensible solution.