I have taken a first, crude stab at performance testing: enough to establish, in my mind, that no blockers are looming in this area sooner than I anticipate.
What I'd like to do is deploy bloom to a beefy instance and load it up incrementally, pausing between loads to calculate metrics on common bloom operations, so as to produce a plot of performance over DB size. While doing this, we should watch for quick wins, and note any areas where improvements could be made given site-specific needs (e.g., complex database sharding is not going to be a common need, but it is an option).
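A minimal sketch of such a load-and-measure harness, assuming Python. `load_batch` and the entries in `OPERATIONS` are placeholders, not bloom's actual API; they would need to be wired to the real bulk loader and whichever common operations we decide to track.

```python
import csv
import statistics
import time

# Placeholder hooks -- NOT bloom's API. Wire these to the real bulk
# loader and the common operations worth tracking.
def load_batch(n_records: int) -> None:
    """Insert n_records rows via bloom's bulk-load path."""
    raise NotImplementedError

OPERATIONS = {
    # name -> zero-arg callable exercising one common bloom operation,
    # e.g. accession a sample, fetch a workflow, run a search.
    "sample_lookup": lambda: None,
    "workflow_fetch": lambda: None,
}

def time_op(fn, repeats: int = 20) -> float:
    """Median wall-clock seconds for one call of fn."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def run(batches: int = 100, batch_size: int = 1_000_000,
        out_path: str = "perf_vs_size.csv") -> None:
    """Load incrementally, timing each operation after every batch."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["total_records", *OPERATIONS])
        total = 0
        for _ in range(batches):
            load_batch(batch_size)
            total += batch_size
            writer.writerow(
                [total, *(time_op(fn) for fn in OPERATIONS.values())])

if __name__ == "__main__":
    run()
```

With the defaults above (100 batches of 1M rows) the run tops out at the 100M-record target mentioned below, and the CSV plots directly as performance over DB size.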
The end result would be the performance data plotted up to some large DB size, a list of performance enhancements, and recommendations for a handful (2-3) of expected installation sizes (which I anticipate will skew smaller rather than larger). This could include guidance on backup, recovery, and archiving as well.
My sense is we'd like bloom to handle 100M records well, which would approximate 250,000 accessioned samples and 100K or so workflows. This would represent a decade or more of most labs' expected volumes.
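For scale, a quick back-of-envelope on those figures (the ~250 working days a year is my assumption, not a measured number):

```python
# Back-of-envelope check of the 100M-record target, using the
# approximate figures above.
records, samples, years = 100_000_000, 250_000, 10

print(records // samples)        # ~400 DB records per accessioned sample
print(samples // years)          # 25,000 samples accessioned per year
print(samples // (years * 250))  # ~100 samples per working day
```

Roughly 100 samples a working day is a busy but realistic lab, which is why the 100M figure feels like the right ceiling to validate against.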