crate-workbench / cratedb-toolkit

CrateDB Toolkit.
https://cratedb-toolkit.readthedocs.io/
GNU Affero General Public License v3.0
5 stars 3 forks source link

[wtf] Improve query library #169

Open amotl opened 3 weeks ago

amotl commented 3 weeks ago

About

The CrateDB SQL query collection on behalf of library.py, added with GH-88, needs further improvements. It has been assembled from a wave of quick & dirty operations, collected from different sources, without much review.

Details

While working on the code base, @seut discovered a few specific shortcomings in this area. Thank you. Maybe @WalBeh, @hlcianfagna, @hammerhead, or others have something to contribute to answer those questions.

Settings

Why only this small subset of settings? It's also not really dedicated to a concrete topic as both, rebalance and recovery settings are queried. https://github.com/crate-workbench/cratedb-toolkit/blob/7b4e30525025fa2d310dd8ca8418442d3d5a4977/cratedb_toolkit/wtf/library.py#L369-L383

Shards

This looks unreasonable complicated to just translate the primary boolean into a string. https://github.com/crate-workbench/cratedb-toolkit/blob/7b4e30525025fa2d310dd8ca8418442d3d5a4977/cratedb_toolkit/wtf/library.py#L422-L439

Why selecting the 2nd decision? This looks problematic e.g. when only 1 shard exists there isn't a 2nd decision. https://github.com/crate-workbench/cratedb-toolkit/blob/7b4e30525025fa2d310dd8ca8418442d3d5a4977/cratedb_toolkit/wtf/library.py#L534-L543

Isn't the query above more detailed? I think this one can be skipped... https://github.com/crate-workbench/cratedb-toolkit/blob/7b4e30525025fa2d310dd8ca8418442d3d5a4977/cratedb_toolkit/wtf/library.py#L591-L601

Thoughts

In general, I am happy to remove any item which should be skipped, and improve all others which have shortcomings, into a DWIM shape, based on your suggestions. Thanks already, and thanks in advance!

hlcianfagna commented 3 weeks ago

Not sure how you are using/planning to use the data later on, but for troubleshooting I think it may be best to err on the side of collecting more data rather than less as we do not know initially where issues may be, so regarding cluster settings for instance I would definitely support collecting all of them, that is the entire value in the settings column.

amotl commented 3 weeks ago

I would also like to add a feature which presents a diff of the current settings against default settings. Do you think this will be possible in any sensible way?

hlcianfagna commented 3 weeks ago

diff of the current settings against default settings. Do you think this will be possible in any sensible way?

Maybe looking at pg_settings ? it has the default values in the reset_val column and the current value in the setting column

amotl commented 3 weeks ago

Excellent. Thanks for the pointers.