Open marcocitus opened 7 years ago
Don't use the same tables across different tests
That one I actually quite strongly disagree with - we shouldn't duplicate tables all the time. That slows things down, duplicates code, duplicates data loading (copy from file in the worst cases), etc. There's cases where that's necessary (e.g. when screwing with the table definition), but we really don't need 10 different lineitem_hash copies.
Points I'd make additionally:
It's probably worth adding reasons for the guidelines as well.
Don't use the same tables across different tests
That one I actually quite strongly disagree with - we shouldn't duplicate tables all the time. That slows things down, duplicates code, duplicates data loading (copy from file in the worst cases), etc. There's cases where that's necessary (e.g. when screwing with the table definition), but we really don't need 10 different lineitem_hash copies.
Of course we have no need for multiple iterations of setting up big tables like lineitem
or supplier
but I'm not sure Marco was suggesting that. There are lots of small tables which are created just for the purpose of a few checks and the benefits of having their definitions right next to the tests is well-worth the cost of a few extra create_distributed_table
calls.
It's hard to remember all the rules that make for good regression tests. We should add a document to the wiki that succinctly describes the rules to follow.
Some basic ideas:
If applicable, test whether a new feature:
(don't need to have all of these all the time, but just good to remember pitfalls)
Do not:
(unless absolutely necessary, e.g. EXPLAIN always shows shard IDs)