Tracking issue for fuzz tests

What problem does the new feature solve?

Introduce the fuzz tests and related utils

What does the feature do?

The utils generate random SQL input and verify our database's output. They should integrate easily with other integration tests (e.g., after the fuzz tests run, region migration should still be available; or, after the DB recovers from chaos failures, the fuzz tests should still pass).
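
As a rough illustration of the "generate random SQL, then verify the output" idea, here is a minimal sketch. It is not the actual fuzz utilities: the function names and the column-type subset are hypothetical, and an in-memory SQLite database stands in for GreptimeDB so the example is self-contained.

```python
import random
import sqlite3

COLUMN_TYPES = ["INTEGER", "REAL", "TEXT"]  # hypothetical subset of column types


def gen_create_table(rng, name):
    """Build a random CREATE TABLE statement; return it with its column list."""
    n_cols = rng.randint(1, 5)
    columns = [(f"c{i}", rng.choice(COLUMN_TYPES)) for i in range(n_cols)]
    cols_sql = ", ".join(f"{col} {typ}" for col, typ in columns)
    return f"CREATE TABLE {name} ({cols_sql})", columns


def gen_value(rng, typ):
    """Render a random SQL literal for the given column type."""
    if typ == "INTEGER":
        return str(rng.randint(-1000, 1000))
    if typ == "REAL":
        return repr(round(rng.uniform(-1000, 1000), 3))
    length = rng.randint(0, 8)
    return "'" + "".join(rng.choice("abcxyz") for _ in range(length)) + "'"


def gen_insert(rng, name, columns, n_rows):
    """Build a multi-row INSERT whose values match the column types."""
    rows = ", ".join(
        "(" + ", ".join(gen_value(rng, typ) for _, typ in columns) + ")"
        for _ in range(n_rows)
    )
    return f"INSERT INTO {name} VALUES {rows}"


def run_case(seed):
    """Generate one DDL + INSERT case, run it, and return (expected, actual) row counts."""
    rng = random.Random(seed)  # seeded, so a failing case can be replayed
    create_sql, columns = gen_create_table(rng, "t")
    n_rows = rng.randint(1, 20)
    insert_sql = gen_insert(rng, "t", columns, n_rows)
    conn = sqlite3.connect(":memory:")  # stand-in for the database under test
    try:
        conn.execute(create_sql)
        conn.execute(insert_sql)
        (count,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    finally:
        conn.close()
    return n_rows, count
```

A caller would loop over seeds and report any seed for which the two counts differ; because each case is derived from its seed alone, a failure can be reproduced later, for example after a region migration or a chaos run.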

- [x] #3173
- [x] Generator
  - [x] DDL @WenyXu
  - [ ] Alter data type
  - [x] INSERT @zhongzc
  - [ ] SELECT @waynexia
  - [ ] DELETE #3761
- [x] DslTranslator
- [ ] DslExecutor
  - [x] PG
  - [x] DDL @WenyXu
  - [x] GreptimeDB
  - [x] DDL @WenyXu
  - [ ] Log / Report
  - [ ] Retry / Timeout

Other engines/scenarios

- [x] metric #3741

Implementation challenges

No response

I'm intrigued to see GreptimeDB uncovering bugs through fuzzing. I have some prior experience with fuzzing and have been considering adding fuzz tests to Kvrocks. Here are my questions:

Fuzzing is essentially a search problem over a vast input space. Many fuzzers, such as Csmith and SQLsmith, find bugs by randomly generating inputs that satisfy specific constraints. However, not all randomly generated inputs are equally useful; some are more valuable and should be explored first. For this reason, some fuzzing tools guide input generation with metrics such as changes in code coverage: if a new input exercises previously unexecuted code, it is likely more valuable. This approach is commonly known as coverage-guided, mutation-based fuzzing, as in AFL++ and libFuzzer. Will GreptimeDB adopt this approach?
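
To make the coverage-guided idea concrete, here is a toy sketch of the feedback loop. It is an illustration only: real tools such as AFL++ and libFuzzer instrument the compiled target to collect coverage, whereas the hypothetical `target` below simply reports which branches it took, and `mutate` and `fuzz` are made-up helper names.

```python
import random


def target(data: bytes) -> set:
    """Toy target that reports the branches it took (stand-in for instrumentation)."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("S"):
        covered.add("S")
        if len(data) > 1 and data[1] == ord("Q"):
            covered.add("SQ")
            if len(data) > 2 and data[2] == ord("L"):
                covered.add("SQL")
    return covered


def mutate(rng, data: bytes) -> bytes:
    """Overwrite one random byte, or append one random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.7:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(seed, iterations):
    """Keep only mutated inputs that reach a branch not seen before."""
    rng = random.Random(seed)
    corpus = [b""]
    seen = set(target(b""))
    for _ in range(iterations):
        candidate = mutate(rng, rng.choice(corpus))
        coverage = target(candidate)
        if not coverage <= seen:  # at least one new branch: keep this input
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen
```

The point of the design is that an input reaching the `"S"` branch is kept in the corpus, so later mutations start from it instead of from scratch; purely random generation would have to guess all the required bytes at once.
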
Another key aspect of fuzzing is finding reliable test oracles. Many fuzzers use differential testing: for example, feeding the same inputs to MySQL and MariaDB and comparing the results. However, differences between DBMS implementations can produce many false positives (even between MySQL and databases that claim MySQL compatibility). Mitigating this may require substantial engineering effort, or novel workarounds such as those used by SQLancer. What strategy does GreptimeDB intend to pursue here?
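
A small sketch of differential testing and of how semantic differences turn into false positives. The two "engines" here are stand-ins chosen so the example is self-contained: SQLite, and a hypothetical Python reference model (`model_eval`). SQLite truncates integer division toward zero, while Python's `//` floors, so a naive comparison reports mismatches that are not bugs in either engine.

```python
import random
import sqlite3


def sqlite_eval(conn, a, op, b):
    """Evaluate `a op b` in SQLite; parentheses keep negative operands unambiguous."""
    (value,) = conn.execute(f"SELECT ({a}) {op} ({b})").fetchone()
    return value


def model_eval(a, op, b, truncate_division):
    """Reference model; optionally mimic SQLite's truncating integer division."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if truncate_division:
        quotient = abs(a) // abs(b)
        return quotient if (a < 0) == (b < 0) else -quotient
    return a // b  # Python floor division: differs for mixed-sign operands


def count_mismatches(seed, cases, truncate_division):
    """Run random integer expressions through both engines and count disagreements."""
    rng = random.Random(seed)
    nonzero = [n for n in range(-100, 101) if n != 0]  # avoid division by zero
    conn = sqlite3.connect(":memory:")
    mismatches = 0
    try:
        for _ in range(cases):
            a = rng.randint(-100, 100)
            b = rng.choice(nonzero)
            op = rng.choice("+-*/")
            expected = model_eval(a, op, b, truncate_division)
            if sqlite_eval(conn, a, op, b) != expected:
                mismatches += 1
    finally:
        conn.close()
    return mismatches
```

With `truncate_division=False` the comparison reports mismatches on mixed-sign divisions even though neither side is wrong; with `truncate_division=True` the model and SQLite agree. Every such dialect difference needs its own normalization rule, which is where the engineering effort goes.
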
A fuzzer can report a large number of errors, but separating genuine bugs from false positives in that output is hard. Moreover, many reported errors may share the same root cause: manually sifting through tens of thousands of outputs to find a few dozen distinct bugs would be arduous. We may therefore need additional tools to filter the reports and to simplify the lengthy SQL inputs the fuzzer produces. Does GreptimeDB have a particular approach to this problem?
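
For illustration, the two kinds of tooling mentioned above could be sketched as follows. All names are hypothetical, the error messages in the usage note are made up, and the reducer is a simple greedy one-token-at-a-time loop, not the full delta debugging algorithm.

```python
import re
from collections import defaultdict


def signature(error_message: str) -> str:
    """Normalize a message so failures that differ only in literals share a key."""
    sig = re.sub(r"'[^']*'", "'?'", error_message)  # quoted literals
    sig = re.sub(r"\d+", "N", sig)  # numbers such as offsets or line numbers
    return sig


def bucket_failures(failures):
    """Group (sql, error_message) pairs by normalized error signature."""
    buckets = defaultdict(list)
    for sql, message in failures:
        buckets[signature(message)].append(sql)
    return dict(buckets)


def reduce_tokens(tokens, still_fails):
    """Greedily drop one token at a time while the failure still reproduces."""
    current = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if candidate and still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan from the shorter input
    return current
```

For example, `bucket_failures` puts "column 'a' not found at offset 12" and "column 'bb' not found at offset 7" into one bucket, so only one representative per bucket needs manual triage. `reduce_tokens` takes a predicate that re-runs the candidate against the database and returns whether the same failure occurs; in practice the predicate would also need to reject candidates that fail for a different reason.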

GreptimeTeam / greptimedb

Tracking issue for fuzz tests #3174
