anse1 / sqlsmith

A random SQL query generator
GNU General Public License v3.0

[QUESTION] Is there a reason why a specific probability is chosen for a particular decision? #57

Closed PVSekhar1234 closed 5 months ago

PVSekhar1234 commented 5 months ago

For example, in grammar.cc at line 470, why is a merge statement chosen with a probability of 1/42 rather than some other probability?

Is there any basis in the literature for choosing a specific probability when making a decision, or is it trial and error?

If the probabilities are chosen based on trial and error, what things were considered while choosing the probability?

anse1 commented 5 months ago

These numbers are quite ad-hoc and there's no real theory behind them. I just try to pick them to get into the ballpark for "natural" looking statements and nudge them if I don't like the results. E.g., avoid excessive numbers of target list entries, and generate more read-only statements than writing ones.
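
To make the "pick a number, look at the results, nudge it" loop concrete, here is a minimal sketch of a die-roll statement chooser and a helper that counts the resulting mix. It is not SQLsmith's actual code: `roll`, `pick_statement`, and `sample_mix` are hypothetical names, and only the 1/42 chance for merge comes from this thread. The 1/6 chance for write statements is an assumed value for illustration.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>

// Hypothetical helper: roll a die with `sides` faces, returning 1..sides.
int roll(std::mt19937 &rng, int sides) {
    assert(sides >= 1);
    std::uniform_int_distribution<int> dist(1, sides);
    return dist(rng);
}

// Hypothetical chooser. Only the 1/42 merge probability is taken from the
// thread; the 1/6 write probability is an assumption for illustration.
std::string pick_statement(std::mt19937 &rng) {
    if (roll(rng, 42) == 1)
        return "merge";   // about 2.4% of all draws
    if (roll(rng, 6) == 1)
        return "write";   // assumed: writes stay rarer than reads
    return "select";      // everything else is a read-only statement
}

// Draw `trials` statements and count how often each kind appears, so the
// effect of nudging a probability can be inspected directly.
std::map<std::string, int> sample_mix(unsigned seed, int trials) {
    std::mt19937 rng(seed);
    std::map<std::string, int> counts;
    for (int i = 0; i < trials; ++i)
        ++counts[pick_statement(rng)];
    return counts;
}
```

Calling `sample_mix(12345u, 100000)` and printing the counts shows the statement mix for the current constants. Changing a constant and rerunning is the kind of manual tuning described above.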

IIRC the Csmith project - which inspired SQLsmith - has a much more scientific approach to the probabilities of grammar rules, but it's been a while since I read their papers... That's where I'd look first if I were convinced that the current approach needs to be improved.

PVSekhar1234 commented 5 months ago

Thank you for the quick response.