dblotsky / stringfuzz

Fuzzer and generator for string and regex problems in SMT-LIB 2.x format.
Other
20 stars 9 forks source link

Adding rich transformers to stringfuzzx. #6

Closed dblotsky closed 6 years ago

FedericoAureliano commented 6 years ago

Awesome, thanks!

Do you plan to make the probability thresholds or lengths be command-line parameters?

Yup! Especially if we do chaining through shell pipes.

At a high level, what does the extend transformer do?

I was hoping to have extend add a random subtree at a leaf. Currently, it replaces a leaf with a duplicated subtree from elsewhere in the AST so that we don't have to deal with adding definitions to the AST.

Basically, stub prunes subtrees, extend appends subtrees, rotate transformes the shape by rotating triples, and litfuzz fuzzes literals/operators.

Maybe it's better to combine these so that we don't rely on chaining so much. Also, some of the reasoning is based on types (e.g. replace string expressions with string expressions) so building this into the walker could be useful.

dblotsky commented 6 years ago

@FedericoAureliano thanks for the clarifications. A few follow-up questions:

  1. How would you summarise, in words, the mechanism by which litfuzz fuzzes literals?
  2. I noticed that litfuzz also changes the structure of regexes. Is that expected?
  3. What do you think of combining the extend/prune transformers into one that does both?

Also, I've finally dug them up, and the transformations from Ifaz's work were:

I'll send you the paper via email, once I get a copy from Ifaz.

FedericoAureliano commented 6 years ago

@dblotsky

  1. How would you summarise, in words, the mechanism by which litfuzz fuzzes literals?

Basically when it arrives at a literal it modifies it to a "similar" literal. For strings this means that we iterate through the characters, for each character we either delete the character, keep it, or replace it with a random (short) string based on some probability. For a literal integer k this means replacing k with k+r where r is a random number between -k and k.

  1. I noticed that litfuzz also changes the structure of regexes. Is that expected?

I added in some regex operator modifiers that will, with some probability, change a regex operator to another with the same arity and argument types. So for example a * can become a + and vice versa. This was meant to produce more variety but perhaps doesn't belong here.

  1. What do you think of combining the extend/prune transformers into one that does both?

That's a good idea. I'll do that.

I'll send you the paper via email, once I get a copy from Ifaz.

Thanks! Rotating actually goes beyond just arguments so it should be different, but complementary to, permute. Adding those in will be great.

FedericoAureliano commented 6 years ago

Hey @dblotsky,

What do you think of the updates? I think I addressed all the issues.

We now have three new transformers:

  1. Fuzz - doesn't mess with structure, just swaps interchangeable (same type) operators and replaces literals with "similar" literals
  2. Rotate - does a clockwise rotation on arguments where at least 2 of the arguments belong to the parent
  3. stem-cut - swaps a leaf with a subtree for each type (int, str, bool, regex)
dblotsky commented 6 years ago

@FedericoAureliano thanks for the clarifications and the modifications. I've left some more comments. Sorry for the delay!

FedericoAureliano commented 6 years ago

Hey @dblotsky, I addressed all the comments except for a permanent fix for types. I think they should live inside the nodes themselves but I don't see an immediate clean way to do this. Literals, Identifiers, and Expressions should have a type signature, but having it inside ASTNode, the only class that unites them, seems too high up.

What do you think the best solution is for types?

dblotsky commented 6 years ago

@FedericoAureliano thanks for the changes! Yeah, I think it would be ok to keep the type signature data in ASTNode and its subclasses. But that can be done in the future. The current separate file should be ok for now.

There's one gotcha that I came across: we can't distinguish between the type () -> String/Int/Bool (i.e. a constant or variable) and String/Int/Bool -> (), but I can't immediately think of anything that has type String/Int/Bool -> ().

dblotsky commented 6 years ago

Actually, I can: (assert X). Its type in this type scheme would be BOOL, same as the type of a Boolean identifier.

FedericoAureliano commented 6 years ago

That's a good point. We'll have to be careful with that when we build in types. We should also build in identifier types in the process.

How should we go about doing that, and how high of a priority is it? I think it deserves a separate branch.

dblotsky commented 6 years ago

@FedericoAureliano yep, I agree that it's something for a later time. We could also work around it by just changing a type definition to be more like the real thing: X -> Y. For now, this looks good to me.

Last questions:

  1. How did you go about testing these?
  2. Do you think it would be better to have graft be able to graft more than 1 branch of each type? If not in this PR, in a later one?
FedericoAureliano commented 6 years ago

Hey @dblotsky sorry for the slow response. I added Lfaz's transformers (translate, reverse, multiply).

I've been testing the transformers on the problems generated by stringfuzzg at different depths. stringfuzzg concats and regex have been especially useful for this.

I think graft is OK with one branch for each type because each modification produces a problem that is different enough from the original to test the solvers and we can repeat the process for even more variation.

That being said, we could let the user specify this with command line arguments. Graft and multiply would perhaps benefit the most from this out of all the transformers.

dblotsky commented 6 years ago

Hey @FedericoAureliano, those look pretty great to me! I left a few comments on the new code. Speaking of it: what do you think of splitting them off into a second PR? That way you can close the previous work as "done", and focus on finishing up only the new transformers.

Also, one small thing: it is actually a capital "i" in "Ifaz", not an "L". Would you be ok changing that in the commit message and the PR? You can squash the last two commits and force-push to this branch. GitHub should still keep the PR intact. Alternatively, if you're ok with splitting the new transformers off into a new branch and PR, you can just create new commits for them.

dblotsky commented 6 years ago

Also @FedericoAureliano, I noticed that graft just slices off a node in this example:

stringfuzzg regex -t 2 -m 10 -x 10 | stringfuzzx graft

changing this:

(declare-fun var0 () String)
(assert (str.in.re var0 (re.++ (str.to.re "#hAhc<w'{:") (str.to.re "R5x$!PCZJ-"))))
(check-sat)

to this:

(declare-fun var0 () String)
(assert (str.in.re var0 (str.to.re "#hAhc<w'{:")))
(check-sat)

This seems like it's is because the leaf node gets swapped with its predecessor, and the predecessor becomes a disconnected component of the graph. What do you think is a good way of fixing this?

dblotsky commented 6 years ago

Also @FedericoAureliano, I've found and fixed some scanner bugs which you helped uncover. Turns out I was treating integers as identifiers, and not parsing Boolean literals at all! 😢 They're fixed in 106bcc1, so you can rebase on top of master, or just merge with it.

Sorry for so many messages!

FedericoAureliano commented 6 years ago

Speaking of it: what do you think of splitting them off into a second PR? That way you can close the previous work as "done", and focus on finishing up only the new transformers.

Will do!

Also, one small thing: it is actually a capital "i" in "Ifaz", not an "L".

My mistake! Sorry Ifaz if you see this :P

I noticed that graft just slices off a node in this example

Yup, since the swapping of the expr and literal happens on the same "path" (the expr is an ancestor of the literal) you get this effect.

What do you think is a good way of fixing this?

I had intended this behaviour so that the transformer will still have an effect on single path examples, but am open to changing the behaviour.

They're fixed in 106bcc1, so you can rebase on top of master, or just merge with it.

Awesome! What I'll do is take Ifaz's transformers out of this PR, merge this PR with master, and then make a new branch and PR for Ifaz's transformers. Does that sound good?

FedericoAureliano commented 6 years ago

Thank you @dblotsky for all the code reviews! Let me know if I can help with the paper (proof read or whatever you need).