Parse states from input logs

GoogleCodeExporter commented 9 years ago

Parse state properties from input logs, to mine structural (Daikon) invariants 
later on. TraceParser must capture state properties from log lines and save 
those properties in EventNode's. EventNode must bundle an event and a state 
prior to that event. Some EventNode's might have null states because log traces 
might not alternate between states and events.

Original issue reported on code.google.com by ssukkerd on 14 Jan 2013 at 8:03

GoogleCodeExporter commented 9 years ago

This description needs further work:

1. Don't mention Daikon invariants since this is outside of the scope of this 
issue. Or, alternatively, create an issue for Daikon invariants and reference 
it from this issue.

2. What are state properties? Give an example, or define this somehow.

3. Give/propose syntax for how state is specified/extracted via regular 
expressions. This changes how users interact with the application, so this 
needs to be well defined.

Overall it would help to have a log example that makes the proposal more 
concrete.

Original comment by bestchai on 14 Jan 2013 at 7:11

GoogleCodeExporter commented 9 years ago

Parse state properties from input logs. A state property is a set of 
identifier-value pairs. A state property is defined at a point in time (i.e., 
before or after a particular event in an execution trace). An example of log 
that contains state properties is:
a=0,b=1
eventA
a=1,b=2,connected=false
eventB
connected=true.

TraceParser must capture state properties from log lines and save those 
properties in EventNode's. EventNode must bundle an event and a state property 
prior to that event. In the example above, the EventNode's would be (eventA, 
{a=0,b=1}), (eventB, {a=1,b=2,connected=false}) and (null, {connected=true}). 
Note that some EventNode's might have null state property because log traces 
might not alternate between state properties and events.

Syntax of a state property can be anything that users specify, as long as it 
captures a set of key-value pairs. (I'm not sure I get your point for #3.)

Original comment by ssukkerd on 15 Jan 2013 at 1:31

GoogleCodeExporter commented 9 years ago

For (3), you'll be expecting users to input a regular expression for the -r 
option that tells Synoptic how to parse state from the log file. For example, 
users use the TYPE capture group to parse event types from the log. What's the 
-r expression syntax/constraints for parsing state?

Original comment by bestchai on 15 Jan 2013 at 7:13

GoogleCodeExporter commented 9 years ago

Since a state property can consist of an arbitrary number of key-value pairs, 
it is not possible, in Java, to use regular expression to capture all pairs. 
There is an explanation here 
http://stackoverflow.com/questions/5018487/regular-expression-with-variable-numb
er-of-groups.

Instead, I think regular expression should be just for 1 pair of key-value, and 
we iteratively match each pair of the state property. For example, a log line 
indicating a state property looks like this: x=1,y=2. The regular expression 
for a key-value pair is (?<ID>)=(?<VALUE>),?.

This type of regex does not match an entire log line. Therefore, addRegex() and 
parseLine() need to change. Log lines indicating state properties are the 
special case.

Original comment by ssukkerd on 16 Jan 2013 at 6:59

GoogleCodeExporter commented 9 years ago

This seems more complicated than necessary. Particularly because addRegex() and 
parseLine() are already complex, so implementing this change will be difficult.

A simpler solution would be to parse a "state" string from a log line and later 
assume that this string has a particular format. For example, you might parse 
"x=1,y=2" as the state string, and later when you need to interpret the string 
you can assume an (?<ID>)=(?<VALUE>),? kind of format.

Overall, my opinion is that we don't have to solve this perfectly at this time 
-- it's just the first step in a long series of steps in this project. If you 
spend two weeks on this step then you will have lost a lot of time.

Original comment by bestchai on 16 Jan 2013 at 6:06

GoogleCodeExporter commented 9 years ago

Solution in revision 96aa87414072, please review. I have made a few assumptions 
about the log. You should see if you agree with them.

Original comment by ssukkerd on 18 Jan 2013 at 6:53

GoogleCodeExporter commented 9 years ago

I left some comments in the code and have four high level comments:

1. I think the two assumptions that you list in the code are okay. But, you 
need to validate these and throw an exception when they do not hold (just as 
you do when TYPE and STATE co-appear). I'm certain that we will make mistakes 
and invalidate these every once in a while, so it would be nice to include 
checks. You can create a separate issue for this or solve it in this issue.

2. I think the assumptions needs to be documented not just in the code, but 
also on the wiki perhaps. This is probably a broader issue, since you also need 
to document the regular expression assumptions/constraints and give a few usage 
examples.

3. You will eventually want a separate data type to represent state. The 
earlier you introduce it, the less code you'll have to rewrite later on. That 
is, instead of maintaining a String type for preEventStateProperty and 
postEventStateProperty, create and use a separate "State" type, which can 
contain a "rawStringState" internally to represent the parsed state. This 
probably needs a separate issue, as well.

4. Using EventNode to keep state is a hack and not the best solution. Add a 
TODO and/or create an issue to refactor the parseLine and parseTrace methods to 
eliminate this hack (i.e., ideally parseLine would return either an EventNode 
or a State type described in (2) above).

Original comment by bestchai on 18 Jan 2013 at 8:00

GoogleCodeExporter commented 9 years ago

Original comment by ssukkerd on 22 Jan 2013 at 7:51

Added labels: Priority-High
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

I addressed all your comments in Revision 1f240b2ed125 except for creating a 
separate data type to represent state. I create a new issue for this task.

Original comment by ssukkerd on 22 Jan 2013 at 7:55

GoogleCodeExporter commented 9 years ago

Looks good. Your changes raise a question for me -- does the prior assumption, 
that "if a trace contains a state property, it must also contain at least 1 
event" still hold true for the current code? It doesn't look that way to me. 
That is, I don't see an exception of this kind, and you added a test called 
parseTraceWithSingleState. So, did you decide to remove this assumption?

Original comment by bestchai on 22 Jan 2013 at 6:28

GoogleCodeExporter commented 9 years ago

Original comment by bestchai on 22 Jan 2013 at 6:36

Added labels: StatefulSyn

GoogleCodeExporter commented 9 years ago

Issue 261 has been merged into this issue.

Original comment by bestchai on 22 Jan 2013 at 6:40

GoogleCodeExporter commented 9 years ago

Yes, I removed the assumption, because it seems like an edge case that should 
be handled.

Original comment by ssukkerd on 22 Jan 2013 at 6:53

GoogleCodeExporter commented 9 years ago

Original comment by ssukkerd on 22 Jan 2013 at 7:07

Changed title: Parse states from input logs

GoogleCodeExporter commented 9 years ago

But, the result is that parseTrace() does not merge the singleton state with 
any events, and therefore returns an EventNode instance with a null event. Your 
unit test passes, but I expect that an integration test that attempts to run 
the complete Synoptic pipeline on a log with a single state would throw a null 
pointer exception.

I think you need to either bring back the assumption and throw a parse 
exception, or return an EventNode that is usable by other code (which expects a 
non-null event as part of an EventNode). I think the first solution is easier 
-- I can't think of a synthetic Event instance that would make sense for a log 
that doesn't have any events..

Original comment by bestchai on 22 Jan 2013 at 7:22

GoogleCodeExporter commented 9 years ago

I've reintroduced the assumption in Revision 7a509a546a2f. Please review and 
merge this branch to default.

Original comment by ssukkerd on 23 Jan 2013 at 1:15

GoogleCodeExporter commented 9 years ago

Comments in the review.

Original comment by bestchai on 23 Jan 2013 at 6:23

GoogleCodeExporter commented 9 years ago

Solution in Revision 9db1fcce3b5e, please review and merge this branch to 
default.

Original comment by ssukkerd on 23 Jan 2013 at 7:06

GoogleCodeExporter commented 9 years ago

Merged into default with revision ab2825466325

Original comment by bestchai on 24 Jan 2013 at 8:22

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Merged into default with revision ab2825466325

Original comment by bestchai on 24 Jan 2013 at 8:22

ModelInference / synoptic

Parse states from input logs #279