Closed GoogleCodeExporter closed 9 years ago
This description needs further work:
1. Don't mention Daikon invariants since this is outside of the scope of this
issue. Or, alternatively, create an issue for Daikon invariants and reference
it from this issue.
2. What are state properties? Give an example, or define this somehow.
3. Give/propose syntax for how state is specified/extracted via regular
expressions. This changes how users interact with the application, so this
needs to be well defined.
Overall it would help to have a log example that makes the proposal more
concrete.
Original comment by bestchai
on 14 Jan 2013 at 7:11
Parse state properties from input logs. A state property is a set of
identifier-value pairs. A state property is defined at a point in time (i.e.,
before or after a particular event in an execution trace). An example of log
that contains state properties is:
a=0,b=1
eventA
a=1,b=2,connected=false
eventB
connected=true.
TraceParser must capture state properties from log lines and save those
properties in EventNode's. EventNode must bundle an event and a state property
prior to that event. In the example above, the EventNode's would be (eventA,
{a=0,b=1}), (eventB, {a=1,b=2,connected=false}) and (null, {connected=true}).
Note that some EventNode's might have null state property because log traces
might not alternate between state properties and events.
Syntax of a state property can be anything that users specify, as long as it
captures a set of key-value pairs. (I'm not sure I get your point for #3.)
Original comment by ssukkerd
on 15 Jan 2013 at 1:31
For (3), you'll be expecting users to input a regular expression for the -r
option that tells Synoptic how to parse state from the log file. For example,
users use the TYPE capture group to parse event types from the log. What's the
-r expression syntax/constraints for parsing state?
Original comment by bestchai
on 15 Jan 2013 at 7:13
Since a state property can consist of an arbitrary number of key-value pairs,
it is not possible, in Java, to use regular expression to capture all pairs.
There is an explanation here
http://stackoverflow.com/questions/5018487/regular-expression-with-variable-numb
er-of-groups.
Instead, I think regular expression should be just for 1 pair of key-value, and
we iteratively match each pair of the state property. For example, a log line
indicating a state property looks like this: x=1,y=2. The regular expression
for a key-value pair is (?<ID>)=(?<VALUE>),?.
This type of regex does not match an entire log line. Therefore, addRegex() and
parseLine() need to change. Log lines indicating state properties are the
special case.
Original comment by ssukkerd
on 16 Jan 2013 at 6:59
This seems more complicated than necessary. Particularly because addRegex() and
parseLine() are already complex, so implementing this change will be difficult.
A simpler solution would be to parse a "state" string from a log line and later
assume that this string has a particular format. For example, you might parse
"x=1,y=2" as the state string, and later when you need to interpret the string
you can assume an (?<ID>)=(?<VALUE>),? kind of format.
Overall, my opinion is that we don't have to solve this perfectly at this time
-- it's just the first step in a long series of steps in this project. If you
spend two weeks on this step then you will have lost a lot of time.
Original comment by bestchai
on 16 Jan 2013 at 6:06
Solution in revision 96aa87414072, please review. I have made a few assumptions
about the log. You should see if you agree with them.
Original comment by ssukkerd
on 18 Jan 2013 at 6:53
I left some comments in the code and have four high level comments:
1. I think the two assumptions that you list in the code are okay. But, you
need to validate these and throw an exception when they do not hold (just as
you do when TYPE and STATE co-appear). I'm certain that we will make mistakes
and invalidate these every once in a while, so it would be nice to include
checks. You can create a separate issue for this or solve it in this issue.
2. I think the assumptions needs to be documented not just in the code, but
also on the wiki perhaps. This is probably a broader issue, since you also need
to document the regular expression assumptions/constraints and give a few usage
examples.
3. You will eventually want a separate data type to represent state. The
earlier you introduce it, the less code you'll have to rewrite later on. That
is, instead of maintaining a String type for preEventStateProperty and
postEventStateProperty, create and use a separate "State" type, which can
contain a "rawStringState" internally to represent the parsed state. This
probably needs a separate issue, as well.
4. Using EventNode to keep state is a hack and not the best solution. Add a
TODO and/or create an issue to refactor the parseLine and parseTrace methods to
eliminate this hack (i.e., ideally parseLine would return either an EventNode
or a State type described in (2) above).
Original comment by bestchai
on 18 Jan 2013 at 8:00
Original comment by ssukkerd
on 22 Jan 2013 at 7:51
I addressed all your comments in Revision 1f240b2ed125 except for creating a
separate data type to represent state. I create a new issue for this task.
Original comment by ssukkerd
on 22 Jan 2013 at 7:55
Looks good. Your changes raise a question for me -- does the prior assumption,
that "if a trace contains a state property, it must also contain at least 1
event" still hold true for the current code? It doesn't look that way to me.
That is, I don't see an exception of this kind, and you added a test called
parseTraceWithSingleState. So, did you decide to remove this assumption?
Original comment by bestchai
on 22 Jan 2013 at 6:28
Original comment by bestchai
on 22 Jan 2013 at 6:36
Issue 261 has been merged into this issue.
Original comment by bestchai
on 22 Jan 2013 at 6:40
Yes, I removed the assumption, because it seems like an edge case that should
be handled.
Original comment by ssukkerd
on 22 Jan 2013 at 6:53
Original comment by ssukkerd
on 22 Jan 2013 at 7:07
But, the result is that parseTrace() does not merge the singleton state with
any events, and therefore returns an EventNode instance with a null event. Your
unit test passes, but I expect that an integration test that attempts to run
the complete Synoptic pipeline on a log with a single state would throw a null
pointer exception.
I think you need to either bring back the assumption and throw a parse
exception, or return an EventNode that is usable by other code (which expects a
non-null event as part of an EventNode). I think the first solution is easier
-- I can't think of a synthetic Event instance that would make sense for a log
that doesn't have any events..
Original comment by bestchai
on 22 Jan 2013 at 7:22
I've reintroduced the assumption in Revision 7a509a546a2f. Please review and
merge this branch to default.
Original comment by ssukkerd
on 23 Jan 2013 at 1:15
Comments in the review.
Original comment by bestchai
on 23 Jan 2013 at 6:23
Solution in Revision 9db1fcce3b5e, please review and merge this branch to
default.
Original comment by ssukkerd
on 23 Jan 2013 at 7:06
Merged into default with revision ab2825466325
Original comment by bestchai
on 24 Jan 2013 at 8:22
Merged into default with revision ab2825466325
Original comment by bestchai
on 24 Jan 2013 at 8:22
Original issue reported on code.google.com by
ssukkerd
on 14 Jan 2013 at 8:03