aboutsip / pkts

Pure java based pcap library capable of reading and writing to/from pcaps.
Other
196 stars 91 forks source link

Does case-insensitive parsing of SIP headers #55

Closed sbarta30 closed 7 years ago

sbarta30 commented 7 years ago

According to the RFC, SIP header names are not case sensitive; this changes the parsing to allow case-insensitive matching. It does not do anything about parsing of header values, which are supposed to be case-insensitive unless inside quotes; I haven't tested that and can't comment on it. I added test support for case-sensitivity in SIP header matching.

It will do some short-lived object creation during the matching process; if you're sensitive to that and would like to keep heap churn lower, I can modify the code, but if it's not important, I think the current implementation is a little clearer.

jonbo372 commented 7 years ago

Yeah, I'm slightly concerned about the toLower and toUpper since the parsing/framing must be super fast and consume as little memory as possible. Agree that this is more readable but opting for performance in this part of the stack. With that said, perhaps it doesn't matter but that is why I originally went with the "manual" approach everywhere...

jonbo372 commented 7 years ago

Also, where did you read that sip header names are not case sensitive? If you check the BNF in rfc3261, the known headers are very much case sensitive. See https://tools.ietf.org/html/rfc3261#page-224.

sbarta30 commented 7 years ago

The case-sensitivity is mentioned in p31 of the spec at that link, in 7.3.1.

I'm not entirely sure what I could do about the need to convert the buffers themselves to lowercase in the map of framers in SipParser.java. It all has to match since it's doing a map lookup. I could take take apart that data structure and store it a different way, where it might need O(n) lookup unless I can get clever.

In SipHeader.java, I could have it generate upper and lower-case versions of the strings once and make them static.

WDYT? I haven't done any benchmarking on this so don't have any solid numbers on what it does to performance.

jonbo372 commented 7 years ago

Ah cool, thanks, missed that.

I do have performance numbers and would need to retest, which will probably take a week or two but in short I think I would prefer the annoying manual process and very much the opposite of DRY and do this:

return (m.getByte(0) == 'F' || m.getByte(0) == 'f') && (m.getByte(1) == 'r' || m.getByte(1) == 'R') && (m.getByte(2) == 'o' || m.getByte(2) == 'O') && (m.getByte(3) == 'm' || m.getByte(3) == 'M');

sbarta30 commented 7 years ago

No problem. What do you want me to do about the buffer creation that SipParser.toLowerCase does for its case-insensitive matching? I could probably store the framers in a list and use Buffer.equalsIgnoreCase for the match, but it will take the constant-time map lookup and turn it into an O(n) lookup (though for a small and fixed value of n).

jonbo372 commented 7 years ago

I wonder, but may be an early optimization etc etc, if we should leave it as is and in the SipHeaderImpl.ensure where we try and find the parser, we will do a deeper analysis if we don't find it right away. Perhaps move the lookup into SipParser kind of like so:

public static Function<SipHeader, ? extends SipHeader>  findFramer(final Buffer name) {
Function<SipHeader, ? extends SipHeader> framer = SipParser.framers.get(name);
if (framer != null) {
    return framer;
}

return SipParser.framers.get(name.toLowerCase());
}

So then we only take the hit when implementations have header name in a case that most others doesn't. What do you think about that approach? Where the "deeper" could just be a toLowerCase and another lookup in the framers again. Hence, keep what you have in regards to a lower case version of all the names but you initially do not do the "case insensitive" lookup (via a toLowerCase) unless you have to.

sbarta30 commented 7 years ago

Of course, the do-quick-lookup-first and then do-slow-lookup-when-that-fails approach is good. Though with the improvements I've made in the slow lookup to have it minimize object creation, I wouldn't be surprised if the "slow" performance is actually pretty similar to the "quick" with the size of the framer map now. But this approach is less risky for sure.

This change does all that and makes some small but nice improvements in case-insensitive buffer equals, addressing a TODO you had in there.

jonbo372 commented 7 years ago

Thanks Scott!