Hughen / apachelog

Automatically exported from code.google.com/p/apachelog

%{Cookie} can contain \" and should be ignored #9

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Add a %{Cookie} to your CustomLog
2. Have a cookie with quotation marks
3. Try to use apachelog to parse the lines

Easiest fix:
Change findreferreragent = re.compile('Referer|User-Agent') to 
findreferreragent = re.compile('Referer|User-Agent|Cookie')
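To illustrate why that one-word change helps, here is a minimal sketch. It assumes the parser uses findreferreragent to decide which quoted fields get the escape-aware sub-pattern, which is my reading of the fix and not confirmed from the library source. The two sub-patterns are the ones that appear in the regular expression quoted further down this thread; quoted_subpattern and parse_field are hypothetical helpers, not apachelog functions.

```python
import re

# Both quoted-field sub-patterns appear in the regular expression quoted in this thread.
NAIVE = r'\"([^\"]*)\"'                         # stops at any double quote
ESCAPE_AWARE = r'\"([^"\\]*(?:\\.[^"\\]*)*)\"'  # tolerates backslash-escaped characters


def quoted_subpattern(element, selector):
    """Hypothetical helper: pick the sub-pattern for a quoted format element."""
    if selector.search(element):
        return ESCAPE_AWARE
    return NAIVE


def parse_field(element, selector, text):
    """Hypothetical helper: return the captured field, or None if it does not match."""
    match = re.match('^' + quoted_subpattern(element, selector) + '$', text)
    return match.group(1) if match else None


before = re.compile('Referer|User-Agent')
after = re.compile('Referer|User-Agent|Cookie')

# The cookie field from the example log line in this report.
cookie_field = r'"tokens=blah; otherstuff=blerg; badstuff=\"hey look at my quotation marks\""'

print(parse_field('%{Cookie}i', before, cookie_field))  # no match with the current selector
print(parse_field('%{Cookie}i', after, cookie_field))   # cookie value with its \" kept as-is
```

With the current selector the cookie field falls back to the naive sub-pattern and stops at the first inner quote. With Cookie added it is handled the same way as Referer and User-Agent.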

What is the expected output?
A parse-able line
What do you see instead?
Unparsable line

Please provide any additional information below.
Here's an example log line written by that CustomLog:
127.0.0.1 - - [31/Mar/2011:11:35:40 -0700] "GET /events HTTP/1.1" 200 103324 "https://blah.com/core" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4" 11339 blah.com 80 6082970 "tokens=blah; otherstuff=blerg; badstuff=\"hey look at my quotation marks\""

Here's a test for you:
def testline5(self):
    # self.line5 is assumed to hold the example log line above
    data = self.x.parse(self.line5)
    self.assertEqual(data['%h'], '127.0.0.1', msg='Line 5 %h')
    self.assertEqual(data['%V'], 'blah.com', msg='Line 5 %V')
    # The escaped quotes should come back unchanged, backslashes included
    self.assertEqual(data['%{Cookie}i'], 'tokens=blah; otherstuff=blerg; badstuff=\\"hey look at my quotation marks\\"', msg='Line 5 %{Cookie}i')

Original issue reported on code.google.com by julien.r...@gmail.com on 31 Mar 2011 at 7:08

GoogleCodeExporter commented 8 years ago
This problem seems to occur in other fields as well. I have run across mobile 
User-Agent strings containing escaped double quotes (\") that also fail to parse. 

Here is an example: 

Unable to parse: www.example.com 119.30.38.61 - - [07/Mar/2012:03:28:32 +0000] "GET /some-url HTTP/1.1" 200 239 "-" "Nokia6151/2.0 (04.10) Profile/MIDP-2.0 Configuration/CLDC-1.1x-wap-profile: \"http://nds1.nds.nokia.com/uaprof/N6151r400.xml\"" with the ^(\S*) (\S*) (\S*) (\S*) (\[[^\]]+\]) \"([^"\\]*(?:\\.[^"\\]*)*)\" (\S*) (\S*) \"([^"\\]*(?:\\.[^"\\]*)*)\" \"([^\"]*)\"$ regular expression

Original comment by jfxbe...@gmail.com on 8 Mar 2012 at 1:53
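
For anyone who wants to reproduce the Nokia failure above without apachelog, here is a rough sketch using only Python's re module. The regular expression is the one from the error message, split into pieces. The log line is the one from the message with its wrapped pieces rejoined by single spaces, which is an assumption about the original line. The "patched" variant simply swaps the last group for the escape-aware sub-pattern already used for the request and Referer fields; it is not necessarily what the patch in the other issue does.

```python
import re

# Sub-patterns copied from the regular expression in the error message.
ESCAPE_AWARE = r'\"([^"\\]*(?:\\.[^"\\]*)*)\"'  # tolerates \" inside the field
NAIVE = r'\"([^\"]*)\"'                         # stops at any double quote

# Everything up to the User-Agent field is shared by both variants.
PREFIX = (r'^(\S*) (\S*) (\S*) (\S*) (\[[^\]]+\]) ' + ESCAPE_AWARE
          + r' (\S*) (\S*) ' + ESCAPE_AWARE + ' ')
reported = re.compile(PREFIX + NAIVE + '$')        # same as in the error message
patched = re.compile(PREFIX + ESCAPE_AWARE + '$')  # last group made escape-aware

# Log line from the error message, rejoined with single spaces (assumption).
line = (
    r'www.example.com 119.30.38.61 - - [07/Mar/2012:03:28:32 +0000] '
    r'"GET /some-url HTTP/1.1" 200 239 "-" '
    r'"Nokia6151/2.0 (04.10) Profile/MIDP-2.0 Configuration/CLDC-1.1x-wap-profile: '
    r'\"http://nds1.nds.nokia.com/uaprof/N6151r400.xml\""'
)

print(reported.match(line))  # no match: the naive group stops at the first \"
match = patched.match(line)
print(match.groups()[-1])    # the full User-Agent, escaped quotes included
```

The only field that fails is the last one, so the line parses as soon as that group accepts backslash-escaped characters.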

GoogleCodeExporter commented 8 years ago
Ahhh...  Issue #5 has a patch for user-agent with escaped double-quotes: 
http://code.google.com/p/apachelog/issues/detail?id=5

Original comment by jfxbe...@gmail.com on 8 Mar 2012 at 2:01