fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

problem on regexp parsing #820

Closed: giammbo closed this issue 8 years ago

giammbo commented 8 years ago

Hi all, I'm receiving a lot of errors from two different tail sources:

The first:

2016-03-02 19:32:53 +0100 [warn]: temporarily failed to flush the buffer. next_retry=2016-03-02 19:32:54 +0100 error_class="InfluxDB::Error" error="{\"error\":\"partial write:\\nunable to parse 'nginx.access.log,daemon=nginx:,server=10.0.1.164,host=188.95.76.247,date=02/Mar/2016:18:32:40\\\\ +0000,httprequest=GET,request=/voucher/view/123621?pdf=1,protocol=HTTP/1.1,postrequest=-,statuscode=200,pagesize=49480,referrer=-,useragent=Amazon\\\\ CloudFront,renderizepage=0.814\\\\ 0.814\\\\ . htacces1=\\\"-\\\",htacces2=\\\"-\\\" 1456943562': invalid tag format\\nunable to parse 'nginx.access.log,daemon=nginx:,server=10.0.4.140,host=66.249.66.105,date=02/Mar/2016:18:32:41\\\\ +0000,httprequest=GET,request=/it/roma/tour-di-castelli-e-palazzi-c/?_escaped_fragment_=,protocol=HTTP/1.1,postrequest=-,statuscode=200,pagesize=16352,referrer=-,useragent=Amazon\\\\ CloudFront,renderizepage=0.660\\\\ 0.626\\\\ . htacces1=\\\"-\\\",htacces2=\\\"-\\\" 1456943563': invalid tag format\\nunable to parse 'nginx.access.log,daemon=nginx:,server=10.0.4.12,host=79.109.5.14,date=02/Mar/2016:18:32:51\\\\ +0000,httprequest=GET,request=/frontendV3/PROD/fonts/glyphicons-halflings-regular.woff,protocol=HTTP/1.1,postrequest=-,statuscode=200,pagesize=23292,referrer=https://www.musement.com/es/granada/alhambra-v/?gclid=CNC3yrzRossCFfMV0wod9XUIWw,useragent=Amazon\\\\ CloudFront,renderizepage=0.000\\\\ -\\\\ . htacces1=\\\"-\\\",htacces2=\\\"-\\\" 1456943573': invalid tag format\"}\n" plugin_id="object:3fbe5c5716c4"
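In the rejected lines above, spaces and commas inside tag values are already backslash-escaped (for example in Amazon CloudFront), but equals signs are not: request=/voucher/view/123621?pdf=1, request=/it/roma/...?_escaped_fragment_= and referrer=...?gclid=... each contain a bare "=". InfluxDB's line protocol requires commas, equals signs and spaces in tag keys and tag values to be backslash-escaped, so the unescaped "=" is a plausible cause of "invalid tag format". The Python sketch below builds one line with that escaping. The helper names are hypothetical and this is not the fluent-plugin-influxdb code.

```python
# Minimal sketch of InfluxDB line-protocol escaping (hypothetical helper names).


def escape_tag(s: str) -> str:
    # Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def escape_measurement(s: str) -> str:
    # Measurement names: escape commas and spaces only.
    return s.replace(",", r"\,").replace(" ", r"\ ")


def quote_field(s: str) -> str:
    # String field values are double-quoted; backslashes and quotes inside are escaped.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp: int) -> str:
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_tag(k)}={quote_field(v)}" for k, v in fields.items())
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp}"


line = to_line(
    "nginx.access.log",
    {"request": "/voucher/view/123621?pdf=1", "useragent": "Amazon CloudFront"},
    {"htacces1": "-"},
    1456943562,
)
print(line)
# nginx.access.log,request=/voucher/view/123621?pdf\=1,useragent=Amazon\ CloudFront htacces1="-" 1456943562
```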

The second:

2016-03-02 19:32:32 +0100 [warn]: failed to flush the buffer. error_class="InfluxDB::Error" error="{\"error\":\"unable to parse 'track.access.log,host=47.60.45.56,date=2016-03-02\\\\ 19:32:24\\\\ +0100\\\\ ,httprequest=GET,request=track.gif,env=prod\\u0026,url=https%253A%252F%252Fm.musement.com%252Fes%252Fgranada%252Falhambra-entradas-sin-colas-y-visita-guiada-por-la-manana-4072%252Fbooking,pagetype=frontend_mobile_event_booking,referrer=https%253A%252F%252Fm.musement.com%252Fes%252Fgranada%252Falhambra-entradas-sin-colas-y-visita-guiada-por-la-manana-4072%252F,locale=es,hitid=cb0bd867d219e6abd0796f0a6ba69e49-1456943543225,maid=,currency=EUR,session=ad00bc3mlg7a0vvqbh1jcp4tk1,countryId=161,cityId=174,venueId=,eventsId=4072,cart=,transactionId=,transactionTotal=,transactionMrgn=,transactionProducts=,customer=,eventName=event,eventValue=calendar-date,eventAttribute=open,httpprotocol=HTTP/1.1,httpcode=200,size=1443,cityname=Valencia,regionname=Comunidad\\\\ Valenciana,countryname=ES,referer=https://m.musement.com/es/granada/alhambra-entradas-sin-colas-y-visita-guiada-por-la-manana-4072/booking\\\\\\\",useragent=Mozilla/5.0\\\\ (Linux;\\\\ Android\\\\ 5.1.1;\\\\ ALE-L21\\\\ Build/HuaweiALE-L21)\\\\ AppleWebKit/537.36\\\\ (KHTML\\\\,\\\\ like\\\\ Gecko)\\\\ Chrome/48.0.2564.95\\\\ Mobile\\\\ Safari/537.36\\\\\\\" webid=\\\"cb0bd867d219e6abd0796f0a6ba69e49\\\",useragentFALSE=\\\"Mozilla/5.0%20(Linux;%20Android%205.1.1;%20ALE-L21%20Build/HuaweiALE-L21)%20AppleWebKit/537.36%20(KHTML\\\\,%20like%20Gecko)%20Chrome/48.0.2564.95%20Mobile%20Safari/537.36\\\" 1456943544': missing tag value\\nunable to parse 'track.access.log,host=37.14.34.97,date=2016-03-02\\\\ 19:32:25\\\\ +0100\\\\ ,httprequest=GET,request=track.gif,env=prod\\u0026,url=https%253A%252F%252Fwww.musement.com%252Fes%252Froma%252Fmuseos-vaticanos-v%252F%253Fgclid%253DCj0KEQiAu9q2BRDq3MDbvOL1yaYBEiQAD6qoBsu2fRcJDEwRwsNK7qQtYd6tJXSn1PxHDrKayAxLIEgaAh6z8P8HAQ%2526gclsrc%253Daw.ds,pagetype=frontend_venue,referrer=https%253A%252F%252Fwww.google.es%252F,locale=es,hitid=f154c32f575f635202e1d53351eb4fa1-1456943495070,maid=,currency=EUR,session=,countryId=82,cityId=2,venueId=164,eventsId=,cart=,transactionId=,transactionTotal=,transactionMrgn=,transactionProducts=,customer=,eventName=venue,eventValue=calendar-date,eventAttribute=open,httpprotocol=HTTP/1.1,httpcode=200,size=6474,cityname=Mislata,regionname=Comunidad\\\\ Valenciana,countryname=ES, [...] 
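The second error is a different one, "missing tag value". Every rejected line contains tags with nothing after the "=" (maid=, cart=, transactionId= and so on), because tag_keys lists fields that are often empty in the tracking log, and InfluxDB does not accept a tag with an empty value. A minimal sketch, assuming the record can be preprocessed before it is written (split_record is a hypothetical name, not a plugin option), drops empty values from the tag set and keeps the remaining non-tag keys as fields:

```python
# Hypothetical helper: split a parsed record into tags and fields for InfluxDB,
# skipping tag keys whose value is empty (InfluxDB rejects empty tag values).


def split_record(record: dict, tag_keys: set) -> tuple:
    tags, fields, dropped = {}, {}, []
    for key, value in record.items():
        if key in tag_keys:
            if value == "":
                dropped.append(key)  # would otherwise be written as e.g. "maid=,"
            else:
                tags[key] = value
        else:
            fields[key] = value
    return tags, fields, dropped


# Abridged record using values from the first rejected track.access.log line.
record = {
    "host": "47.60.45.56",
    "maid": "",
    "currency": "EUR",
    "cart": "",
    "webid": "cb0bd867d219e6abd0796f0a6ba69e49",
}
tags, fields, dropped = split_record(record, {"host", "maid", "currency", "cart"})
print(tags)     # {'host': '47.60.45.56', 'currency': 'EUR'}
print(fields)   # {'webid': 'cb0bd867d219e6abd0796f0a6ba69e49'}
print(dropped)  # ['maid', 'cart']
```

Whether empty values should be dropped or stored as empty string fields is a design choice; the point is only that they cannot stay in the tag set.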

This is my config for the first error:

<source>
  type tail
  path /log/admin/access.log
  pos_file /log/td-agent/admin.access.log.pos
  tag nginx.access.log
  format /^(?<daemon>[^ ]*)\s*(?<server>[^ ]*)\s*(?<host>[^ ]*)\s*(?<htacces1>[^ ]*)\s*(?<htacces2>[^ ]*)\s*\[(?<date>[^ ]*\s\D[0-9]*)\]\s*\"(?<httprequest>[^ ]*)\s*(?<request>[^ ]*)\s*(?<protocol>[^ ]*)\"\s*\"(?<postrequest>.*?)\"\s*(?<statuscode>[^ ]*)\s*(?<pagesize>[^ ]*)\s*\"(?<referrer>[^ ]*)\"\s*\"(?<useragent>.*?)"\s*(?<renderizepage>.*)$/
  time_format %d/%b/%Y:%H:%M:%S %z
</source>
<match nginx.access.log.**>
  type influxdb
  host  pippo
  port  8086
  dbname access_log
  user  pippo
  password  pippo
  use_ssl false
  time_precision s
  tag_keys ["daemon", "server", "host", "htaccess1", "htaccess2", "date", "httprequest", "request", "protocol", "postrequest", "statuscode", "pagesize","referrer", "useragent", "renderizepage"]
  sequence_tag _seq
  flush_interval 10
  retry_limit 3
</match>

And this is the config for the second:

<source>
    type tail
    path /log/track/access.log
    pos_file /log/td-agent/grafanatrack.access_log.pos
    tag track.access.log
    format /^(?<host>[^ ]*)[\s^-]*\s(?<date>[^ ]*\s[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\s*[\W0-9]{1,5}[^"]*)[\s^\W]*(?<httprequest>[^ ]*)\s*\W(?<request>[a-z]+\.[a-z]+)\?*env=(?<env>.*?)&;&url=(?<url>.*?)&;&page_type=(?<pagetype>.*?)&;&referrer=(?<referrer>.*?)&;&locale=(?<locale>.*?)&;&id=(?<webid>.*?)&;&hitid=(?<hitid>.*?)&;&maid=(?<maid>.*?)&;&agent=(?<useragentFALSE>.*?)&;&currency=(?<currency>.*?)&;&session=(?<session>.*?)&;&countryId=(?<countryId>.*?)&;&cityId=(?<cityId>.*?)&;&venueId=(?<venueId>.*?)&;&eventsId=(?<eventsId>.*?)&;&cart=(?<cart>.*?)&;&transactionId=(?<transactionId>.*?)&;&transactionTotal=(?<transactionTotal>.*?)&;&transactionMrgn=(?<transactionMrgn>.*?)&;&transactionProducts=(?<transactionProducts>.*?)&;&customer=(?<customer>.*?)&;&eventName=(?<eventName>.*?)&;&eventValue=(?<eventValue>.*?)&;&eventAttribute=(?<eventAttribute>[^ ]*)\s*(?<httpprotocol>[^" ]*)[\s\W]*(?<httpcode>[^ ]*)\s*(?<size>[^" ]*)[\s"]*(?<cityname>.*?)"\s"(?<regionname>.*?)"\s"(?<countryname>.*?)"\s"(?<referer>[^ ]*)\s*[\s^\W]*(?<useragent>.*)$/
    time_format %Y-%m-%d %H:%M:%S %z
</source>
<match track.access.log.**>
  type influxdb
  host pippo
  port  8086
  dbname tracking
  user  pippo
  password  pippo
  use_ssl false
  time_precision s
  tag_keys ["host", "date", "httprequest", "request", "env", "url", "pagetype", "referrer", "locale", "web_id", "hitid", "maid", "currency", "session", "countryId", "cityId", "venueId", "eventsId", "cart", "transactionId", "transactionTotal", "transactionMrgn", "transactionProducts", "customer", "eventName", "eventValue", "eventAttribute", "httpprotocol", "httpcode", "size", "cityname", "regionname", "countryname", "referer", "useragent"]
  sequence_tag _seq
  flush_interval 10
  retry_limit 0
</match>
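Both configs list a tag key that no regex group produces. The first has htaccess1/htaccess2 in tag_keys while the regex captures htacces1/htacces2 (one "s"); the second has web_id while the regex captures webid, and useragentFALSE is captured but not listed. This is consistent with the error output, where exactly those captures appear after the tag set as quoted string fields. The Python sketch below finds such mismatches by scanning the pattern text for (?<name> groups; it is a rough text check, not a Fluentd feature, and check is a hypothetical name.

```python
import re


def check(pattern: str, tag_keys: list) -> tuple:
    # Names of all (?<name>...) groups, found by scanning the pattern text.
    captured = re.findall(r"\(\?<(\w+)>", pattern)
    never_captured = [k for k in tag_keys if k not in captured]  # in tag_keys, never produced
    not_tagged = [g for g in captured if g not in tag_keys]      # produced, written as fields
    return never_captured, not_tagged


# Patterns and tag_keys copied verbatim from the two configs above.
NGINX_FORMAT = r'^(?<daemon>[^ ]*)\s*(?<server>[^ ]*)\s*(?<host>[^ ]*)\s*(?<htacces1>[^ ]*)\s*(?<htacces2>[^ ]*)\s*\[(?<date>[^ ]*\s\D[0-9]*)\]\s*\"(?<httprequest>[^ ]*)\s*(?<request>[^ ]*)\s*(?<protocol>[^ ]*)\"\s*\"(?<postrequest>.*?)\"\s*(?<statuscode>[^ ]*)\s*(?<pagesize>[^ ]*)\s*\"(?<referrer>[^ ]*)\"\s*\"(?<useragent>.*?)"\s*(?<renderizepage>.*)$'
NGINX_TAG_KEYS = ["daemon", "server", "host", "htaccess1", "htaccess2", "date", "httprequest",
                  "request", "protocol", "postrequest", "statuscode", "pagesize", "referrer",
                  "useragent", "renderizepage"]

TRACK_FORMAT = r'^(?<host>[^ ]*)[\s^-]*\s(?<date>[^ ]*\s[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\s*[\W0-9]{1,5}[^"]*)[\s^\W]*(?<httprequest>[^ ]*)\s*\W(?<request>[a-z]+\.[a-z]+)\?*env=(?<env>.*?)&;&url=(?<url>.*?)&;&page_type=(?<pagetype>.*?)&;&referrer=(?<referrer>.*?)&;&locale=(?<locale>.*?)&;&id=(?<webid>.*?)&;&hitid=(?<hitid>.*?)&;&maid=(?<maid>.*?)&;&agent=(?<useragentFALSE>.*?)&;&currency=(?<currency>.*?)&;&session=(?<session>.*?)&;&countryId=(?<countryId>.*?)&;&cityId=(?<cityId>.*?)&;&venueId=(?<venueId>.*?)&;&eventsId=(?<eventsId>.*?)&;&cart=(?<cart>.*?)&;&transactionId=(?<transactionId>.*?)&;&transactionTotal=(?<transactionTotal>.*?)&;&transactionMrgn=(?<transactionMrgn>.*?)&;&transactionProducts=(?<transactionProducts>.*?)&;&customer=(?<customer>.*?)&;&eventName=(?<eventName>.*?)&;&eventValue=(?<eventValue>.*?)&;&eventAttribute=(?<eventAttribute>[^ ]*)\s*(?<httpprotocol>[^" ]*)[\s\W]*(?<httpcode>[^ ]*)\s*(?<size>[^" ]*)[\s"]*(?<cityname>.*?)"\s"(?<regionname>.*?)"\s"(?<countryname>.*?)"\s"(?<referer>[^ ]*)\s*[\s^\W]*(?<useragent>.*)$'
TRACK_TAG_KEYS = ["host", "date", "httprequest", "request", "env", "url", "pagetype", "referrer",
                  "locale", "web_id", "hitid", "maid", "currency", "session", "countryId",
                  "cityId", "venueId", "eventsId", "cart", "transactionId", "transactionTotal",
                  "transactionMrgn", "transactionProducts", "customer", "eventName",
                  "eventValue", "eventAttribute", "httpprotocol", "httpcode", "size",
                  "cityname", "regionname", "countryname", "referer", "useragent"]

print(check(NGINX_FORMAT, NGINX_TAG_KEYS))  # (['htaccess1', 'htaccess2'], ['htacces1', 'htacces2'])
print(check(TRACK_FORMAT, TRACK_TAG_KEYS))  # (['web_id'], ['webid', 'useragentFALSE'])
```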

This is a log line received by the first source:

nginx: 10.0.4.140 88.181.150.136 - - [02/Mar/2016:18:41:19 +0000]  "GET /favicon.ico?random_1456909201 HTTP/1.1" "-" 200 5430 "https://www.musement.com/fr/barcelone/la-sagrada-familia-v/" "Amazon CloudFront" 0.000 - .
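To see what the first format regex extracts from this line, the pattern can be run through Python's re module. This is only an approximation for inspection: Fluentd evaluates the pattern with Ruby's regex engine, and Python spells named groups (?P<name>...), so the sketch converts the pattern with a plain string replace (safe here only because it contains no (?<= or (?<! lookbehinds). The captured values show that date, useragent and renderizepage contain spaces, which have to be escaped when they are written as InfluxDB tags.

```python
import re

# Regex from the first <source> block, copied verbatim (Ruby named-group syntax).
RUBY_FORMAT = r'^(?<daemon>[^ ]*)\s*(?<server>[^ ]*)\s*(?<host>[^ ]*)\s*(?<htacces1>[^ ]*)\s*(?<htacces2>[^ ]*)\s*\[(?<date>[^ ]*\s\D[0-9]*)\]\s*\"(?<httprequest>[^ ]*)\s*(?<request>[^ ]*)\s*(?<protocol>[^ ]*)\"\s*\"(?<postrequest>.*?)\"\s*(?<statuscode>[^ ]*)\s*(?<pagesize>[^ ]*)\s*\"(?<referrer>[^ ]*)\"\s*\"(?<useragent>.*?)"\s*(?<renderizepage>.*)$'

# Python's re module writes named groups as (?P<name>...). A plain replace is
# enough here because the pattern has no lookbehinds such as (?<= or (?<!.
PY_FORMAT = RUBY_FORMAT.replace("(?<", "(?P<")

LINE = ('nginx: 10.0.4.140 88.181.150.136 - - [02/Mar/2016:18:41:19 +0000]  '
        '"GET /favicon.ico?random_1456909201 HTTP/1.1" "-" 200 5430 '
        '"https://www.musement.com/fr/barcelone/la-sagrada-familia-v/" '
        '"Amazon CloudFront" 0.000 - .')

match = re.match(PY_FORMAT, LINE)
record = match.groupdict()

# Captures containing characters that must be escaped in InfluxDB tag values.
needs_escaping = sorted(k for k, v in record.items() if any(c in v for c in " ,="))

print(record["date"])           # 02/Mar/2016:18:41:19 +0000
print(record["renderizepage"])  # 0.000 - .
print(needs_escaping)           # ['date', 'renderizepage', 'useragent']
```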

And this is a log line received by the second:

62.19.79.55 - - 2016-03-02 19:42:11 +0100 "GET /track.gif?env=prod&&;&url=https%253A%252F%252Fwww.musement.com%252Fit%252Fvienna%252Fbiglietto-combinato-sissi-il-castello-di-schonbrunn-hofburg-e-il-museo-del-mobile-imperiale-2329%252F&;&page_type=frontend_event&;&referrer=https%253A%252F%252Fwww.google.it%252F&;&locale=it&;&id=5ed06fe6feeea5da65ce08f24335f2d7&;&hitid=5ed06fe6feeea5da65ce08f24335f2d7-1456944126582&;&maid=&;&agent=Mozilla/5.0%20(Windows%20NT%206.3;%20WOW64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/48.0.2564.116%20Safari/537.36&;&currency=EUR&;&session=&;&countryId=10&;&cityId=82&;&venueId=&;&eventsId=2329&;&cart=&;&transactionId=&;&transactionTotal=&;&transactionMrgn=&;&transactionProducts=&;&customer=&;&eventName=&;&eventValue=&;&eventAttribute= HTTP/1.1" 200 6469 "-" "-" "IT" "https://www.musement.com/it/vienna/biglietto-combinato-sissi-il-castello-di-schonbrunn-hofburg-e-il-museo-del-mobile-imperiale-2329/" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"
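In this line the query-string pairs are separated by "&;&", and the env value is followed by an extra "&" (env=prod&&;&url=). Because the regex uses env=(?<env>.*?)&;&url=, the lazy group stops at the first "&;&" and captures "prod&", which is the env=prod\u0026 visible in the second error (\u0026 is "&"). The url and referrer values are also URL-encoded twice (%253A decodes to %3A, then to ":"). A short Python sketch on an abridged copy of the query (the url, id, hitid and agent pairs are removed) shows these effects; it only inspects the data and is not part of the Fluentd pipeline.

```python
from urllib.parse import unquote

# Abridged query string from the log line above (url, id, hitid and agent pairs removed).
QUERY = ("env=prod&&;&page_type=frontend_event"
         "&;&referrer=https%253A%252F%252Fwww.google.it%252F"
         "&;&locale=it&;&maid=&;&currency=EUR")

# Split on the custom "&;&" separator, then on the first "=" of each pair.
pairs = dict(item.split("=", 1) for item in QUERY.split("&;&"))

print(pairs["env"])                         # prod&  (stray "&" kept in the value)
print(repr(pairs["maid"]))                  # ''     (empty: not valid as an InfluxDB tag)
print(unquote(unquote(pairs["referrer"])))  # https://www.google.it/
```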

Why?

repeatedly commented 8 years ago

Is this a problem with the influxdb plugin? If so, please open an issue in the fluent-plugin-influxdb repository instead of the fluentd repository.

repeatedly commented 8 years ago

And don't ignore the issue template.