ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.
Other
14.06k stars 3.42k forks source link

[External Logger] Rsyslogd unexpectedly stop sending events to Splunk http collector #7560

Closed ziouf closed 11 months ago

ziouf commented 4 years ago
ISSUE TYPE
SUMMARY

Rsyslog daemon seam to unexpectedly stop publication of events to Splunk HTTP event collector.

ENVIRONMENT
STEPS TO REPRODUCE
EXPECTED RESULTS

Events published to Splunk HTTP event collector

ACTUAL RESULTS

I increased verbosity of logs by adding $DebugLevel 2 to /var/lib/awx/rsyslog/rsyslog.conf file. Like the following :

$DebugLevel 2
$WorkDirectory /var/lib/awx/rsyslog
$MaxMessageSize 700000
$IncludeConfig /var/lib/awx/rsyslog/conf.d/*.conf
main_queue(queue.spoolDirectory="/var/lib/awx" queue.maxdiskspace="1g" queue.type="Disk" queue.filename="awx-external-logger-backlog")
module(load="imuxsock" SysSock.Use="off")
input(type="imuxsock" Socket="/var/run/awx-rsyslog/rsyslog.sock" unlink="on")
template(name="awx" type="string" string="%rawmsg-after-pri%")
module(load="omhttp")
action(type="omhttp" server="xxx" serverport="8088" usehttps="on" allowunsignedcerts="on" skipverifyhost="on" action.resumeRetryCount="-1" template="awx" errorfile="/var/log/tower/rsyslog.err" action.resumeInterval="10" restpath="services/collector/event" httpheaderkey="Authorization" httpheadervalue="Splunk xxx")

And then restarted rsyslogd daemon by running:

docker container exec -ti awx_web supervisorctl -c /supervisor.conf restart tower-processes:awx-rsyslogd
ADDITIONAL INFORMATION

I get the following logs with 2020-07-07T09:30:38.645Z as the very last timestamp processed by Splunk. It seams that next events fail to publish because we can see that omHttp try commit action return non-zero code. But I don't see the root cause in logs.

...
2020-07-07T09:30:38.692724000Z  4238.685265456:imuxsock.c     : imuxsock.c: Message from UNIX socket: #4, size 2144
2020-07-07T09:30:38.692933000Z  4238.685275329:imuxsock.c     : parser.c: dropped LF at very end of message (DropTrailingLF is set)
2020-07-07T09:30:38.693141000Z  4238.685279952:imuxsock.c     : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: '"'
2020-07-07T09:30:38.693357000Z  4238.685284023:imuxsock.c     : ruleset.c: processBATCH: batch of 1 elements must be processed
2020-07-07T09:30:38.693566000Z  4238.685286499:imuxsock.c     : ruleset.c: processBATCH: next msg 0: <14>{"event": {"@timestamp": "2020-07-07T09:30:38.645Z", "me
ssage": "Event data saved.", "host": 149, "level": "INFO", "logger_n
2020-07-07T09:30:38.693788000Z  4238.685289404:imuxsock.c     : rainerscript.c:     ACTION 0 [omhttp:action(type="omhttp" ...)]
2020-07-07T09:30:38.693998000Z  4238.685295419:imuxsock.c     : ruleset.c: executing action 0
2020-07-07T09:30:38.694215000Z  4238.685298043:imuxsock.c     : ../action.c: action 'action-0-omhttp': called, logging to omhttp (susp 0/0, direct q 1)
2020-07-07T09:30:38.694425000Z  4238.685301860:imuxsock.c     : ../action.c: action 'action-0-omhttp': is transactional - executing in commit phase
2020-07-07T09:30:38.694640000Z  4238.685304435:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:30:38.694857000Z  4238.685306679:imuxsock.c     : ../action.c: checking external state file
2020-07-07T09:30:38.695066000Z  4238.685308782:imuxsock.c     : ../action.c: done checking external state file, iRet=0
2020-07-07T09:31:00.031976000Z  RESULT 2
2020-07-07T09:31:00.032297000Z  OKREADY
2020-07-07T09:31:07.082980000Z  4267.081868585:imuxsock.c     : imuxsock.c: Message from UNIX socket: #4, size 1236
2020-07-07T09:31:07.083269000Z  4267.081886907:imuxsock.c     : parser.c: dropped LF at very end of message (DropTrailingLF is set)
2020-07-07T09:31:07.083499000Z  4267.081891938:imuxsock.c     : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: '"'
2020-07-07T09:31:07.083730000Z  4267.081898096:imuxsock.c     : ruleset.c: processBATCH: batch of 1 elements must be processed
2020-07-07T09:31:07.083950000Z  4267.081901096:imuxsock.c     : ruleset.c: processBATCH: next msg 0: <14>{"event": {"@timestamp": "2020-07-07T09:31:07.080Z", "me
ssage": "Event data saved.", "host": 36, "level": "INFO", "logger_na
2020-07-07T09:31:07.084173000Z  4267.081904611:imuxsock.c     : rainerscript.c:     ACTION 0 [omhttp:action(type="omhttp" ...)]
2020-07-07T09:31:07.084389000Z  4267.081911077:imuxsock.c     : ruleset.c: executing action 0
2020-07-07T09:31:07.084602000Z  4267.081914375:imuxsock.c     : ../action.c: action 'action-0-omhttp': called, logging to omhttp (susp 0/0, direct q 1)
2020-07-07T09:31:07.084825000Z  4267.081919046:imuxsock.c     : ../action.c: action 'action-0-omhttp': is transactional - executing in commit phase
2020-07-07T09:31:07.085040000Z  4267.081922051:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:31:07.085251000Z  4267.081924724:imuxsock.c     : ../action.c: action 'action-0-omhttp': set suspended state to 0
2020-07-07T09:31:07.085470000Z  4267.081927362:imuxsock.c     : ruleset.c: END batch execution phase, entering to commit phase [processed 1 of 1 messages]
2020-07-07T09:31:07.085680000Z  4267.081930072:imuxsock.c     : ../action.c: actionCommitAllDirect: action 0, state 5, nbr to commit 1 isTransactional 1
2020-07-07T09:31:07.085904000Z  4267.081932513:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: enter, 1 msgs
2020-07-07T09:31:07.086140000Z  4267.081934637:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: processing...
2020-07-07T09:31:07.086364000Z  4267.081936725:imuxsock.c     : ../action.c: actionTryCommit[action-0-omhttp] enter
2020-07-07T09:31:07.086578000Z  4267.081938889:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:31:07.086804000Z  4267.081941372:imuxsock.c     : ../action.c: doTransaction: action 'action-0-omhttp', currIParam 1
2020-07-07T09:31:07.087017000Z  4267.081943473:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:31:07.087235000Z  4267.081945859:imuxsock.c     : ../action.c: doTransaction: action 0, processing msg 0, result -2218
2020-07-07T09:31:07.087458000Z  4267.081948059:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: return actionTryCommit -2218
2020-07-07T09:31:07.087669000Z  4267.081950261:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: done, iRet -2218
2020-07-07T09:31:07.090779000Z  4267.081952391:imuxsock.c     : ruleset.c: processBATCH: batch of 1 elements has been processed2020-07-07T09:31:07.091045000Z  4267.081957182:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() on 2 fds
2020-07-07T09:31:07.098515000Z  4267.097736092:imuxsock.c     : imuxsock.c: Message from UNIX socket: #4, size 1911
2020-07-07T09:31:07.098803000Z  4267.097756048:imuxsock.c     : parser.c: dropped LF at very end of message (DropTrailingLF is set)
2020-07-07T09:31:07.099032000Z  4267.097761546:imuxsock.c     : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: '"'
2020-07-07T09:31:07.099255000Z  4267.097767933:imuxsock.c     : ruleset.c: processBATCH: batch of 1 elements must be processed
2020-07-07T09:31:07.099472000Z  4267.097771126:imuxsock.c     : ruleset.c: processBATCH: next msg 0: <14>{"event": {"@timestamp": "2020-07-07T09:31:07.093Z", "message": "Event data saved.", "host": 670, "level": "INFO", "logger_n
2020-07-07T09:31:07.099712000Z  4267.097774770:imuxsock.c     : rainerscript.c:     ACTION 0 [omhttp:action(type="omhttp" ...)]
2020-07-07T09:31:07.099932000Z  4267.097781233:imuxsock.c     : ruleset.c: executing action 0
2020-07-07T09:31:07.100151000Z  4267.097784508:imuxsock.c     : ../action.c: action 'action-0-omhttp': called, logging to omhttp (susp 0/0, direct q 1)
2020-07-07T09:31:07.100367000Z  4267.097789530:imuxsock.c     : ../action.c: action 'action-0-omhttp': is transactional - executing in commit phase
2020-07-07T09:31:07.100580000Z  4267.097792267:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:31:07.100801000Z  4267.097795152:imuxsock.c     : ../action.c: action 'action-0-omhttp': set suspended state to 0
2020-07-07T09:31:07.101020000Z  4267.097797424:imuxsock.c     : ruleset.c: END batch execution phase, entering to commit phase [processed 1 of 1 messages]
2020-07-07T09:31:07.101231000Z  4267.097800529:imuxsock.c     : ../action.c: actionCommitAllDirect: action 0, state 5, nbr to commit 1 isTransactional 1
2020-07-07T09:31:07.101456000Z  4267.097803055:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: enter, 1 msgs
2020-07-07T09:31:07.101669000Z  4267.097805090:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: processing...
2020-07-07T09:31:07.101937000Z  4267.097807151:imuxsock.c     : ../action.c: actionTryCommit[action-0-omhttp] enter
2020-07-07T09:31:07.102149000Z  4267.097809295:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:31:07.102364000Z  4267.097811763:imuxsock.c     : ../action.c: doTransaction: action 'action-0-omhttp', currIParam 1
2020-07-07T09:31:07.102572000Z  4267.097813873:imuxsock.c     : ../action.c: actionPrepare[action-0-omhttp]: enter
2020-07-07T09:31:07.102790000Z  4267.097816217:imuxsock.c     : ../action.c: doTransaction: action 0, processing msg 0, result -2218
2020-07-07T09:31:07.103001000Z  4267.097818403:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: return actionTryCommit -2218
2020-07-07T09:31:07.103215000Z  4267.097820562:imuxsock.c     : ../action.c: actionCommit[action-0-omhttp]: done, iRet -2218
2020-07-07T09:31:07.103433000Z  4267.097822641:imuxsock.c     : ruleset.c: processBATCH: batch of 1 elements has been processed
2020-07-07T09:31:07.103651000Z  4267.097827025:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() on 2 fds
...

It somtimes works again after a manual trigger of Rsyslogd restart with this cmd, but never for a long time...

docker container exec -ti awx_web supervisorctl -c /supervisor.conf restart tower-processes:awx-rsyslogd

Can you help me to identify and fix the root cause ? Thank you

Edit: I found a first clue : Rsyslog error code 2218 is about failure caused by data passed to action. (https://www.rsyslog.com/rsyslog-error-2218/) Maybe we can dig in this direction ?

ryanpetrello commented 4 years ago

@ziouf,

This is pretty weird - it might be worth opening an issue at https://github.com/rsyslog/rsyslog/issues/ with this output as well to see if they have any ideas/pointers (or they might have suggestions about how to get more detail about the failure).

ziouf commented 4 years ago

Issue opened in rsyslog project issue tracker : https://github.com/rsyslog/rsyslog/issues/4348

ziouf commented 4 years ago

https://github.com/rsyslog/rsyslog/issues/4348#issuecomment-655792849

Possible that your message is not valid JSON (pulled the below from the debug log, unless it's getting cropped?) and the server you're sending to is throwing an error. Since you have action.resumeRetryCount="-1", this is getting retried (and keeps failing) forever, so you're stuck in a suspend loop.

2020-07-07T09:31:07.083950000Z  4267.081901096:imuxsock.c     : ruleset.c: processBATCH: next msg 0: <14>{"event": {"@timestamp": "2020-07-07T09:31:07.080Z", "me
ssage": "Event data saved.", "host": 36, "level": "INFO", "logger_na

Can you try to verify that your messages are all valid JSON? Or even better, use a template format that escapes your message and let rsyslog produce valid JSON?

@ryanpetrello Is it possible that AWX exposes a bad formatted json to rsyslog ? How can I verify json format of blocking event ?

What do you think of the template suggestion?

ziouf commented 4 years ago

I have found events in error in /var/log/tower/rsyslog.err file

The last one is :

{
  "request": {
    "url": "https://splunkhost.com:8088/services/collector/event",
    "postdata": "{\"event\": {\"@timestamp\": \"2020-07-07T09:30:53.780Z\", \"message\": \"Event data saved.\", \"host\": 686, \"level\": \"INFO\", \"logger_name\": \"awx.analytics.job_events\", \"id\": null, \"modified\": null, \"event\": \"runner_on_failed\", \"event_data\": {\"playbook\": \"AWX_Ansible/ansible_check_setup.yml\", \"playbook_uuid\": \"a7eb8878-345b-4237-bfce-9e5d18b0a9e9\", \"play\": \"Playbook pour test setup\", \"play_uuid\": \"0242ac12-0006-80ff-50f5-0000000008b9\", \"play_pattern\": \"all\", \"task\": \"get_ansible_facts\", \"task_uuid\": \"0242ac12-0006-80ff-50f5-0000000008bb\", \"task_action\": \"setup\", \"task_args\": \"\", \"task_path\": \"/tmp/awx_1440_enxdule_/project/AWX_Ansible/ansible_check_setup.yml:7\", \"host\": \"hostname.com\", \"remote_addr\": \"hostname.com\", \"res\": {\"msg\": \"winrm send_input failed; \\nstdout: \\nstderr W\\u0000i\\u0000n\\u0000d\\u0000o\\u0000w\\u0000s\\u0000 \\u0000P\\u0000o\\u0000w\\u0000e\\u0000r\\u0000S\\u0000h\\u0000e\\u0000l\\u0000l\\u0000 \\u0000s\\u0000'\\u0000e\\u0000s\\u0000t\\u0000 \\u0000a\\u0000r\\u0000r\\u0000\\udcea\\u0000t\\u0000\\udce9\\u0000 \\u0000a\\u0000v\\u0000e\\u0000c\\u0000 \\u0000l\\u0000'\\u0000e\\u0000r\\u0000r\\u0000e\\u0000u\\u0000r\\u0000 \\u0000s\\u0000u\\u0000i\\u0000v\\u0000a\\u0000n\\u0000t\\u0000e\\u0000\\udca0\\u0000:\\u0000 \\u0000\\r\\u0000\\n\\u0000 \\u0000I\\u0000m\\u0000p\\u0000o\\u0000s\\u0000s\\u0000i\\u0000b\\u0000l\\u0000e\\u0000 \\u0000d\\u0000e\\u0000 \\u0000c\\u0000h\\u0000a\\u0000r\\u0000g\\u0000e\\u0000r\\u0000 \\u0000l\\u0000e\\u0000 \\u0000f\\u0000i\\u0000c\\u0000h\\u0000i\\u0000e\\u0000r\\u0000 \\u0000o\\u0000u\\u0000 \\u0000l\\u0000'\\u0000a\\u0000s\\u0000s\\u0000e\\u0000m\\u0000b\\u0000l\\u0000y\\u0000 \\u0000'\\u0000S\\u0000y\\u0000s\\u0000t\\u0000e\\u0000m\\u0000.\\u0000M\\u0000a\\u0000n\\u0000a\\u0000g\\u0000e\\u0000m\\u0000e\\u0000n\\u0000t\\u0000.\\u0000A\\u0000u\\u0000t\\u0000o\\u0000m\\u0000a\\u0000t\\u0000i\\u0000o\\u0000n\\u0000,\\u0000 \\u0000V\\u0000e\\u0000r\\u0000s\\u0000i\\u0000o\\u0000n\\u0000=\\u00003\\u0000.\\u00000\\u0000.\\u00000\\u0000.\\u00000\\u0000,\\u0000 \\u0000C\\u0000u\\u0000l\\u0000t\\u0000u\\u0000r\\u0000e\\u0000=\\u0000n\\u0000e\\u0000u\\u0000t\\u0000r\\u0000a\\u0000l\\u0000,\\u0000 \\u0000P\\u0000u\\u0000b\\u0000l\\u0000i\\u0000c\\u0000K\\u0000e\\u0000y\\u0000T\\u0000o\\u0000k\\u0000e\\u0000n\\u0000=\\u00003\\u00001\\u0000b\\u0000f\\u00003\\u00008\\u00005\\u00006\\u0000a\\u0000d\\u00003\\u00006\\u00004\\u0000e\\u00003\\u00005\\u0000'\\u0000 \\u0000o\\u0000u\\u0000 \\u0000u\\u0000n\\u0000e\\u0000 \\u0000d\\u0000e\\u0000 \\u0000s\\u0000e\\u0000s\\u0000 \\u0000d\\u0000\\udce9\\u0000p\\u0000e\\u0000n\\u0000d\\u0000a\\u0000n\\u0000c\\u0000e\\u0000s\\u0000.\\u0000 \\u0000L\\u0000e\\u0000 \\u0000f\\u0000i\\u0000c\\u0000h\\u0000i\\u0000e\\u0000r\\u0000 \\u0000d\\u0000e\\u0000 \\u0000p\\u0000a\\u0000g\\u0000i\\u0000n\\u0000a\\u0000t\\u0000i\\u0000o\\u0000n\\u0000 \\u0000e\\u0000s\\u0000t\\u0000 \\u0000i\\u0000n\\u0000s\\u0000u\\u0000f\\u0000f\\u0000i\\u0000s\\u0000a\\u0000n\\u0000t\\u0000 \\u0000p\\u0000o\\u0000u\\u0000r\\u0000 \\u0000t\\u0000e\\u0000r\\u0000m\\u0000i\\u0000n\\u0000e\\u0000r\\u0000 \\u0000c\\u0000e\\u0000t\\u0000t\\u0000e\\u0000 \\u0000o\\u0000p\\u0000\\udce9\\u0000r\\u0000a\\u0000t\\u0000i\\u0000o\\u0000n\\u0000.\\u0000 \\u0000(\\u0000E\\u0000x\\u0000c\\u0000e\\u0000p\\u0000t\\u0000i\\u0000o\\u0000n\\u0000 \\u0000d\\u0000e\\u0000 \\u0000H\\u0000R\\u0000E\\u0000S\\u0000U\\u0000L\\u0000T\\u0000 \\u0000:\\u0000 \\u00000\\u0000x\\u00008\\u00000\\u00000\\u00007\\u00000\\u00005\\u0000A\\u0000F\\u0000)\\u0000\\r\\u0000\\n\\u0000\", \"_ansible_no_log\": false}, \"start\": \"2020-07-07T09:30:52.679120\", \"end\": \"2020-07-07T09:30:53.679388\", \"duration\": 1.000268, \"ignore_errors\": null, \"event_loop\": null, \"uuid\": \"802a661d-71a7-499d-aa0e-a280234a94c9\"}, \"failed\": true, \"changed\": false, \"uuid\": \"802a661d-71a7-499d-aa0e-a280234a94c9\", \"playbook\": \"AWX_Ansible/ansible_check_setup.yml\", \"play\": \"Playbook pour test setup\", \"role\": \"\", \"task\": \"get_ansible_facts\", \"counter\": 416, \"stdout\": \"\\u001b[0;31mfatal: [hostname.com]: FAILED! => {\\\"msg\\\": \\\"winrm send_input failed; \\\\nstdout: \\\\nstderr W\\\\u0000i\\\\u0000n\\\\u0000d\\\\u0000o\\\\u0000w\\\\u0000s\\\\u0000 \\\\u0000P\\\\u0000o\\\\u0000w\\\\u0000e\\\\u0000r\\\\u0000S\\\\u0000h\\\\u0000e\\\\u0000l\\\\u0000l\\\\u0000 \\\\u0000s\\\\u0000'\\\\u0000e\\\\u0000s\\\\u0000t\\\\u0000 \\\\u0000a\\\\u0000r\\\\u0000r\\\\u0000\\ufffd\\\\u0000t\\\\u0000\\ufffd\\\\u0000 \\\\u0000a\\\\u0000v\\\\u0000e\\\\u0000c\\\\u0000 \\\\u0000l\\\\u0000'\\\\u0000e\\\\u0000r\\\\u0000r\\\\u0000e\\\\u0000u\\\\u0000r\\\\u0000 \\\\u0000s\\\\u0000u\\\\u0000i\\\\u0000v\\\\u0000a\\\\u0000n\\\\u0000t\\\\u0000e\\\\u0000\\ufffd\\\\u0000:\\\\u0000 \\\\u0000\\\\r\\\\u0000\\\\n\\\\u0000 \\\\u0000I\\\\u0000m\\\\u0000p\\\\u0000o\\\\u0000s\\\\u0000s\\\\u0000i\\\\u0000b\\\\u0000l\\\\u0000e\\\\u0000 \\\\u0000d\\\\u0000e\\\\u0000 \\\\u0000c\\\\u0000h\\\\u0000a\\\\u0000r\\\\u0000g\\\\u0000e\\\\u0000r\\\\u0000 \\\\u0000l\\\\u0000e\\\\u0000 \\\\u0000f\\\\u0000i\\\\u0000c\\\\u0000h\\\\u0000i\\\\u0000e\\\\u0000r\\\\u0000 \\\\u0000o\\\\u0000u\\\\u0000 \\\\u0000l\\\\u0000'\\\\u0000a\\\\u0000s\\\\u0000s\\\\u0000e\\\\u0000m\\\\u0000b\\\\u0000l\\\\u0000y\\\\u0000 \\\\u0000'\\\\u0000S\\\\u0000y\\\\u0000s\\\\u0000t\\\\u0000e\\\\u0000m\\\\u0000.\\\\u0000M\\\\u0000a\\\\u0000n\\\\u0000a\\\\u0000g\\\\u0000e\\\\u0000m\\\\u0000e\\\\u0000n\\\\u0000t\\\\u0000.\\\\u0000A\\\\u0000u\\\\u0000t\\\\u0000o\\\\u0000m\\\\u0000a\\\\u0000t\\\\u0000i\\\\u0000o\\\\u0000n\\\\u0000,\\\\u0000 \\\\u0000V\\\\u0000e\\\\u0000r\\\\u0000s\\\\u0000i\\\\u0000o\\\\u0000n\\\\u0000=\\\\u00003\\\\u0000.\\\\u00000\\\\u0000.\\\\u00000\\\\u0000.\\\\u00000\\\\u0000,\\\\u0000 \\\\u0000C\\\\u0000u\\\\u0000l\\\\u0000t\\\\u0000u\\\\u0000r\\\\u0000e\\\\u0000=\\\\u0000n\\\\u0000e\\\\u0000u\\\\u0000t\\\\u0000r\\\\u0000a\\\\u0000l\\\\u0000,\\\\u0000 \\\\u0000P\\\\u0000u\\\\u0000b\\\\u0000l\\\\u0000i\\\\u0000c\\\\u0000K\\\\u0000e\\\\u0000y\\\\u0000T\\\\u0000o\\\\u0000k\\\\u0000e\\\\u0000n\\\\u0000=\\\\u00003\\\\u00001\\\\u0000b\\\\u0000f\\\\u00003\\\\u00008\\\\u00005\\\\u00006\\\\u0000a\\\\u0000d\\\\u00003\\\\u00006\\\\u00004\\\\u0000e\\\\u00003\\\\u00005\\\\u0000'\\\\u0000 \\\\u0000o\\\\u0000u\\\\u0000 \\\\u0000u\\\\u0000n\\\\u0000e\\\\u0000 \\\\u0000d\\\\u0000e\\\\u0000 \\\\u0000s\\\\u0000e\\\\u0000s\\\\u0000 \\\\u0000d\\\\u0000\\ufffd\\\\u0000p\\\\u0000e\\\\u0000n\\\\u0000d\\\\u0000a\\\\u0000n\\\\u0000c\\\\u0000e\\\\u0000s\\\\u0000.\\\\u0000 \\\\u0000L\\\\u0000e\\\\u0000 \\\\u0000f\\\\u0000i\\\\u0000c\\\\u0000h\\\\u0000i\\\\u0000e\\\\u0000r\\\\u0000 \\\\u0000d\\\\u0000e\\\\u0000 \\\\u0000p\\\\u0000a\\\\u0000g\\\\u0000i\\\\u0000n\\\\u0000a\\\\u0000t\\\\u0000i\\\\u0000o\\\\u0000n\\\\u0000 \\\\u0000e\\\\u0000s\\\\u0000t\\\\u0000 \\\\u0000i\\\\u0000n\\\\u0000s\\\\u0000u\\\\u0000f\\\\u0000f\\\\u0000i\\\\u0000s\\\\u0000a\\\\u0000n\\\\u0000t\\\\u0000 \\\\u0000p\\\\u0000o\\\\u0000u\\\\u0000r\\\\u0000 \\\\u0000t\\\\u0000e\\\\u0000r\\\\u0000m\\\\u0000i\\\\u0000n\\\\u0000e\\\\u0000r\\\\u0000 \\\\u0000c\\\\u0000e\\\\u0000t\\\\u0000t\\\\u0000e\\\\u0000 \\\\u0000o\\\\u0000p\\\\u0000\\ufffd\\\\u0000r\\\\u0000a\\\\u0000t\\\\u0000i\\\\u0000o\\\\u0000n\\\\u0000.\\\\u0000 \\\\u0000(\\\\u0000E\\\\u0000x\\\\u0000c\\\\u0000e\\\\u0000p\\\\u0000t\\\\u0000i\\\\u0000o\\\\u0000n\\\\u0000 \\\\u0000d\\\\u0000e\\\\u0000 \\\\u0000H\\\\u0000R\\\\u0000E\\\\u0000S\\\\u0000U\\\\u0000L\\\\u0000T\\\\u0000 \\\\u0000:\\\\u0000 \\\\u00000\\\\u0000x\\\\u00008\\\\u00000\\\\u00000\\\\u00007\\\\u00000\\\\u00005\\\\u0000A\\\\u0000F\\\\u0000)\\\\u0000\\\\r\\\\u0000\\\\n\\\\u0000\\\"}\\u001b[0m\", \"verbosity\": 0, \"start_line\": 207, \"end_line\": 208, \"created\": \"2020-07-07T09:30:53.679Z\", \"job\": 1440, \"host_name\": \"hostname.com\", \"parent_uuid\": \"0242ac12-0006-80ff-50f5-0000000008bb\", \"event_display\": \"Host Failed\", \"cluster_host_id\": \"awx\", \"tower_uuid\": null}}"
  },
  "response": {
    "status": 400,
    "message": "{\"text\":\"Invalid data format\",\"code\":6,\"invalid-event-number\":0}"
  }
}

postdata field content:

{
    "event": {
        "@timestamp": "2020-07-07T09:30:53.780Z",
        "message": "Event data saved.",
        "host": 686,
        "level": "INFO",
        "logger_name": "awx.analytics.job_events",
        "id": null,
        "modified": null,
        "event": "runner_on_failed",
        "event_data": {
            "playbook": "AWX_Ansible/ansible_check_setup.yml",
            "playbook_uuid": "a7eb8878-345b-4237-bfce-9e5d18b0a9e9",
            "play": "Playbook pour test setup",
            "play_uuid": "0242ac12-0006-80ff-50f5-0000000008b9",
            "play_pattern": "all",
            "task": "get_ansible_facts",
            "task_uuid": "0242ac12-0006-80ff-50f5-0000000008bb",
            "task_action": "setup",
            "task_args": "",
            "task_path": "/tmp/awx_1440_enxdule_/project/AWX_Ansible/ansible_check_setup.yml:7",
            "host": "hostname.com",
            "remote_addr": "hostname.com",
            "res": {
                "msg": "winrm send_input failed; \nstdout: \nstderr W\u0000i\u0000n\u0000d\u0000o\u0000w\u0000s\u0000 \u0000P\u0000o\u0000w\u0000e\u0000r\u0000S\u0000h\u0000e\u0000l\u0000l\u0000 \u0000s\u0000'\u0000e\u0000s\u0000t\u0000 \u0000a\u0000r\u0000r\u0000\udcea\u0000t\u0000\udce9\u0000 \u0000a\u0000v\u0000e\u0000c\u0000 \u0000l\u0000'\u0000e\u0000r\u0000r\u0000e\u0000u\u0000r\u0000 \u0000s\u0000u\u0000i\u0000v\u0000a\u0000n\u0000t\u0000e\u0000\udca0\u0000:\u0000 \u0000\r\u0000\n\u0000 \u0000I\u0000m\u0000p\u0000o\u0000s\u0000s\u0000i\u0000b\u0000l\u0000e\u0000 \u0000d\u0000e\u0000 \u0000c\u0000h\u0000a\u0000r\u0000g\u0000e\u0000r\u0000 \u0000l\u0000e\u0000 \u0000f\u0000i\u0000c\u0000h\u0000i\u0000e\u0000r\u0000 \u0000o\u0000u\u0000 \u0000l\u0000'\u0000a\u0000s\u0000s\u0000e\u0000m\u0000b\u0000l\u0000y\u0000 \u0000'\u0000S\u0000y\u0000s\u0000t\u0000e\u0000m\u0000.\u0000M\u0000a\u0000n\u0000a\u0000g\u0000e\u0000m\u0000e\u0000n\u0000t\u0000.\u0000A\u0000u\u0000t\u0000o\u0000m\u0000a\u0000t\u0000i\u0000o\u0000n\u0000,\u0000 \u0000V\u0000e\u0000r\u0000s\u0000i\u0000o\u0000n\u0000=\u00003\u0000.\u00000\u0000.\u00000\u0000.\u00000\u0000,\u0000 \u0000C\u0000u\u0000l\u0000t\u0000u\u0000r\u0000e\u0000=\u0000n\u0000e\u0000u\u0000t\u0000r\u0000a\u0000l\u0000,\u0000 \u0000P\u0000u\u0000b\u0000l\u0000i\u0000c\u0000K\u0000e\u0000y\u0000T\u0000o\u0000k\u0000e\u0000n\u0000=\u00003\u00001\u0000b\u0000f\u00003\u00008\u00005\u00006\u0000a\u0000d\u00003\u00006\u00004\u0000e\u00003\u00005\u0000'\u0000 \u0000o\u0000u\u0000 \u0000u\u0000n\u0000e\u0000 \u0000d\u0000e\u0000 \u0000s\u0000e\u0000s\u0000 \u0000d\u0000\udce9\u0000p\u0000e\u0000n\u0000d\u0000a\u0000n\u0000c\u0000e\u0000s\u0000.\u0000 \u0000L\u0000e\u0000 \u0000f\u0000i\u0000c\u0000h\u0000i\u0000e\u0000r\u0000 \u0000d\u0000e\u0000 \u0000p\u0000a\u0000g\u0000i\u0000n\u0000a\u0000t\u0000i\u0000o\u0000n\u0000 \u0000e\u0000s\u0000t\u0000 \u0000i\u0000n\u0000s\u0000u\u0000f\u0000f\u0000i\u0000s\u0000a\u0000n\u0000t\u0000 \u0000p\u0000o\u0000u\u0000r\u0000 \u0000t\u0000e\u0000r\u0000m\u0000i\u0000n\u0000e\u0000r\u0000 \u0000c\u0000e\u0000t\u0000t\u0000e\u0000 \u0000o\u0000p\u0000\udce9\u0000r\u0000a\u0000t\u0000i\u0000o\u0000n\u0000.\u0000 \u0000(\u0000E\u0000x\u0000c\u0000e\u0000p\u0000t\u0000i\u0000o\u0000n\u0000 \u0000d\u0000e\u0000 \u0000H\u0000R\u0000E\u0000S\u0000U\u0000L\u0000T\u0000 \u0000:\u0000 \u00000\u0000x\u00008\u00000\u00000\u00007\u00000\u00005\u0000A\u0000F\u0000)\u0000\r\u0000\n\u0000",
                "_ansible_no_log": false
            },
            "start": "2020-07-07T09:30:52.679120",
            "end": "2020-07-07T09:30:53.679388",
            "duration": 1.000268,
            "ignore_errors": null,
            "event_loop": null,
            "uuid": "802a661d-71a7-499d-aa0e-a280234a94c9"
        },
        "failed": true,
        "changed": false,
        "uuid": "802a661d-71a7-499d-aa0e-a280234a94c9",
        "playbook": "AWX_Ansible/ansible_check_setup.yml",
        "play": "Playbook pour test setup",
        "role": "",
        "task": "get_ansible_facts",
        "counter": 416,
        "stdout": "\u001b[0;31mfatal: [hostname.com]: FAILED! => {\"msg\": \"winrm send_input failed; \\nstdout: \\nstderr W\\u0000i\\u0000n\\u0000d\\u0000o\\u0000w\\u0000s\\u0000 \\u0000P\\u0000o\\u0000w\\u0000e\\u0000r\\u0000S\\u0000h\\u0000e\\u0000l\\u0000l\\u0000 \\u0000s\\u0000'\\u0000e\\u0000s\\u0000t\\u0000 \\u0000a\\u0000r\\u0000r\\u0000\ufffd\\u0000t\\u0000\ufffd\\u0000 \\u0000a\\u0000v\\u0000e\\u0000c\\u0000 \\u0000l\\u0000'\\u0000e\\u0000r\\u0000r\\u0000e\\u0000u\\u0000r\\u0000 \\u0000s\\u0000u\\u0000i\\u0000v\\u0000a\\u0000n\\u0000t\\u0000e\\u0000\ufffd\\u0000:\\u0000 \\u0000\\r\\u0000\\n\\u0000 \\u0000I\\u0000m\\u0000p\\u0000o\\u0000s\\u0000s\\u0000i\\u0000b\\u0000l\\u0000e\\u0000 \\u0000d\\u0000e\\u0000 \\u0000c\\u0000h\\u0000a\\u0000r\\u0000g\\u0000e\\u0000r\\u0000 \\u0000l\\u0000e\\u0000 \\u0000f\\u0000i\\u0000c\\u0000h\\u0000i\\u0000e\\u0000r\\u0000 \\u0000o\\u0000u\\u0000 \\u0000l\\u0000'\\u0000a\\u0000s\\u0000s\\u0000e\\u0000m\\u0000b\\u0000l\\u0000y\\u0000 \\u0000'\\u0000S\\u0000y\\u0000s\\u0000t\\u0000e\\u0000m\\u0000.\\u0000M\\u0000a\\u0000n\\u0000a\\u0000g\\u0000e\\u0000m\\u0000e\\u0000n\\u0000t\\u0000.\\u0000A\\u0000u\\u0000t\\u0000o\\u0000m\\u0000a\\u0000t\\u0000i\\u0000o\\u0000n\\u0000,\\u0000 \\u0000V\\u0000e\\u0000r\\u0000s\\u0000i\\u0000o\\u0000n\\u0000=\\u00003\\u0000.\\u00000\\u0000.\\u00000\\u0000.\\u00000\\u0000,\\u0000 \\u0000C\\u0000u\\u0000l\\u0000t\\u0000u\\u0000r\\u0000e\\u0000=\\u0000n\\u0000e\\u0000u\\u0000t\\u0000r\\u0000a\\u0000l\\u0000,\\u0000 \\u0000P\\u0000u\\u0000b\\u0000l\\u0000i\\u0000c\\u0000K\\u0000e\\u0000y\\u0000T\\u0000o\\u0000k\\u0000e\\u0000n\\u0000=\\u00003\\u00001\\u0000b\\u0000f\\u00003\\u00008\\u00005\\u00006\\u0000a\\u0000d\\u00003\\u00006\\u00004\\u0000e\\u00003\\u00005\\u0000'\\u0000 \\u0000o\\u0000u\\u0000 \\u0000u\\u0000n\\u0000e\\u0000 \\u0000d\\u0000e\\u0000 \\u0000s\\u0000e\\u0000s\\u0000 \\u0000d\\u0000\ufffd\\u0000p\\u0000e\\u0000n\\u0000d\\u0000a\\u0000n\\u0000c\\u0000e\\u0000s\\u0000.\\u0000 \\u0000L\\u0000e\\u0000 \\u0000f\\u0000i\\u0000c\\u0000h\\u0000i\\u0000e\\u0000r\\u0000 \\u0000d\\u0000e\\u0000 \\u0000p\\u0000a\\u0000g\\u0000i\\u0000n\\u0000a\\u0000t\\u0000i\\u0000o\\u0000n\\u0000 \\u0000e\\u0000s\\u0000t\\u0000 \\u0000i\\u0000n\\u0000s\\u0000u\\u0000f\\u0000f\\u0000i\\u0000s\\u0000a\\u0000n\\u0000t\\u0000 \\u0000p\\u0000o\\u0000u\\u0000r\\u0000 \\u0000t\\u0000e\\u0000r\\u0000m\\u0000i\\u0000n\\u0000e\\u0000r\\u0000 \\u0000c\\u0000e\\u0000t\\u0000t\\u0000e\\u0000 \\u0000o\\u0000p\\u0000\ufffd\\u0000r\\u0000a\\u0000t\\u0000i\\u0000o\\u0000n\\u0000.\\u0000 \\u0000(\\u0000E\\u0000x\\u0000c\\u0000e\\u0000p\\u0000t\\u0000i\\u0000o\\u0000n\\u0000 \\u0000d\\u0000e\\u0000 \\u0000H\\u0000R\\u0000E\\u0000S\\u0000U\\u0000L\\u0000T\\u0000 \\u0000:\\u0000 \\u00000\\u0000x\\u00008\\u00000\\u00000\\u00007\\u00000\\u00005\\u0000A\\u0000F\\u0000)\\u0000\\r\\u0000\\n\\u0000\"}\u001b[0m",
        "verbosity": 0,
        "start_line": 207,
        "end_line": 208,
        "created": "2020-07-07T09:30:53.679Z",
        "job": 1440,
        "host_name": "hostname.com",
        "parent_uuid": "0242ac12-0006-80ff-50f5-0000000008bb",
        "event_display": "Host Failed",
        "cluster_host_id": "awx",
        "tower_uuid": null
    }
}

I only changed hostname value for confidentiality reason.

I don't see JSON formatting error...

Edited because of mistake on my side

ryanpetrello commented 4 years ago

Is it possible that AWX exposes a bad formatted json to rsyslog ?

I wouldn't think so - we're not configuring the template as JSON, so I'm not sure why rsyslogd would be attempting to interpret it as such:

template(name="awx" type="string" string="%rawmsg-after-pri%")

IshwarKanse commented 3 years ago

One of the customer is also running into missing job events issue with that invalid year messages in the awx-rsyslog.log file. Its using omfwd though instead of omhttp.

9431.199455280:main Q:Reg/w0  : main Q: queue.c: dequeued 0 consumable elements, szlog 0 sz phys 0
9431.199460781:main Q:Reg/w0  : queue.c: regular consumer finished, iret=4, szlog 0 sz phys 0
9431.199465381:main Q:Reg/w0  : wti.c: main Q:Reg/w0: worker IDLE, waiting for work.
9431.427717136:imuxsock.c     : imuxsock.c: Message from UNIX socket: #4, size 340
9431.427732037:imuxsock.c     : parser.c: dropped LF at very end of message (DropTrailingLF is set)
9431.427738637:imuxsock.c     : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: '"'
9431.427758939:imuxsock.c     : strm 0x557645c69740: stream.c: file 6 strmFlush
9431.427764539:imuxsock.c     : strm 0x557645c69740: stream.c: strmFlushinternal: file 6(/var/lib/awx/awx-external-logger-backlog.00000001) flush, buflen 724
9431.427769239:imuxsock.c     : strm 0x557645c69740: stream.c: file 6(/var/lib/awx/awx-external-logger-backlog.00000001) doWriteInternal: bFlush 1
9431.427774140:imuxsock.c     : stream.c: strmPhysWrite, stream 0x557645c69740, len 724
9431.427790541:imuxsock.c     : strm 0x557645c69740: stream.c: file 6 write wrote 724 bytes
9431.427796641:imuxsock.c     : main Q: queue.c: write wrote 724 octets to disk, queue disk size now 1392 octets, EnqOnly:0
9431.427808042:imuxsock.c     : main Q: queue.c: EnqueueMsg advised worker start
9431.427813342:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() on 2 fds

The protocol is TCP and the remote logging aggregator is Logpoint

action(type="omfwd" target="x.x.x.x" port="514" protocol="tcp" action.resumeRetryCount="-1" action.resumeInterval="20" template="awx")

With tcp protocol and omfwd, it doesn't generate a rsyslog.err log file.

icdatev commented 2 years ago

Same issue over here. Setting action.resumeRetryCount=10 doesn't change anything. rsyslogd stops sending events/logs to splunk. Only a restart of rsyslogd helps. Any further workarounds?

icdatev commented 2 years ago

@ryanpetrello Seems to be a problem because of the escaped postdata. If we unescape the postdata everything is fine. No 400 error. Btw.: This error comes out of Splunk. Is it possible, that not every postdata is escaped? We could not figure this out because even with $DebugLevel 2 the postdata is not being logged out. We get this error (because of the escaped postdata) when we do some ldap searches with community.general.ldap_search.

icdatev commented 2 years ago

Is it possible to modify the rsyslog-template (template(name="awx" type="string" string="%rawmsg-after-pri%")), so the double escape disappears again before the event gets send to splunk?

Splunk won't accept an escaped event!!!\ It is no problem if just the message is escaped. But it is a problem when the whole postdata is escaped!

icdatev commented 2 years ago

Here are some news. We have found out, that the problem with Splunk is because of the (unknown? or user defined character) unicode charcters that start with \udc.

When we send i.e. just one \udc00 we geht {"text":"Invalid data format","code":6,"invalid-event-number":0} from Splunk.

rbertol commented 1 year ago

Hey @icdatev, are you able to get a workaround for this issue? If yes, can you please share it with us?

icdatev commented 1 year ago

@rbertol Our "workaround" is a automation, which recognizes the error and restarts the rsyslogd. That works good, the need of this is of course unsatisfying. The real problem is how rsyslogd reacts to 4xx errors. See also https://github.com/rsyslog/rsyslog/issues/4348#issuecomment-1011456060. On Red Hat 8 it is now fixed. Don't know about other distros.

Crimrose commented 11 months ago

I still got this issue today with operator version 2.7.2. Do we have any ways to solve it? Updated:

    - setting: LOG_AGGREGATOR_TCP_TIMEOUT
      value: 10

I modified this config and it works!