Closed: oguzy closed this issue 6 years ago
A couple of things that might be helpful to try here:
poke, can you provide any more information here?
There is no HTTP option, only HTTPS, TCP, and UDP are available for Splunk. I tried HTTPS and it didn't work either. I never tried restarting the Docker image.
I tried restarting the image as well; I'm still getting errors when I click the test.
What kind of information can I share?
The test endpoint is supposed to return a fairly verbose message about what went wrong with the request (the request the server makes to your logging aggregator). If you haven't seen this, it could be that the UI just isn't passing it through. If you could either use the API browser to submit the data for the test, or test in the UI and gather the (redacted, of course) request and response data, that should tell us either what configuration is causing trouble or what message the server is responding with.
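If it helps, here is a minimal Python sketch of how you might drive that test from outside the UI. The endpoint path, host, and credentials below are assumptions (confirm the exact path in your version's API browser under /api/v2/settings/); the snippet only builds the request so you can inspect what would be sent.

```python
import base64
import json
import urllib.request

# Hypothetical values -- substitute your own AWX host and credentials.
AWX_URL = "https://awx.example.com"
USER, PASSWORD = "admin", "password"

# Assumed endpoint path for the logging test; verify it in the API browser.
test_url = AWX_URL + "/api/v2/settings/logging/test/"

# The test uses the currently saved logging settings, so an empty JSON body
# should suffice. We build the request object without actually sending it.
auth = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
request = urllib.request.Request(
    test_url,
    data=json.dumps({}).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Basic " + auth,
    },
    method="POST",
)

print(request.full_url)
print(request.get_method())
```

To actually submit it, you would pass `request` to `urllib.request.urlopen` and read the response body, which should contain the verbose error message mentioned above.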
For Splunk, we always like to start with independent verification of the event receiver, from their docs:
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/UsetheHTTPEventCollector
curl -k https://hec.example.com:8088/services/collector/event -H "Authorization: Splunk B5A79AAD-D822-46CC-80D1-819F80D7BFB0" -d '{"event": "hello world"}'
{"text": "Success", "code": 0}
If you have replicated this, great, that supports the expectation that AWX should be sending the message successfully with those exact values.
Additionally, if you have a look at the README.md in this folder:
https://github.com/ansible/awx/tree/devel/tools/docker-compose
You can see request data that is verified to work and is completely reproducible with the container-based Logstash we have in a docker-compose target.
Splunk is only slightly different. The way they handle the port is also different: normally we would substitute the port value in the URL with the one you give, but it's easier not to muck with that, and it will just use the URL as-is.
{
    "LOG_AGGREGATOR_HOST": "https://hec.example.com:8088/services/collector/event",
    "LOG_AGGREGATOR_PORT": null,
    "LOG_AGGREGATOR_TYPE": "splunk",
    "LOG_AGGREGATOR_USERNAME": "",
    "LOG_AGGREGATOR_PASSWORD": "B5A79AAD-D822-46CC-80D1-819F80D7BFB0",
    "LOG_AGGREGATOR_LOGGERS": [
        "awx",
        "activity_stream",
        "job_events",
        "system_tracking"
    ],
    "LOG_AGGREGATOR_INDIVIDUAL_FACTS": false,
    "LOG_AGGREGATOR_ENABLED": true,
    "LOG_AGGREGATOR_LEVEL": "INFO"
}
(URL and token are from the Splunk example, not meant to be used)
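As a sanity check on the URL-as-is behaviour described above, a quick sketch showing that a full LOG_AGGREGATOR_HOST already carries the scheme, host, port, and path, which is why LOG_AGGREGATOR_PORT can be left null:

```python
from urllib.parse import urlparse

# Example host setting from above (Splunk docs example, not a real endpoint).
host_setting = "https://hec.example.com:8088/services/collector/event"

parsed = urlparse(host_setting)
print(parsed.scheme)    # https
print(parsed.hostname)  # hec.example.com
print(parsed.port)      # 8088
print(parsed.path)      # /services/collector/event
```

Since all four components are recoverable from the URL itself, supplying a separate port would at best be redundant and at worst conflict with the one embedded in the URL.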
In addition to this, you might want to try toggling the protocol and verify-cert parameters. But once you confirm the endpoint is accepting messages with your token, and then see the test response, you should know what type of error the server is responding with.
# curl -k http://10.35.75.26:8088/services/collector/event -H "Authorization: Splunk 759D1202-BD2A-4F09-B74A-0328B32D66FE" -d '{"event": "hello world"}'
{"text":"Success","code":0}
I found the problem:
Changing the logging aggregator host to http://10.35.75.26:8088/services/collector/event worked. Before, I was just writing the IP address. The test is successful now, but I am still not able to see the template run outputs in Splunk, though I do see the test output.
I am choosing the job template log level as normal and the log aggregator level as INFO. I tried different options as well; none of them helped me see the output log in Splunk.
Can it be because my project's ansible.cfg has the lines below?
[defaults]
host_key_checking = False
stdout_callback = json
No, I don't think so. When AWX runs a job, it sets environment variables that should override those anyway.
Given that you have the logging test working, I would investigate the logs during container startup (to see if something obviously failed while applying the logging configuration), and the logs during an action that is expected to emit a log. If both of these are silent while you have a config that should work saved to the database, then that makes this tricky, and there could be something specific to this particular install method that I haven't thought of.
I updated the git repo and reinstalled. Without changing the connection method from HTTPS, just adding the URL address, I can now see AWX activity logs.
But I am still not able to see the job template logs. Is this normal? Should I use some callback to send job logs to Splunk?
Yes, the configuration you posted was clearly enabling the above loggers in addition to job_events. These are emitted as you run a job, on a task-by-task basis, containing task-related data. Actually, if that's the configuration you used, the DEBUG messages from celery beat shouldn't be showing up either. Something is likely wrong with the container setup combined with the logging configuration. I don't have any good ideas right now, because both beat (which you are getting messages from) and the callback receiver run under supervisor in the same container.
We may be setting up something incorrectly for base logging here? https://github.com/ansible/awx/blob/devel/installer/image_build/files/settings.py
^ I think that LOGGING['loggers']['awx']['handlers'] = ['console']
should lead to a conflict. For the awx logger, we leave in a NullHandler specifically so it can be swapped out for the external logger, depending on what the DB settings are at startup. Since the awx logger only has the console handler there, something probably won't work. That should cause the beat messages and other task-system logs to be dropped, but it shouldn't affect the specifically enumerated analytics loggers, like job_events. That's the opposite of what was observed.
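The NullHandler-as-placeholder pattern described above can be illustrated with plain Python logging (a generic sketch, not the actual AWX code; names like ExternalHandler are hypothetical):

```python
import logging

# Base config leaves a NullHandler on the logger as a placeholder.
log = logging.getLogger("awx_demo")
log.setLevel(logging.INFO)
log.propagate = False
placeholder = logging.NullHandler()
log.addHandler(placeholder)

records = []

class ExternalHandler(logging.Handler):
    """Stand-in for the external log aggregator handler."""
    def emit(self, record):
        records.append(record.getMessage())

# At startup, the placeholder is swapped out for the external handler,
# based on the logging settings saved in the database.
log.removeHandler(placeholder)
log.addHandler(ExternalHandler())

log.info("Cluster node heartbeat task.")
print(records)  # ['Cluster node heartbeat task.']
```

If the base config ships `['console']` instead of the NullHandler placeholder, the swap logic has nothing to replace, which is the conflict suspected above.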
Also experiencing this with Logstash: no logs are transferred; only test messages get sent correctly. @matburt, I also suspect some initial logging settings, because the AWX container logged to stdout and not to the files shown in the Tower documentation.
Best regards, Flo
So my settings are the following, on a stock container setup straight from Docker Hub:
{
    "LOG_AGGREGATOR_HOST": "172.17.0.1",
    "LOG_AGGREGATOR_PORT": 5000,
    "LOG_AGGREGATOR_TYPE": "logstash",
    "LOG_AGGREGATOR_USERNAME": "",
    "LOG_AGGREGATOR_PASSWORD": "",
    "LOG_AGGREGATOR_LOGGERS": [
        "awx",
        "activity_stream",
        "job_events",
        "system_tracking"
    ],
    "LOG_AGGREGATOR_INDIVIDUAL_FACTS": true,
    "LOG_AGGREGATOR_ENABLED": true,
    "LOG_AGGREGATOR_TOWER_UUID": "",
    "LOG_AGGREGATOR_PROTOCOL": "tcp",
    "LOG_AGGREGATOR_TCP_TIMEOUT": 5,
    "LOG_AGGREGATOR_VERIFY_CERT": true,
    "LOG_AGGREGATOR_LEVEL": "INFO"
}
And I can confirm the logs only show up via the Docker logs, e.g. docker logs -f awx_task.
Furthermore, when using the docker-latest of RHEL, all logs get smashed together in /var/log/messages, so I need external logging :).
AFAIK the external logging system seems not to be functioning at all.
@AlanCoding, do you have a recommendation for how to debug this issue properly?
I already tried modifying /etc/tower/settings.py in both containers, but after restarting them I could not observe any change in the logging behaviour.
The thing that I would be hoping for when debugging this is for you to hit this log message:
In the event of connection errors, it will log (just once) to the local log files the traceback related to the connection error. Without seeing that, there is little I can speculate about. The only avenue is to start replicating from the ground up with the open-source installer, tracing the code that applies the logger according to the database settings, which is pretty non-trivial.
@AlanCoding, I think the logger works fine. I didn't see that exception on container creation nor later in the usage of AWX. In my opinion, the code for external logging works, because
gives a valid entry in Logstash. So I suspect the configuration of the complete log setup is messed up, not the actual logging mechanism.
So how about these two files ?
and
Do the containers accidentally use the development.py settings instead of the production.py values?
I think the local settings file is awx/installer/image_build/files/settings.py, put there by the Dockerfile. In the open source installer, I think it should still use production settings, so if I'm correct, the final settings are a combination of defaults.py, production.py, and settings.py. The final settings.py gets the final say in the logging settings. These will interact with the dynamic logger setup here (using the values from the database, which you save in the UI / API browser):
But nothing is jumping out at me right now. Even though the handler is set to console, it should append the new handler for the external logger to that logger. This might happen inconsistently between the celery and web server containers; I really don't know.
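The append behaviour mentioned above can be checked standalone: even when a logger already has a console handler attached, an appended handler still receives every record (generic Python logging, not AWX's actual code):

```python
import io
import logging

log = logging.getLogger("append_demo")
log.setLevel(logging.INFO)
log.propagate = False

# Existing 'console' handler, as set in the image's settings.py.
console_buf = io.StringIO()
log.addHandler(logging.StreamHandler(console_buf))

# Appended external-aggregator handler; both should see each record.
external_buf = io.StringIO()
log.addHandler(logging.StreamHandler(external_buf))

log.info("job_events payload")
print(console_buf.getvalue().strip())   # job_events payload
print(external_buf.getvalue().strip())  # job_events payload
```

So the console handler by itself shouldn't block the external one; if records are missing, the more likely failure is that the append never ran in one of the containers.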
I think I am experiencing the same/similar issue. I can see test log messages in Splunk and also messages that look like they are coming from the main awx service_logs but I don't think I'm seeing anything from activity_stream, job_events or system_tracking.
Not sure if it's relevant but when I click "REVERT" on the "LOGGERS SENDING DATA TO LOG AGGREGATOR FORM" field it removes all the spaces. Then if I click save it just hangs with the "Working" cog spinning.
I am running version AWX 1.0.1.153 and Ansible 2.4.1.0
Any ideas? Thanks
^ I could not replicate the UI thing on that specific "revert" button, but clicking around other places I got it to hang.
I saw this in the JS logs:
angular.js:14794 TypeError: input.replace is not a function
at Object.listToArray (configurationUtils.service.js:31)
at configuration.controller.js:414
at arrayEach (index.js:1361)
at Function.<anonymous> (index.js:3405)
at getFormPayload (configuration.controller.js:390)
at exports.default.formSave (configuration.controller.js:442)
at fn (eval at compile (angular.js:15645), <anonymous>:4:164)
at callback (angular.js:27451)
at Scope.$eval (angular.js:18521)
at Scope.$apply (angular.js:18620)
This should be a new UI issue. Are these logs the same as what you can see?
Sorry, I'm fairly new to AWX. Where would I find those logs?
I have also noticed that I am getting DEBUG messages through to Splunk even though AWX is only configured for INFO messages.
I had changed the level to DEBUG as part of my testing, but I have changed it back through the GUI.
{
  "@timestamp": "2017-12-12T19:22:21.687Z",
  "cluster_host_id": "awx",
  "host": "awx",
  "level": "DEBUG",
  "logger_name": "awx.main.tasks",
  "message": "Cluster node heartbeat task.",
  "type": "splunk"
}
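One thing worth ruling out for the DEBUG-despite-INFO symptom: in Python logging, levels can be set on both the logger and the handler, and a stale DEBUG level on either side lets DEBUG records through. A minimal sketch (generic logging, with assumptions about where AWX applies the level):

```python
import io
import logging

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setLevel(logging.INFO)   # handler drops anything below INFO

log = logging.getLogger("level_demo")
log.setLevel(logging.DEBUG)      # logger itself still passes DEBUG along
log.propagate = False
log.addHandler(handler)

log.debug("Cluster node heartbeat task.")  # filtered out by the handler
log.info("job finished")                   # gets through

print(repr(buf.getvalue()))  # 'job finished\n'
```

If the UI change only updates one of the two levels (or the running process never reloads it), heartbeat DEBUG records like the one above would keep arriving until the service is restarted or the setting is re-applied.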
The JavaScript logs like the one I pasted can be recorded in Chrome via right-click -> Inspect -> Console.
Here's what I see...
Failed to load resource: the server responded with a status of 400 (BAD REQUEST)
vendor.4c1b9c38e5c8ab72ad49.js:436 TypeError: input.replace is not a function
at Object.listToArray (app.b6a5e3c7df10a96d5ccf.js:403)
at app.b6a5e3c7df10a96d5ccf.js:403
at arrayEach (app.b6a5e3c7df10a96d5ccf.js:29)
at Function.<anonymous> (app.b6a5e3c7df10a96d5ccf.js:29)
at getFormPayload (app.b6a5e3c7df10a96d5ccf.js:403)
at exports.default.formSave (app.b6a5e3c7df10a96d5ccf.js:403)
at fn (eval at compile (vendor.4c1b9c38e5c8ab72ad49.js:436), <anonymous>:4:164)
at callback (vendor.4c1b9c38e5c8ab72ad49.js:436)
at Scope.$eval (vendor.4c1b9c38e5c8ab72ad49.js:436)
at Scope.$apply (vendor.4c1b9c38e5c8ab72ad49.js:436)
(anonymous) @ vendor.4c1b9c38e5c8ab72ad49.js:436
Mine is working with luck.
-- Oğuz Yarımtepe http://about.me/oguzy
Does anyone have any further insight into what the issue might be? Is there an underlying config file that I can check to see what's happening? Thanks.
Hi all,
Not sure if it's relevant but when I click "REVERT" on the "LOGGERS SENDING DATA TO LOG AGGREGATOR FORM" field it removes all the spaces. Then if I click save it just hangs with the "Working" cog spinning.
I also experience this. AWX 1.0.1.240
When clicking revert, it requests:
https://awx.xxxx.com/api/v2/settings/all/
and in the response there are the aggregators:
"LOG_AGGREGATOR_LOGGERS":["awx","activity_stream","job_events","system_tracking"]
When saving, there is this JS error, and the page hangs with a loading wheel.
vendor.de7dd9b7b75a5ad9b8e5.js:436 TypeError: input.replace is not a function
at Object.listToArray (app.26612aa730614b7b6215.js:403)
at app.26612aa730614b7b6215.js:403
at arrayEach (app.26612aa730614b7b6215.js:29)
at Function.<anonymous> (app.26612aa730614b7b6215.js:29)
at getFormPayload (app.26612aa730614b7b6215.js:403)
at exports.default.formSave (app.26612aa730614b7b6215.js:403)
at fn (eval at compile (vendor.de7dd9b7b75a5ad9b8e5.js:436), <anonymous>:4:164)
at callback (vendor.de7dd9b7b75a5ad9b8e5.js:436)
at Scope.$eval (vendor.de7dd9b7b75a5ad9b8e5.js:436)
at Scope.$apply (vendor.de7dd9b7b75a5ad9b8e5.js:436)
(anonymous) @ vendor.de7dd9b7b75a5ad9b8e5.js:436
(anonymous) @ vendor.de7dd9b7b75a5ad9b8e5.js:436
$apply @ vendor.de7dd9b7b75a5ad9b8e5.js:436
(anonymous) @ vendor.de7dd9b7b75a5ad9b8e5.js:436
dispatch @ vendor.de7dd9b7b75a5ad9b8e5.js:29
elemData.handle @ vendor.de7dd9b7b75a5ad9b8e5.js:29
But I don't suspect this to be the actual logging problem where the external logging mechanism is not functional. The streams are correctly set, as you can see from https://awx.xxxx.com/api/v2/settings/all/. This issue is just a UI problem. Maybe it parses the JSON response wrong and inserts it without spaces.
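The TypeError above (input.replace is not a function) is consistent with the UI's list-parsing helper expecting a comma-separated string but receiving an already-parsed JSON array. A sketch of the defensive normalization, written in Python for illustration (the real fix would live in the Angular configurationUtils service, and the helper name below is hypothetical):

```python
def list_to_array(value):
    """Accept either a comma-separated string or an already-parsed list."""
    if isinstance(value, list):
        # Already an array (as returned by /api/v2/settings/all/).
        return [str(item).strip() for item in value]
    # String path: strip spaces, split on commas, drop empties.
    return [part for part in value.replace(" ", "").split(",") if part]

print(list_to_array("awx, activity_stream, job_events"))
print(list_to_array(["awx", "activity_stream"]))
```

Both calls yield a clean list, whereas a string-only implementation would crash on the second input exactly the way the stack trace shows.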
For a production environment, a functional logging mechanism is an important feature: with a central automation system, one has to comply with company regulations which require actions to be traceable to each administrator/person. So it is important for us to get this working, and I am willing to invest some time in this issue.
I think I am experiencing the same/similar issue. I can see test log messages in Splunk and also messages that look like they are coming from the main awx service_logs but I don't think I'm seeing anything from activity_stream, job_events or system_tracking.
What you described here is also the case with an ELK stack as the external logger, and that's the real point of this issue. External logging seems not to function regardless of the external logging software. So @AlanCoding, how can we advance this issue? We need some information on how logging works, or should work, in the Docker environment. What can we do to debug this, as it seems easy to reproduce?
Best regards from Germany Flo
Quite frankly, my external logger now seems to work. I did an update to 1.0.2; the external logger was still not working and only printing my test calls. Then I flipped the external logger switch:
ON -> OFF -> ON
Strange 😃. @oguzy and @mattelphick, can you confirm this for Splunk?
I am still on 1.0.1.153. I toggled the external logger switch and clicked save. However, I don't think I am seeing the logs for job_events yet.
That said, having never seen what successful logging looks like maybe I am interpreting the output incorrectly.
I do see a message that looks like this...
...but I am not sure which logger generates this message.
What I am hoping to see from the "job_events" logger is more detail like this...
Am I misinterpreting things? What sort of detail should I see from the "job_events" logger?
Thanks
Can anyone help with the question above? Thanks
We have upgraded to version 1.0.2.303. However I am still not clear on whether the loggers are sending data externally.
This is the list of logger_names that I see. How do the logger_names in the log output relate to the four loggers in the configuration section (awx, activity_stream, job_events, system_tracking)?
Hi @mattelphick
I can look it up at the company on Monday, but as far as I remember I have similar logs in ELK. Mainly it was important for us to log changes by users on templates, projects, etc., which get correctly written to the external logger, as well as what the specific user changed. I don't know if the actual output of a playbook gets logged. You can search in your logs for User;
there should be log entries for each changed object, what was changed, and by whom. I can only confirm that I get those logs for sure. As far as I can see, the logging itself seems to work: you get more than just test messages in Splunk.
I also don't know how the log channels in the settings relate to the log tags, nor does the Ansible Tower documentation provide more information about these other than:
Loggers to Send Data to the Log Aggregator Form: All four types of data are pre-populated by default. Click the tooltip help icon next to the field for additional information on each data type. Delete the data types you do not want.
Best Regards Flo
No comment in 3 months, closing. Please provide more detail if you have it.
ISSUE TYPE
COMPONENT NAME
SUMMARY
After setting up Splunk logging in the AWX UI, I am not able to see any logs in the Splunk interface.
ENVIRONMENT
STEPS TO REPRODUCE
Below is my settings:
EXPECTED RESULTS
To see some output in the Splunk search:
index=awx
ACTUAL RESULTS
No results
ADDITIONAL INFORMATION
A token was created for submitting logs. The inventory output is as below: