Closed: anthonyxpalermo closed this issue 4 years ago
@anthonyxpalermo hey, thanks for reporting this!
A couple of questions:
1) Have you tried other distributions? Is it difficult for you to try this against a Linux distribution? (because I can reproduce it using our CI container, which is Linux-based)
2) If you run another query, e.g. "SELECT 1" or anything else, does the task hang?
3) When the task is hanging, could you please connect to your database in another console, run "SHOW PROCESSLIST;", and provide the output?
4) Try to remove ignore_unreachable and ignore_errors and run your initial playbook. If it fails, please provide the output with -vvv.
Waiting for your feedback! Thanks!
A couple of questions:
Have you tried other distributions? Is it difficult for you to try this against a Linux distribution? (because I can reproduce it using our CI container, which is Linux-based)
If you run another query, e.g. "SELECT 1" or anything else, does the task hang? The task does not always hang; out of 4000 entries it only hangs 3 times. The issue is that when it does hang it never moves forward unless I either A) quit the playbook with Ctrl-C, or B) use Ansible 2.10 with the task timeout option.
When the task is hanging, could you please connect to your database in another console, run "SHOW PROCESSLIST;", and provide the output?
Try to remove ignore_unreachable and ignore_errors and run your initial playbook. If it fails, please provide the output with -vvv.
With debug=True and -vvvv, it always shows the following:
2020-10-01 14:47:26,626 p=58280 u=root n=ansible |
2020-10-01 14:47:26,627 p=58280 u=root n=ansible | 58280 1601578046.62702: opening command with Popen()
2020-10-01 14:47:26,631 p=58280 u=root n=ansible | 58280 1601578046.63149: done running command with Popen()
2020-10-01 14:47:26,632 p=58280 u=root n=ansible | 58280 1601578046.63273: getting output with communicate()
After that, nothing.
Could it be OOM on the invoked subprocess? If that's the case, the call to communicate() would wait forever for a response that won't arrive. It's not the only possible cause, but it's one that won't show up in the Ansible logs.
@anthonyxpalermo Another question to help pinpoint this issue. Can you confirm that you're actually using this collection and not the Ansible bundled module? In your sample code the task module appears as mysql_query rather than community.mysql.mysql_query.
Thanks.
@anthonyxpalermo thanks for the fast reply!
In addition to @Jorge-Rodriguez's question, could you also please:
1) provide the rows of ps aux related to Ansible processes on the target machine when a task is hanging
2) run grep -i "fatal\|error\|warning\|panic\|something else" ... on the MySQL and system logs after the hang (or share the logs if that's OK with you, I could parse them myself)
@anthonyxpalermo thanks for the fast reply! In addition to @Jorge-Rodriguez's question, could you also please:
provide the rows of ps aux related to Ansible processes on the target machine when a task is hanging
68830 palea009 22 0 9537M 104M ? 0.0 0.3 0:02.17 │ │ │ └─ /Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper (Renderer).app/Contents/MacOS/Code Helper (Renderer) /Users/palea009/.vscode/extensions/vscoss.vscode-ansible-0.5.2/node_modules/vscode-languageclient/lib/utils/el
69173 palea009 24 0 5046M 55544 ? 13.0 0.2 0:12.84 │ │ └─ /Users/palea009/.pyenv/versions/3.8.0/bin/python3.8 /Users/palea009/.pyenv/versions/3.8.0/bin/ansible-playbook PlayBooks/AWS/roles/aws_acm_last_mile_scanner.yml -i inventory/certdiscoverytools --vault-id vault-ids/.prod.vaultpass -e
69304 palea009 16 0 5048M 43140 ? 0.0 0.1 0:00.27 │ │ ├─ /Users/palea009/.pyenv/versions/3.8.0/bin/python3.8 /Users/palea009/.pyenv/versions/3.8.0/bin/ansible-playbook PlayBooks/AWS/roles/aws_acm_last_mile_scanner.yml -i inventory/certdiscoverytools --vault-id vault-ids/.prod.vaultpass
69321 palea009 32 0 4296M 1192 ? 0.0 0.0 0:00.00 │ │ │ └─ /bin/sh -c /usr/bin/python /Users/palea009/.ansible/tmp/ansible-tmp-1601906737.595197-69304-248465093681409/AnsiballZ_aws_acm_info.py && sleep 0
69322 palea009 24 0 5098M 61340 ? 2.9 0.2 0:03.61 │ │ │ └─ /usr/bin/python /Users/palea009/.ansible/tmp/ansible-tmp-1601906737.595197-69304-248465093681409/AnsiballZ_aws_acm_info.py
run grep -i "fatal\|error\|warning\|panic\|something else" ... on the MySQL and system logs after the hang (or share the logs if that's OK with you, I could parse them myself)
MySQL is in AWS RDS; CloudWatch shows no issues on the DB side. It's also true that other Ansible playbooks write to this DB. Either way, we need some way to break out of the hanging state. I agree it could be OOM on the invoked subprocess; break out and retry.
@anthonyxpalermo Another question to help pinpoint this issue. Can you confirm that you're actually using this collection and not the Ansible bundled module? In your sample code the task module appears as mysql_query rather than community.mysql.mysql_query. Thanks.
Made the change from mysql_query to community.mysql.mysql_query; will post once the playbook ends.
The issue remains even with community.mysql.mysql_query.
TASK [aws-acm : Insert {{hostname_item}} and {{ports_item}} Records to our DataBase] ***
fatal: [localhost]: FAILED! => {"changed": false, "msg": "The community.mysql.mysql_query action failed to execute in the expected time frame (10) and was terminated"}
The records for these failures are not added to the DB. I did check that this does not happen to a specific record; on a previous run the same record did not fail, so it's not related to the record info being added to the DB.
@anthonyxpalermo could you also try to run the same query using the shell module?
Just curious whether it hangs again; if yes, this will clarify the picture.
@anthonyxpalermo could you also try to run the same query using the shell module?
How do I do this?
@anthonyxpalermo could you also try to run the same query using the shell module?
How do I do this?
- shell: mysql -e "YOUR QUERY HERE"
Connection parameters notwithstanding. Essentially the command you'd run by hand in bash, as a parameter to the shell module.
In fact, it would be interesting to know if the query succeeds when run by hand too.
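For illustration, a fuller shell-based variant with the connection parameters filled in might look like the sketch below. It reuses the login_* variables from the mysql_query task posted later in this thread, and the query placeholder is left as-is, so this is a sketch rather than a drop-in replacement:

- name: Run the same query through the mysql CLI (sketch)
  # reuses the db_* variables from the mysql_query task in this thread
  shell: >
    mysql --host={{ db_host_prod }} --port={{ db_port_prod }}
    --user={{ db_user_prod }} --password='{{ db_password_prod }}'
    {{ db_prod }} -e "YOUR QUERY HERE"
  register: cli_query_result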
Also, on your ps aux output, I can see processes running the aws_acm_info module but none running the mysql_query module.
You may also want to use the quote filter instead of regex_replace. If this is an OOM, that might help.
Also, on your ps aux output, I can see processes running the aws_acm_info module but none running the mysql_query module.
Not sure why on MacOS you don't see that. The mysql_query module runs, and after it completes the playbook cycles again.
You may also want to use the quote filter instead of regex_replace. If this is an OOM, that might help.
Do you mean here? query:
Could you post how? I am not familiar with how to use that option.
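For what it's worth, a different way to avoid hand-rolled quoting altogether (an alternative to the quote filter Jorge mentioned, not what he asked for verbatim) is the module's positional_args option, which hands the values to the driver and lets it do the escaping. A minimal, untested sketch, assuming acm_to_db is a flat dict of column names to values as in the original task:

- name: Insert a row using parameterized values (sketch, untested)
  community.mysql.mysql_query:
    login_db: "{{ db_prod }}"
    login_host: "{{ db_host_prod }}"
    login_port: "{{ db_port_prod }}"
    login_user: "{{ db_user_prod }}"
    login_password: "{{ db_password_prod }}"
    # one %s placeholder per column; the driver quotes/escapes each value
    query: >-
      INSERT INTO etl_in_aws_acm_certs_last_mile
      ( {{ acm_to_db.keys() | join(', ') }} )
      VALUES ( {{ (['%s'] * (acm_to_db | length)) | join(', ') }} )
    positional_args: "{{ acm_to_db.values() | list }}"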
@anthonyxpalermo could you also try to run the same query using the shell module?
How do I do this?
- shell: mysql -e "YOUR QUERY HERE"
Connection parameters notwithstanding. Essentially the command you'd run by hand in bash, as a parameter to the shell module. In fact, it would be interesting to know if the query succeeds when run by hand too.
It sounds like I would need to refactor the whole query instead of just dropping it in, because I need to account for connection parameters.
I can say that we use the same query on smaller playbooks, with a lot less data to pass, and we never have this issue. There are also other playbooks, as mentioned before, that connect and write to this DB; none have issues.
In that case, I would not think the issue lies within the module itself. Maybe you can tweak the playbook concurrency, or maybe the issue lies with the concurrent connections in RDS. I've had issues with connections not being closed properly (which I could verify we're handling properly).
Ultimately, if the dataset is too big, it could be split into smaller batches, but I don't know if that would make too big of a difference; maybe it would if a small delay is introduced between batches.
@Andersson007 @anthonyxpalermo OK, I took a quick cursory look at the code and it seems that we're not explicitly closing the connections. And while the GC should take care of those objects upon module completion, it's not the first time I've seen the server hanging on to stale connections in these kind of scenarios. In the case of a large dataset like @anthonyxpalermo is describing, it stands to reason that the server could run out of its socket allocation and so the task stalls waiting for one of the stale sockets to time out and become available.
I can start a PR to explicitly close the connections on all modules in this collection, as it seems none of them does.
@anthonyxpalermo You might have to wait a few days for this, and I cannot guarantee that it will fix your issue.
@Jorge-Rodriguez ok, will wait for the PR then (I think no changelog fragment is needed while we don't know if it helps or not, or maybe we can put an unrelated fragment, i.e. about refactoring).
In that case, I would not think the issue lies within the module itself. Maybe you can tweak the playbook concurrency, or maybe the issue lies with the concurrent connections in RDS. I've had issues with connections not being closed properly (which I could verify we're handling properly).
Ultimately, if the dataset is too big, it could be split into smaller batches, but I don't know if that would make too big of a difference; maybe it would if a small delay is introduced between batches.
Jorge, I appreciate the input, but I believe this is an issue with the module's code and how it handles connections. Scaling to a bigger number of requests just exposes the issue, rendering it visible.
@Andersson007 @anthonyxpalermo OK, I took a quick cursory look at the code and it seems that we're not explicitly closing the connections. And while the GC should take care of those objects upon module completion, it's not the first time I've seen the server hanging on to stale connections in these kind of scenarios. In the case of a large dataset like @anthonyxpalermo is describing, it stands to reason that the server could run out of its socket allocation and so the task stalls waiting for one of the stale sockets to time out and become available.
I can start a PR to explicitly close the connections on all modules in this collection, as it seems none of them does.
@anthonyxpalermo You might have to wait a few days for this, and I cannot guarantee that it will fix your issue.
Thank you! I will stand by so that we can review once the PR is completed. I would add that I tend to agree with what you are stating, especially because when using the module's connect_timeout it still hangs and does not move forward.
@anthonyxpalermo Check out #44. If you can apply those changes locally and tell us whether that helps with your issue it would be greatly appreciated.
@anthonyxpalermo also, I just read in the docs: "For example, a task may take longer to complete than the SSH session allows for, causing a timeout."
Could you please check the SSH-related settings on your target machine?
https://www.tecmint.com/increase-ssh-connection-timeout/#:~:text=On%20the%20server%2C%20head%20over,%2Fssh%2Fsshd_config%20configuration%20file.&text=The%20ClientAliveInterval%20parameter%20specifies%20the,to%20keep%20the%20connection%20alive.
I'm not an expert in such things, but https://docs.ansible.com/ansible/devel/user_guide/playbooks_async.html#run-tasks-concurrently-poll-0 might also be helpful.
If I read @anthonyxpalermo's logs correctly, his target is localhost; not sure if he's using the local connection too.
If I read @anthonyxpalermo's logs correctly, his target is localhost; not sure if he's using the local connection too.
I have tried with connection: local on and off, with the same results.
@anthonyxpalermo Check out #44. If you can apply those changes locally and tell us whether that helps with your issue it would be greatly appreciated.
Running the playbook with community.mysql #44.
If I read @anthonyxpalermo's logs correctly, his target is localhost; not sure if he's using the local connection too.
I have tried with connection: local on and off, with the same results.
This really looks like a connection listener exhaustion issue. Awaiting news from the run with #44.
If I read @anthonyxpalermo's logs correctly, his target is localhost; not sure if he's using the local connection too.
I have tried with connection: local on and off, with the same results.
This really looks like a connection listener exhaustion issue. Awaiting news from the run with #44.
Sorry to say, but even with #44 the issue persists.
TASK [aws-acm : Insert {{hostname_item}} and {{ports_item}} Records to our DataBase] **
It just waits here; I will rerun in debug mode to show where it stops.
Let me check in the meanwhile whether the code is doing what it is meant to be doing.
@anthonyxpalermo I can't see anything obviously wrong with the code. Can you connect to the database while it's running and post the output of
show status like "%conn%";
In my local tests I cannot see more than a single concurrent connection at any given time, with or without #44.
@anthonyxpalermo one more question. You mentioned your database is in RDS; what are you using exactly: MySQL, MariaDB, or Aurora with MySQL compatibility?
@anthonyxpalermo one more question. You mentioned your database is in RDS; what are you using exactly: MySQL, MariaDB, or Aurora with MySQL compatibility?
MariaDB
@anthonyxpalermo one more thing that I don't understand. The task that you posted inserts one single row into the database. How exactly do you run through the 4000 entries that you mentioned?
@anthonyxpalermo one more thing that I don't understand. The task that you posted inserts one single row into the database. How exactly do you run through the 4000 entries that you mentioned?
One at a time. I get the info I need via set_fact, and that ends up in the query call I posted at the beginning of the post.
@anthonyxpalermo could you post the playbook, redacted as needed?
On an additional note, you said you're using Ansible 2.10.1, which is not released yet. Any reason you're using a development version?
@anthonyxpalermo I'd be curious as to how exactly you're running this. I tried to replicate your setup and run the task in a 4000-iteration loop with async, and at no point did I have more than 10 active threads in RDS.
@anthonyxpalermo could you post the playbook, redacted as needed?
Hi Jorge, I have not forgotten this, it's just been a busy deadline day!
@anthonyxpalermo I'd be curious as to how exactly you're running this. I tried to replicate your setup and run the task in a 4000-iteration loop with async, and at no point did I have more than 10 active threads in RDS.
Although there are 4000+ items, each index of the loop runs synchronously, so for each item looped we first write to the DB and then move to the next. We don't collect all the data and write asynchronously.
So you're running a sequential loop on the 4000+ items. That's essentially what I did myself and I can't reproduce. I'd suggest you check your active threads in your Mariadb instance, the process list and the slow query log. I would suspect a side interaction if there are other queries going on in the database.
With the current info I cannot pinpoint anything wrong with the module's code
So you're running a sequential loop on the 4000+ items. That's essentially what I did myself and I can't reproduce. I'd suggest you check your active threads in your Mariadb instance, the process list and the slow query log. I would suspect a side interaction if there are other queries going on in the database.
With the current info I cannot pinpoint anything wrong with the module's code
Jorge, a couple of things. We don't believe this has anything to do with MariaDB, and even if it did, the module still needs to be able to cut out and reply or move on. That is not to say the issue couldn't be MariaDB, but in any case the module has to be able to cut out when there are no replies from the DB. Agree?
@anthonyxpalermo I mostly agree with you, but not necessarily in all cases. If the module does send the query and is stalled waiting for a response that never comes, it's not quite up to the module to arbitrarily give up, although there could be a task timeout option.
You now use the connect_timeout option, but that's not quite the same. I think we're looking at a scenario where the connection works but we're not getting a response from the server, the reason for which is still unknown. In this case it would not be desirable for the module to give up arbitrarily, since we can't make assumptions about how long a given query should take.
That being said, you can replicate a task timeout (wait for a result for up to n time units) by using the async and poll options.
Ultimately, I can't find anything substantially wrong with the module that could be causing your issue, and the fact that your error rate is <0.1% seems to suggest that the module works generally well. I feel inclined to believe that you're experiencing a situational issue with your database rather than a bug in the module. I would suggest you try the 'async' and 'poll' workaround.
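For reference, a rough sketch of the async/poll workaround applied to a task like the one in this issue. The 30/5 values are illustrative; async is the overall time budget in seconds and poll is how often Ansible checks for completion:

- name: "Insert registered results to our MariaDB (async/poll sketch)"
  community.mysql.mysql_query:
    login_db: "{{ db_prod }}"
    login_port: "{{ db_port_prod }}"
    login_host: "{{ db_host_prod }}"
    login_user: "{{ db_user_prod }}"
    login_password: "{{ db_password_prod }}"
    query:
      - INSERT INTO etl_in_aws_acm_certs_last_mile ( {{ acm_to_db.keys() | join(', ') }} ) VALUES ( {{ acm_to_db.values() | map('regex_replace', '^(.*)$', "'\1'") | join(', ') }} )
    single_transaction: yes
  # fail the task if the insert has not finished within ~30 seconds
  async: 30
  poll: 5
  register: insert_result
  ignore_errors: true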
@Andersson007 can you reopen this? I would like to hear from @anthonyxpalermo first.
Oh, sorry, it was closed automatically by the merge of @Jorge-Rodriguez's PR.
@Jorge-Rodriguez In Ansible 2.10.x there is now the ability to add a timeout value to the task. It can be applied elsewhere, but for us using it in the task has helped the overall playbook move on without waiting forever for that close response. In my opinion it's a workaround for something the module should handle on its own, and it also forces other community members to move to Ansible 2.10. So far it's the ONLY solution to this issue. I will be deploying this to RHEL 7.x and possibly 8 to see if there is a platform dependency, but I don't have a timeline yet. Of course the async and poll options need to be tested too, but FYI we are not looking to move our current code base to those options.
- name: "Insert {{hostname_item}} and {{ports_item}} Records to our DataBase"
- name: "Insert registered results to our MariaDB"
timeout: 10
community.mysql.mysql_query:
login_db: "{{db_prod}}"
login_port: "{{db_port_prod }}"
login_host: "{{db_host_prod }}"
login_user: "{{db_user_prod }}"
login_password: "{{ db_password_prod }}"
# connect_timeout: 5
query:
- INSERT INTO etl_in_aws_acm_certs_last_mile ( {{ acm_to_db.keys() | join(', ') }} ) VALUES ( {{ acm_to_db.values() | map('regex_replace', '^(.*)$', "'\1'") | join(', ') }} )
single_transaction: yes
ignore_unreachable: true
ignore_errors: true
@anthonyxpalermo the async and poll options have the same effect as the timeout option; this has been the way to deal with task timeouts up until Ansible 2.10, so this issue is not forcing anybody to upgrade.
As I said before, we can't make the module arbitrarily give up because, while that would solve your problem, it would cause the converse problem for people using the module for long-running queries, only to have the module give up halfway through. Adding a timeout option to the module makes little sense since Ansible itself has mechanisms for that.
Ultimately, needing a task to time out because of a slow query (since we've ruled out connection issues) is a DBA problem, not an automation one.
Honestly, I don't see what improvement you'd expect to see on this module.
@Andersson007 since Ansible provides functionality for timeouts, I don't think it's this collection's responsibility to replicate that. In my opinion this is not a bug and the issue can be closed.
As a best effort, we could implement read_timeout and write_timeout options.
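If those options were added, usage might look like the sketch below. Both read_timeout and write_timeout are hypothetical module options here (proposed in this thread, not implemented at the time); the names mirror parameters the underlying PyMySQL driver already accepts.

- name: Query with driver-level timeouts (hypothetical options, sketch only)
  community.mysql.mysql_query:
    login_db: "{{ db_prod }}"
    login_host: "{{ db_host_prod }}"
    login_user: "{{ db_user_prod }}"
    login_password: "{{ db_password_prod }}"
    # hypothetical: abort if the server stops responding mid-result
    read_timeout: 30
    # hypothetical: abort if sending the statement stalls
    write_timeout: 30
    query:
      - SELECT 1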
SUMMARY
ISSUE TYPE
COMPONENT NAME
mysql_query
ANSIBLE VERSION
CONFIGURATION
OS / ENVIRONMENT
MacOS, local connection
STEPS TO REPRODUCE
EXPECTED RESULTS
I expect that the task completes and moves to the next task
ACTUAL RESULTS
Task hangs and never moves on, even if the mysql_query option connect_timeout: 5 is used.