altlinux / gpupdate

Utility to apply GPOs from Windows Active Directory domains in UNIX environments
https://www.altlinux.org/Групповые_политики
GNU General Public License v3.0

D-Bus timeout error: Did not receive a reply #98

Closed NexonSU closed 3 years ago

NexonSU commented 4 years ago

gpupdate does not work on an AD domain because of the 25-second reply limit on D-Bus.

DEBUG:root:2020-07-14 10:14:25:Target is: All
ERROR:root:Unable to perform gpupdate for None with current permissions, will update current user settings
DEBUG:root:Starting gpupdate via D-Bus
INFO:root:2020-07-14 10:14:25:Starting GPO applier for computer via D-Bus
ERROR:root:2020-07-14 10:14:50:No reply from oddjobd gpoa runner for computer
ERROR:root:Error running GPOA for computer: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
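The two timestamps above (10:14:25 and 10:14:50) are 25 seconds apart, which matches the default method-call timeout of libdbus. As a rough sketch only, the snippet below builds a `dbus-send` command line with an explicit `--reply-timeout` in milliseconds. The destination, object path and method names are placeholders and not the real gpupdate/oddjob names, and this thread does not establish that raising the client-side timeout is enough for oddjobd.

```python
def dbus_send_argv(dest, object_path, method, reply_timeout_ms):
    """Build a dbus-send command line with an explicit reply timeout."""
    if reply_timeout_ms <= 0:
        raise ValueError("reply timeout must be a positive number of milliseconds")
    return [
        "dbus-send",
        "--system",
        "--print-reply",
        "--reply-timeout={}".format(reply_timeout_ms),
        "--dest={}".format(dest),
        object_path,
        method,
    ]


# Placeholder names for illustration only (assumption, not the gpupdate service).
argv = dbus_send_argv("org.example.Service", "/", "org.example.Service.Run", 120000)
print(" ".join(argv))
```

The list form can be passed to `subprocess.run` as-is, which avoids shell quoting problems.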

Running gpoa directly takes about 80 seconds:

# time gpoa
Setting log level to ERROR
2020-07-14 10:39:40:adp is not installed - plugin cannot be initialized
ERROR: talloc_free with references at ../../libgpo/pygpo.c:481
        reference at ../../pytalloc_util.c:164
        ...
        reference at ../../pytalloc_util.c:164
2.69user 0.74system 1:20.26elapsed 4%CPU (0avgtext+0avgdata 79028maxresident)k
0inputs+27736outputs (0major+42196minor)pagefaults 0swaps

Similar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1085491

ghost commented 4 years ago

@NexonSU, nice report! Do you run gpupdate as a regular user or as an administrator? I think I'll try to patch oddjob in ALT in order to fix the problem. Also, could you run gpoa --loglevel 0 so that we can see the full log with timestamps? I also need the package version, because GPOA versions after 0.7 should not print the "Setting log level to ERROR" message.

NexonSU commented 4 years ago

It seems that domain users don't have the right to run gpoa, so I used the local root account.

gpoa --loglevel 0: https://pastebin.com/JQGyftRh

gpupdate version:

  Installed: 0.6.0-alt2:p9+241549.702.30.1@1592336683
  Candidate: 0.6.0-alt2:p9+241549.702.30.1@1592336683
  Version Table:
*** 0.6.0-alt2:p9+241549.702.30.1@1592336683 0
        100 RPM Database

ghost commented 4 years ago

I strongly suggest that you update to version 0.7. It has now made its way into the p9 and Sisyphus repositories, so you won't need any additional repositories from now on. I did a large rewrite of the GPT parsing functionality and fixed several critical bugs since 0.6.

Normally gpoa is run by oddjobd with root privileges, without any user interaction. You (as a user) are expected to use the gpupdate utility to trigger group policy updates via D-Bus, but at the moment it cannot pass the loglevel parameter.

As for the problem, I can see the following log entries:

2020-07-22 08:44:56:GPO: GPO-WPP-Setting ({GUIDGUID-GUID-GUID-GUID-GUIDGUIDGUID})
2020-07-22 08:45:52:Re-caching Local Policy

Look at the timestamps. Policy parsing and re-caching begin right after GPO replication, so it took nearly one minute to replicate the policies. I think we need to investigate this case. Could you please tell me the size of the GPTs that were replicated? I want to determine whether it is just a slow network or a Samba bug (I use Samba code to trigger GPO replication).
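To make the gap concrete, here is a small sketch that parses the leading timestamp of the two log entries quoted above and prints the difference in seconds. It assumes every line starts with a `YYYY-MM-DD HH:MM:SS` stamp, as in this log; the helper names are made up for the example.

```python
from datetime import datetime


def entry_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' stamp of a gpoa log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def gap_seconds(earlier, later):
    """Return the time between two log lines, in seconds."""
    return (entry_time(later) - entry_time(earlier)).total_seconds()


last_gpo = "2020-07-22 08:44:56:GPO: GPO-WPP-Setting ({GUIDGUID-GUID-GUID-GUID-GUIDGUIDGUID})"
recache = "2020-07-22 08:45:52:Re-caching Local Policy"

gap = gap_seconds(last_gpo, recache)
print(gap)  # 56.0, more than twice the 25-second D-Bus reply limit
```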

We will try to update oddjobd to overcome the problem anyway.

P.S.: Please tell me if the documentation on ALTWiki or on GitHub needs to be more user-friendly or more detailed to make the software easier to use. Any suggestions and comments are appreciated.

NexonSU commented 4 years ago

After updating gpupdate to 0.7, I got an SQL error. It seems that the cache at /var/cache/gpupdate has to be removed. The size of these 19 GPTs is 220-250 KB. gpoa execution time is the same with another domain controller, which is located in another city; ping to it is 30-60 ms with packet drops. By default, the ALT Linux host is connected to the local domain controller over a wired gigabit connection, and ping is below 1 ms.

gpoa --loglevel 0 after updating to 0.7: https://pastebin.com/yAWHVhR0

If this is a Samba-related bug, how can I debug it?

ghost commented 4 years ago

Yeah, sorry, I forgot to mention that cache removal is required. There was an announcement at https://lists.altlinux.org/pipermail/samba/2020-July/004359.html with a notice about it. The database was changed to add a field holding the original policy name, so you can open the cache with sqlite3 and see which GPT each setting comes from. This broke compatibility between versions, but it allows us to plan a GUI application for working with and debugging policies.

I also see another error in the new log:

2020-07-24 10:06:12:Backend execution error: 'NoneType' object is not iterable

Version 0.7 of GPOA is much stricter because of improvements in the error handling mechanism. The error might also be present in 0.6 but skipped due to less strict error handling. Unfortunately, the error message says nothing about the root of the problem (I think there is a bug in my code). I will update the logging mechanism ASAP so that we can track down your backend problem. The replication problem, however, persists between releases.

I think it's better to call in @mastersin to help debug the slow replication.

ghost commented 4 years ago

So, I've prepared a PR: https://github.com/altlinux/gpupdate/pull/28 which will help you to track down the problem. There will be many more log messages, and timestamps now include milliseconds, so you will be able to tell exactly where the problem is.

You may wait for the 0.8 release announcement or build a package from this branch yourself.

NexonSU commented 4 years ago

I just copied the files from the l10n branch to /usr/lib/python3/site-packages/gpoa and got this:

2020-08-05 14:27:06.687|[D00049]| Started GPO replication from AD DC|{}
2020-08-05 14:28:00.212|[D00050]| Finished GPO replication from AD DC|{}

Well, I found some time and debugged check_refresh_gpo_list in gpclass.py. This is how it works without any changes: https://pastebin.com/bUJn9jpm And with a hardcoded DC IP address: https://pastebin.com/4Jaw2GYm

So, yes, it was a network-related problem caused by incorrect DC detection.

Hardcoding selected_dc at util/windows.py:55 does not work.

ghost commented 4 years ago

@NexonSU, I see that you've hardcoded an IP address, but AD is designed to work with domain names only. Otherwise we will run into problems with Kerberos tickets, which are granted for domain names.
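As an illustration of the point about names versus addresses, the sketch below uses the standard `ipaddress` module to detect when a DC value is an IP literal rather than a host name. This is only a hypothetical sanity check, it is not part of gpupdate, and the example host name is invented.

```python
import ipaddress


def is_ip_literal(value):
    """Return True if value is an IPv4 or IPv6 address rather than a host name."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


# Example values only: 192.0.2.10 is a documentation address,
# dc1.example.test is a made-up host name.
for candidate in ("192.0.2.10", "dc1.example.test"):
    kind = "IP literal, unsuitable for Kerberos" if is_ip_literal(candidate) else "host name"
    print(candidate, "->", kind)
```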

ghost commented 4 years ago

@NexonSU, would it be possible for me to connect to your environment so that I can try to debug the domain name resolution problem? You can e-mail me at nir -at- altlinux -dot- org if that is an option.

NexonSU commented 4 years ago

That is impossible, since our network is not connected to the internet. Anyway, I tried to debug it myself, but I can't find any source code for samba.net.finddc. It seems that finddc just picks a random DC from the domain: https://pastebin.com/0Yw5rrc5

ghost commented 4 years ago

Have you tried the --dc <DOMAINNAME> option of gpoa? I designed it for the case of syncing sysvol from a specific domain, and it worked for me at that time (about half a year ago).

NexonSU commented 4 years ago

--dc does not work because of an error at windows.py:64:

  File "/usr/lib/python3/site-packages/gpoa/util/windows.py", line 64, in set_dc
    samba_dc, dc)))
NameError: name 'dc' is not defined

But yes, it works if I change dc to "dc".

This is a good solution, but it's not supported by gpupdate.
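For readers unfamiliar with this class of bug: Python resolves names only when a line actually runs, so a reference to an undefined name can stay hidden until the code path is used (here, presumably only when --dc is passed). The sketch below is a hypothetical reconstruction, not the real code from windows.py; the function names, the parameter name `dc_fqdn` and the message text are invented for illustration.

```python
def describe_dc_buggy(samba_dc, dc_fqdn):
    # Bug: `dc` is not defined anywhere, so this line raises NameError,
    # but only when the function is actually called.
    return "DC {} is replaced by {}".format(samba_dc, dc)


def describe_dc_fixed(samba_dc, dc_fqdn):
    # Fix: refer to the parameter that really exists.
    return "DC {} is replaced by {}".format(samba_dc, dc_fqdn)


try:
    describe_dc_buggy("auto.example.test", "dc1.example.test")
except NameError as exc:
    print("buggy version failed:", exc)

print(describe_dc_fixed("auto.example.test", "dc1.example.test"))
```

Replacing the bare name with the string literal "dc", as described above, also avoids the exception, but the message then contains the literal text instead of the controller name.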

ghost commented 4 years ago

I'll fix this.

ghost commented 4 years ago

PR: https://github.com/altlinux/gpupdate/pull/114 will allow you to specify the DC in the configuration file /etc/gpupdate/gpupdate.ini like this:

[samba]
dc = your.domain.controller

This is expected to come with gpupdate version 0.8 after its release.
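A minimal sketch of how such a setting can be read with Python's standard `configparser`, assuming the `[samba]` section and `dc` key shown above. How gpupdate itself loads the file is not shown in this thread, and the helper name is made up.

```python
import configparser

# Same content as the fragment above; in practice it would live in
# /etc/gpupdate/gpupdate.ini.
SAMPLE_INI = """\
[samba]
dc = your.domain.controller
"""


def configured_dc(ini_text):
    """Return the DC named in the [samba] section, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.get("samba", "dc", fallback=None)


print(configured_dc(SAMPLE_INI))
print(configured_dc(""))
```

Returning None when the key is absent lets the caller fall back to automatic DC discovery.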

NexonSU commented 4 years ago

Awesome! I think this issue is resolved now.

ghost commented 4 years ago

I'm still working on the D-Bus timeouts, but I can't guarantee that they will be resolved in 0.8.

ghost commented 3 years ago

@NexonSU, gpupdate version 0.8.1 has made its way into p9. The issue with timeouts is still not resolved, but the setting from https://github.com/altlinux/gpupdate/issues/98#issuecomment-689948024 should work. I suggest that you test it on a separate machine, because there are a number of breaking changes in some gpupdate subsystems (we worked on resolving migration issues, but I can't 100% guarantee that everything will work smoothly).

NexonSU commented 3 years ago

Sorry for the delay. Well, now it's even worse: https://pastebin.com/JrhY8D39

ghost commented 3 years ago

Sorry for the delay on my side. I can see that replication is working correctly now. What is broken is the NTP applier module, which receives unsupported NTP settings. This is not a big problem, because the other appliers should still work despite this failure. Could you send me (or post here) your NTP settings so that I can expand NTP support or work around the problem (if there is one)? I suppose you are using not NTP but another time protocol supported by Windows, whereas Linux NTP servers support only NTP, AFAIK. Are you considering using NTP on your domain controllers?

ghost commented 3 years ago

@NexonSU, it was decided to give your issue the highest priority. You can e-mail me about problems directly at nir -at- basealt.ru for the fastest response. We can even set up a video conference to discuss your setup problems. Our team is willing to help you ASAP.

ghost commented 3 years ago

Testing https://github.com/altlinux/gpupdate/pull/124