baweaver / psutil

Automatically exported from code.google.com/p/psutil
Other
0 stars 0 forks source link

get_open_files might hang forever #340

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a process_iter()
2. Call get_open_files on each of the processes

What is the expected output?
The list of open files for each process.

What do you see instead?
The program hangs forever.

What version of psutil are you using? What Python version?
latest svn.

On what operating system? Is it 32bit or 64bit version?
64 bit Win 7

Please provide any additional information below.

This has already been encountered, there are some exception in the code for 
exactly this problem. I have found on more access mask where the problem arises 
- please add it to the list of exceptions. Patch is attached.

Original issue reported on code.google.com by amo...@google.com on 24 Nov 2012 at 2:39

Attachments:

GoogleCodeExporter commented 8 years ago
Could you provide some info as to where 0x00120089 comes from?

Original comment by g.rodola on 24 Nov 2012 at 7:48

GoogleCodeExporter commented 8 years ago
sadly, I can't. We ran it, it hung on a couple machines, this value was the
access mask.

Do you have any idea where the other values come from? Maybe we can explain
this one then. I looked at the others but the flags they represent are
completely independent, I can't see a pattern.

Sorry, that I don't have more info.

Original comment by amo...@google.com on 24 Nov 2012 at 7:52

GoogleCodeExporter commented 8 years ago
What I'd like to know is how exactly you came up with that value (you HAVE to 
know =)).
Did you debug the problem by printing handle.GrantedAccess and then you noticed 
that when it was ==  0x00120089 the function would hang?

If that's the case then I'm ok with applying the patch.
It's just that I don't feel comfortable in blindly accepting patches for an 
area of the code which is already pretty messy, that's all.

Original comment by g.rodola on 24 Nov 2012 at 8:04

GoogleCodeExporter commented 8 years ago
yeah, exactly. I just printed the value and it hung after getting
0x00120089.

I don't really like the code there either to be honest and I have disabled
this completely locally. There might be more values that hang that we don't
know about. It would be really good to come up with a pattern to determine
if the calls are dangerous...

Original comment by amo...@google.com on 24 Nov 2012 at 8:09

GoogleCodeExporter commented 8 years ago
Committed in r1548.
I agree it would be great if that part was refactored.
Unfortunately it was borrowed from somewhere I can't even remember.
Anyway, closing this out as fixed for now and thanks for the patch.

Original comment by g.rodola on 25 Nov 2012 at 10:38

GoogleCodeExporter commented 8 years ago
This limits the usefulness of the get_open_files function, since it won't show 
a lot of open files/documents. The only way to completely fix this is to use a 
kernel-mode driver. There might also be a new API in Windows 8 that allows you 
to query the file name properly, but I can't remember what it is...

Original comment by wj32...@gmail.com on 25 Nov 2012 at 10:42

GoogleCodeExporter commented 8 years ago

Original comment by g.rodola on 25 Nov 2012 at 10:56

GoogleCodeExporter commented 8 years ago
wj32.64, thanks for signaling.
I will revert r1548 for now.
If the prospect is to skip files which might be valid then I'd rather risk it 
and signal in the doc that get_open_files() on Windows is not fully reliable.

Original comment by g.rodola on 26 Nov 2012 at 11:22

GoogleCodeExporter commented 8 years ago
So I don't have a strong opinion on this issue since I have just stopped
using get_open_files altogether for now. I just want you to know that I ran
a simple get_open_files for all processes on 50 boxes and it locked up
exactly 20 of them so it's not a rare occurrence.

Original comment by amo...@google.com on 26 Nov 2012 at 8:53

GoogleCodeExporter commented 8 years ago
That's weird as despite I have your same configuration (Win7 64bit) I've never 
bumped into this issue before.
Just a thought which occurred to me just now: might it be the case that 
0x00120089 refers to a file living on a network filesystem (e.g. SMB)? In that 
case it might be ok to just skip it.

I still couldn't find the original blog entry where I borrowed that code, nor I 
was able to figure out the the meaning of those codes (0x00120089, etc.).
This is one of those cases where I regret about copying and pasting code 
without understanding it but I had no chance. For things such as this one 
Windows requires to go so close to the kernel that the resulting code is just a 
mess which makes no sense.

Original comment by g.rodola on 26 Nov 2012 at 9:54

GoogleCodeExporter commented 8 years ago
Given what wj32.64 noted above, I would recommend pulling the change back in to 
eliminate the known hang at this point. 

There's not a whole lot we can do to avoid skipping some open files unless 
we're willing to create a kernel-mode driver component for psutil. That's 
something I'm not really eager to do; it would provide more power and 
functionality but it would also be a bunch of added work and complexity to the 
project that I'm not sure is worth the trade-off. 

For now I'd say we skip those we know cause a hang, and note in the 
documentation that due to access rights some files may not show up on Windows.

Original comment by jlo...@gmail.com on 28 Nov 2012 at 2:17

GoogleCodeExporter commented 8 years ago
You can use Process Hacker's driver which is signed (for 64-bit). But users of 
your library might not like a Python module loading drivers without permission.

Original comment by wj32...@gmail.com on 13 Dec 2012 at 3:56

GoogleCodeExporter commented 8 years ago
I reverted the original r1548 commit because it turns out that that change 
basically skips *all* regular (legitimate) files, making get_open_files() 
basically useless on Windows.

@amoser@google.com: I'd like to know a bit more about the circumstances causing 
the hang.
For which processes do you have this problem?
The ones not owned by your user?
If that's the case we might check permissions up front and raise AccessDenied 
preemptively.

Original comment by g.rodola on 13 Dec 2012 at 2:10

GoogleCodeExporter commented 8 years ago
@amoser: 
Up. 
Can you tell whether the problem occurs only for processes owned by another 
user?

Original comment by g.rodola on 24 Feb 2013 at 7:49

GoogleCodeExporter commented 8 years ago
Sorry for the late response.

I have done some more testing and I have narrowed down the problem a bit. I can 
reproduce by running
> PsExec.exe -i -s C:\Windows\system32\cmd.exe

to get system and then

for p in psutil.process_iter():
   with open("o", "a") as f:
         f.write(p.name)
   try:
      print p.get_open_files()
   except psutil.AccessDenied:
      pass

This will hang on the PsExec.exe process. All other processes including those 
owned by other users and my own cmd.exe work just fine. I hope this helps 
debugging?!

Original comment by amo...@google.com on 26 Feb 2013 at 3:54

GoogleCodeExporter commented 8 years ago
Thanks for your suggestion: I'm now able to reproduce the issue.
I researched a bit and as wj32.64 suggested it seems there's no way to avoid 
the hanging on NtQueryObject() aside from using a driver.

Another solution seems to be using GetFileInformationByHandleEx():
http://0memory.blogspot.it/2008/08/getfinalpathnamebyhandle-api-hungs.html
...which is supported on Windows >= Vista only.
If it works we might think about dropping support for get_open_files() on 
previous Windows versions.
I will research into that.

Original comment by g.rodola on 27 Feb 2013 at 2:56

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Updated csets after the SVN -> Mercurial migration:
r1548 == revision 9ee5c1e61dea

Original comment by g.rodola on 2 Mar 2013 at 12:13

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
I think I've found an easy fix for this. You can use the windows api call 
GetFileType to figure out what the file handle type is, one of which is 
FILE_TYPE_PIPE. I've fixed it for myself locally by adding an if statement in 
process_handles.c:185 to continue if it's not the disk file type. Seems to fix 
the problem I was having earlier with it blocking on a cygwin terminal. Hope 
this helps. Thanks for the nice process library!

    if (handle.ProcessId != pid)
        continue;

    if (GetFileType(handle.Handle) != 0x0001 ) //FILE_TYPE_DISK
        continue;

Original comment by gtmacdon...@gmail.com on 8 Jul 2013 at 10:37

GoogleCodeExporter commented 8 years ago
With that change I get this failure:

======================================================================
FAIL: test_get_open_files (__main__.TestProcess)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test\test_psutil.py", line 1380, in test_get_open_files
    self.assertIn(TESTFN, filenames)
AssertionError: 'C:\\Users\\user\\cygwin\\home\\user\\psutil\\$testfile' not 
found in [u'C:\\Windows\\SysWOW64\\en-US\\KernelBase.dll.mui']

Original comment by g.rodola on 9 Jul 2013 at 11:37

GoogleCodeExporter commented 8 years ago
GetFileType should just skip over those handles that aren't disk files. Perhaps 
the tests need to be updated?

Original comment by gtmacdon...@gmail.com on 9 Jul 2013 at 5:31

GoogleCodeExporter commented 8 years ago
No, (unfortunately) the test is correct.

Original comment by g.rodola on 9 Jul 2013 at 5:38

GoogleCodeExporter commented 8 years ago
test_get_open_files passes for me. I had failures for three others though. I'm 
running Python 2.7, 64 bit, on Windows 7, psutil-0.7.1.

======================================================================
FAIL: test_disk_partitions (__main__.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_psutil.py", line 566, in test_disk_partitions
    assert os.path.exists(disk.device), disk
AssertionError: partition(device='Q:\\', mountpoint='Q:\\', fstype='', 
opts='fixed')

======================================================================
FAIL: test_get_set_ionice (__main__.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_psutil.py", line 206, in inner
    return fun(self, *args, **kwargs)
  File "test_psutil.py", line 918, in test_get_set_ionice
    self.assertEqual(p.get_ionice(), value)
AssertionError: 2 != 0

======================================================================
FAIL: test_sys_cpu_times_percent (__main__.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_psutil.py", line 527, in test_sys_cpu_times_percent
    self._test_cpu_percent(percent)
  File "test_psutil.py", line 506, in _test_cpu_percent
    assert percent >= 0.0, percent
AssertionError: -800.0

----------------------------------------------------------------------

Original comment by gtmacdon...@gmail.com on 9 Jul 2013 at 6:09

GoogleCodeExporter commented 8 years ago
Not sure what to say. :(

Original comment by g.rodola on 9 Jul 2013 at 10:55

GoogleCodeExporter commented 8 years ago
:) Computers are a pain in the ass. I'm happy it works on my end though. Cheers.

Original comment by gtmacdon...@gmail.com on 10 Jul 2013 at 10:10

GoogleCodeExporter commented 8 years ago
I am able to replicate this bug reliably.  Hopefully here's some more 
information.

OS: Windows 2012 (x64)
Python: Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit 
(AMD64)] on win32
psutil: 1.2.1 (from PyPi)

Test case:
User: NT Authority\SYSTEM (must be SYSTEM to replicate)
Process: vds.exe (PID: 964)
Code: psutil.Process(964).get_open_files()

As an elevated Administrator:
>>> psutil.Process(964).get_open_files()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\psutil\__init__.py", line 808, in get_open
_files
    return self._platform_impl.get_open_files()
  File "C:\Python27\lib\site-packages\psutil\_psmswindows.py", line 199, in wrap
per
    raise AccessDenied(self.pid, self._process_name)
psutil._error.AccessDenied: (pid=964)

As SYSTEM:
Hangs

Original comment by jt...@vahna.net on 28 Feb 2014 at 6:17

GoogleCodeExporter commented 8 years ago
I'm starting to think whether it makes sense to get rid of get_open_files() on 
Windows.

Original comment by g.rodola on 28 Feb 2014 at 6:18

GoogleCodeExporter commented 8 years ago
Just came across this:

"Call CreateFileMapping on the duplicated handle (assuming it has GENERIC_READ 
or better access).  It doesn't seem to block on those troublesome named pipes.  
If the call succeeds, you have a filesystem-backed file and you can immediately 
CloseHandle on the resulting map handle and successfully query for the file 
handle's name.  If the function fails, it's not a filesystem-backed handle (or 
you have insufficient privileges anyway), and you don't need to fetch the name 
at all." 
(http://forum.sysinternals.com/discussion-howto-enumerate-handles_topic19403_pag
e10.html)

Original comment by jt...@vahna.net on 28 Feb 2014 at 11:05

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
I'm able to fix the issue using the CreateFileMapping workaround.  I've 
attached my diff against the 1.2.1 release.

Original comment by jt...@vahna.net on 3 Mar 2014 at 7:27

Attachments:

GoogleCodeExporter commented 8 years ago
Wow! These are very good news.
Do tests pass?
I will try your patch later during this week.
If this works it will be a great fix for the upcoming 2.0 version.

Original comment by g.rodola on 3 Mar 2014 at 7:31

GoogleCodeExporter commented 8 years ago
I've modified the patch to use the original GrantedAccess check as well as 
CreateFileMapping with checks to GetLastError.

The tests pass on Windows XP running as Administrator and SYSTEM.

It also passes my test case of vds.exe on Windows 2012 as SYSTEM.

Original comment by jt...@vahna.net on 3 Mar 2014 at 10:21

Attachments:

GoogleCodeExporter commented 8 years ago
Awesome! Thanks a lot. I'll keep you posted soon.

Original comment by g.rodola on 3 Mar 2014 at 10:26

GoogleCodeExporter commented 8 years ago
OK, I applied your patch as of revision 27636214b8ce and AFAICT it works (note: 
I was never able to reproduce the issue though).
Thanks a lot for your patch: this was a long standing issue and I'm happy it is 
finally fixed!

Original comment by g.rodola on 8 Mar 2014 at 1:46

GoogleCodeExporter commented 8 years ago
Closing out as fixed as 2.0.0 version is finally out.

Original comment by g.rodola on 10 Mar 2014 at 11:36

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
I'm not sure what's different in 2.0.0 but it's still hanging for me.  I'll try 
to debug it later this week.

Original comment by jt...@vahna.net on 10 Mar 2014 at 1:46

GoogleCodeExporter commented 8 years ago
Mmm... what I did was modify your patch (because it was against 1.2.1 instead 
of the latest repository version) and commit it in revision 27636214b8ce. 

Original comment by g.rodola on 11 Mar 2014 at 3:25

GoogleCodeExporter commented 8 years ago
Any news about this? I consider this a high-priority issue and I'd love to see 
it being definitely fixed.

Original comment by g.rodola on 13 May 2014 at 2:44

GoogleCodeExporter commented 8 years ago
If there's someone who's able to reproduce this issue (I'm not) but is not able 
to compile psutil on Windows I can also build an exe starting from a patch and 
attach it here for testing.

Original comment by g.rodola on 13 May 2014 at 2:49

GoogleCodeExporter commented 8 years ago
I am able to reproduce it and would be happy to test.  It's a consistent 
problem on one machine I have.

Original comment by sec...@gmail.com on 13 May 2014 at 6:43

GoogleCodeExporter commented 8 years ago
psutil has been migrated from Google Code to Github (see: 
http://grodola.blogspot.com/2014/05/goodbye-google-code-im-moving-to-github.html
).
Please do NOT reply here but use this instead:
https://github.com/giampaolo/psutil/issues/340

Original comment by g.rodola on 26 May 2014 at 3:05

GoogleCodeExporter commented 8 years ago
It seems there might be a fix for this at: 
https://github.com/giampaolo/psutil/issues/340
Please someone confirm whether the hanging issue is fixed or not and report 
back on github bugtracker. If you have troubles compiling psutil on Windows 
just ask and I'll send you a pre-compiled binary.

Original comment by g.rodola on 15 Jul 2014 at 3:26