djha-skin / psutil

Automatically exported from code.google.com/p/psutil
Other
0 stars 0 forks source link

get_open_files does not include open files that have been deleted #382

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
>>> p = psutil.Process(25002)
>>> ofiles = p.get_open_files()

ofiles does not contain all files that are opened for this process.

What is the expected output?
ofiles should contain all files that this process has opened (deleted or not).

What do you see instead?
Only open files that exist are included.

What version of psutil are you using? What Python version?
# python --version
Python 2.7.3
python-psutil_0.7.0-1_amd64.deb

On what operating system? Is it 32bit or 64bit version?
64bit

Please provide any additional information below.

Processes can have open files that have since been deleted and these are not 
included in get_open_files. It would be useful if get_open_files could be 
passed a variable with the option to include these files, as what is currently 
returned does not match the list of open files that e.g. lsof returns.

For example, this would be useful for determining which services need to be 
restarted due to deleted files after package upgrades. Specifically, I was 
attempting to rewrite the "checkrestart"[1] command using psutil instead of 
lsof. It would greatly simplify the program if this functionality was present 
in psutil.

[1] 
http://anonscm.debian.org/gitweb/?p=collab-maint/debian-goodies.git;a=blob;f=che
ckrestart;h=6caaf109ba3a6ac764239fea01ddbc0425a6496d;hb=a8607224f80ac6630f358dc6
b5bd9f1897e68896

Original issue reported on code.google.com by furlo...@gmail.com on 15 May 2013 at 7:42

GoogleCodeExporter commented 9 years ago
Unfortunately we can't do that AFAIK.
get_open_files() on Linux is implemented as such:

- read symlinks contained in /proc/{PID}/fd directory
- resolve each symlink
- for each resolved symlink check whether that is a regular file and return it

Point is if a certain path no longer exists we cannot check whether it was a 
regular file or not.

Original comment by g.rodola on 15 May 2013 at 10:31

GoogleCodeExporter commented 9 years ago
For my use case, I don't care if it was a regular file or not. The attached 
patch adds include_deleted as a variable. I remove the part of the filename 
containing " (deleted)" but we could also leave that intact and allow the user 
to parse it. This approach also has the advantage of letting the user know the 
file is deleted.

To be honest, I'm not sure that only regular files should be returned. Is there 
an issue with returning directories, links (outside of /proc) or device files? 
Is this another option that could be added? It would be more "truthful" to 
return all types of open files that a process has. Indeed, this is what I 
assumed get_open_files would do.

Original comment by furlo...@gmail.com on 15 May 2013 at 1:06

Attachments:

GoogleCodeExporter commented 9 years ago
I decided to return files only (and connections only) because unifying all the 
possible fd types across multiple platforms with a consistent API is impossible.
Please read my related comments in issue 285.

As for deleted files: I think it's OK to return them as well, assuming other 
platforms implementation (OSX and FreeBSD) behave the same (and they should - 
but I'll need to check first).
If that's the case I don't think an explicit 'include_deleted' argument will be 
necessary.

Original comment by g.rodola on 15 May 2013 at 1:23

GoogleCodeExporter commented 9 years ago
The API would be much more useful if it just returned all open files - deleted 
or not, regular or not.

Different kinds of files are different.  There's no need to unify them somehow 
in psutil.  Just present the information that's available and let applications 
do what they want with it.  Perhaps the current behavior is handy for some 
applications, so by all means preserve it somehow.  But some API in psutil 
somewhere that just returns all open file descriptors in the process would be 
nice.

Original comment by calderon...@gmail.com on 25 Jun 2014 at 1:59

GoogleCodeExporter commented 9 years ago
I definitely agree with #4 as it seems we have filtered an arbitrary group of 
files. I think it should be up to the application to do any filtering.

Original comment by furlo...@gmail.com on 26 Jun 2014 at 12:07

GoogleCodeExporter commented 9 years ago
Unifying all the file descriptors under a common API is very difficult and 
probably not even possible. The differences between UNIX platforms are too 
many; if you think about how various the output of "lsof" cmdline tool is you 
can get an idea.
As I explained here:
https://code.google.com/p/psutil/issues/detail?id=285#c5
...the easiest thing for me to do was to differentiate between "files" and 
"sockets" and provide two specific methods returning only those.
All other kinds of file descriptors are just too hard to express under an API 
which makes sense and I also don't want to reinvent lsof, which is a pretty 
huge piece of messy code.
Returning "all file descriptors", as you suggest, doesn't make sense if you're 
not able to figure out what you're dealing with (a regular file, a pipe, ...).

Original comment by g.rodola on 1 Jul 2014 at 8:38

GoogleCodeExporter commented 9 years ago
Could we emit a warning when there is a file descriptor whose format we don't 
understand?

At least that way we would start to collate a list of unsupported file 
descriptors. Each can then be dealt with on a case-by-case basis (or by the 
application)

Including an option to suppress that warning would give backwards compatibility.

I think the main issue here is that most people expect that _all_ open file 
descriptors will be returned.

Original comment by furlo...@gmail.com on 7 Aug 2014 at 8:22

GoogleCodeExporter commented 9 years ago
What would you do with a warning? It's not like a returned object (e.g. a list 
of fds) which you can actually use. 
Also, given that psutil does not provide the necessary bits to figure out what 
a file descriptor is (a pipe? a socket? a directory?) you wouldn't be able to 
do anything useful even if you had that list.
Again, this is something which just cannot be expressed in an API which makes 
actual sense, and even if it could it would be too much work to implement 
across all UNIX variants (take a look at lsof source code and you'll understand 
what I'm talking about). 

Anyone who really needs to do this on Linux can simply roll in its own 
implementation based on reading /proc/{pid}/fd and psutil code can be used as 
an example:
https://github.com/giampaolo/psutil/blob/a6ecb26350231a9da4bf456d37c79cf1a6d03eb
8/psutil/_pslinux.py#L1145

Note: the project has been moved to github so please post your replies at 
https://github.com/giampaolo/psutil/issues/382 

Original comment by g.rodola on 7 Aug 2014 at 3:23