Closed GoogleCodeExporter closed 8 years ago
Hmmmm, I think an object should return as much information as possible, so if
NumberOfFunctions is there, then _IMAGE_EXPORT_DIRECTORY should allow access to
it.
An interesting question is, if the first half of a structure is gone, should
you be able to instantiate an object at an invalid address, and read the tail
half of it's members?
At the moment, if an object is created using obj.Object(), it'll return a
NoneObject, but if the object is instantiated directly, it'll raise an
InvalidOffsetError. If the CType created by obj.Object() is valid, you won't
get a NoneObject, but when you then ask for a member, it'll be instantiated
directly at an invalid address and raise the exception.
So we've got several options:
1) Leave it as is
2) Catch InvalidOffsetErrors in the CType member lookup and convert them to an
AttributeError
3) Catch InvalidOffsetErrors in the CType member lookup and return NoneObjects
4) Force the CType member lookup to use obj.Object when returning members
5) Have obj.Object return a NoneObject if any of the addresses within
range(offset, offset+structsize) are invalid
6) Something else
So some obversations on these:
1) Not very helpful
2) Changing one error for another
3) What NoneObjects were designed for, but sometimes difficult to spot. Also
changes invalid offset handling from exception handling to explicit checking
for NoneObjects.
4) Possibly a good idea, but needs discussing with scudette, because it may
turn our pre-compiled vtype language into an interpretted language
5) May cause confusion if it isn't recursive and recursive checking will likely
invalidate most objects. Will likely slow down object creation for large
objects.
6) I dunno what it'd be
So, 3 & 5 look the most viable options, 4 would be a big shift, and 6 has the
possibility of being a brilliant solution! 5:) It would probably be worth
CCing in some of the others to get their opinion...
Original comment by mike.auty@gmail.com
on 7 Jan 2011 at 2:35
maybe this doesn't have anything to do with it, but the intel address space at
least supports read and zread which have different behaviors when an unmapped
page is hit:
http://code.google.com/p/volatility/source/browse/branches/Volatility-1.4_rc1/vo
latility/plugins/addrspaces/intel.py?r=584#303
Original comment by atc...@gmail.com
on 9 Jan 2011 at 1:53
Original comment by mike.auty@gmail.com
on 21 Jan 2011 at 8:26
Sorry for the bugspam, but better to get this right now than later once it's
more in use.
Original comment by mike.auty@gmail.com
on 4 Feb 2011 at 9:35
Just a note that we'll also have to handle the case of strings, or similar,
which span pages. Most likely we'll need to figure out the size of the struct,
stick the relevant pages together and then operate on that. Might take a
little work...
Original comment by mike.auty@gmail.com
on 21 Jul 2011 at 11:07
Original comment by mike.auty@gmail.com
on 26 Jul 2011 at 6:45
Original comment by jamie.l...@gmail.com
on 26 Jul 2011 at 11:40
Hey guys, I'd like to expand on what Ikelos said about handling strings. In
particular, a structure member that points to a unicode string:
'_SERVICE_RECORD' : [ None, {
'ServiceList' : [ 0x0, ['_SERVICE_LIST_ENTRY']],
'ServiceName' : [ 0x8, ['pointer', ['UnicodeString', dict(length = 256 * 2)]]],
'DisplayName' : [ 0xc, ['pointer', ['UnicodeString', dict(length = 256 * 2)]]],
I subclassed basic.String to create the UnicodeString type. It looks like the
following (differs from basic.String only in the split statement that uses
\x00\x00 instead of \x00
class UnicodeString(basic.String):
def __str__(self):
data = self.v()
result = data.split("\x00\x00")[0]
if not result:
return ""
return result
According to MSDN
(http://msdn.microsoft.com/en-us/library/ms682450%28VS.85%29.aspx) the
ServiceName and DislayName can be max 256 characters (so 512 bytes).
When testing a plugin, some of the ServieName and DisplayName fields were
coming up blank, for example:
$ python vol.py -f tdl3.vmem svcscan
....
Offset: 0x82aec0
Order: 258
Process ID: -
Service Name: <----------------------------- this should not be blank!!
Display Name: Network Provisioning Service
Service Type: SERVICE_WIN32_SHARE_PROCESS
Service State: SERVICE_STOPPED
Binary Path: -
I wondered why it was blank and went to check it out in volshell. The base of
the _SERVICE_RECORD structure is 0x82aec0:
In [2]: dd(0x82aec0)
0082aec0 0082ae30 00000000 0082af34 000b45b8
0082aed0 00000102 00000000 76724573 00000000
0082aee0 00000000 00000000 00000020 00000001
And so 0082af34 is the ServiceName field.
In [4]: db(0x0082af34)
0x0082af34 78 00 6d 00 6c 00 70 00 72 00 6f 00 76 00 00 00 x.m.l.p.r.o.v...
0x0082af44 00 00 00 00 17 32 12 00 00 10 00 00 78 01 82 00 .....2......x...
0x0082af54 78 01 82 00 00 00 00 00 00 00 00 00 00 00 00 00 x...............
So the name is xmlprov. The problem is we can't read the whole 256 * 2 = 512
bytes:
In [5]: db(0x0082af34, length=512)
Memory unreadable at 0082af34
It turns out, we can't read past 0x0082afff (page cuts off at 0x0082b000).
Just to make sure the memory is allocated, I checked with vadinfo.
VAD node @814f29a0 Start 00820000 End 0084ffff Tag VadS
Flags: PrivateMemory
Commit Charge: 36 Protection: 4
In this case I suppose I'd like the UnicodeString to contain as much of the
string as possible before crossing the page boundary. Since "xmlprov" is much
smaller than 256 chars anyway, I'd really be getting the full string...but of
course that won't be the case always (especially if we're talking about
structures crossing page boundaries like in the original issue here).
Anyway, just wanted to document this, and CC some others.
Original comment by michael.hale@gmail.com
on 1 Oct 2011 at 8:02
This is still occurring even after the page translation patch. It doesn't
happen often, but here we can see it happening with a registry key:
$ ./vol.py -f memory.img --profile=Win7SP0x86 printkey -K
'ControlSet001\Control\WMI\Autologger\EventLog-System'
Volatile Systems Volatility Framework 2.1_alpha
Legend: (S) = Stable (V) = Volatile
----------------------------
Registry: \REGISTRY\MACHINE\SYSTEM
Key name: EventLog-System (S)
Last updated: 2009-07-14 04:37:09
Subkeys:
(S) {01979c6a-42fa-414c-b8aa-eee2c8202018}
(S) {04268430-d489-424d-b914-0cff741d6684}
(S) {06edcfeb-0fd0-4e53-acca-a6f8bbf81bcb}
(S) {0c478c5b-0351-41b1-8c58-4a6737da32e3}
Traceback (most recent call last):
File "./vol.py", line 135, in <module>
main()
File "./vol.py", line 126, in main
command.execute()
File "volatility/commands.py", line 101, in execute
func(outfd, data)
File "volatility/plugins/registry/printkey.py", line 101, in render_text
for s in rawreg.subkeys(key):
File "volatility/win32/rawreg.py", line 120, in subkeys
if i.Signature.v() == NK_SIG:
File "volatility/obj.py", line 486, in v
(val,) = struct.unpack(self.format_string, data)
struct.error: unpack requires a string argument of length 2
Ikelos looked at this before the patch, this is what he had to say about it
then:
"This appears to be a variant on issue 57, and not something we can fix
in the next five days. Basically, the two character data block falls
across the very end of a page (so Signature.obj_offset is 0x#####fff).
The next byte is on a different page, and I guess that this can't even
read another byte after its location.
Really, we should be determining the length of each value, ensuring it's
all within the page, and if not then we need to reconstruct the
consecutive pages, and work on those.
If one of the pages isn't resident then we probably ought to throw a
page fault exception, but that's up for debate (basically, do we think
plugin devs will spend the time to always check for page faults)..."
It is still exhibiting this behavior after the patch, however:
$ svn info
[snip]
Revision: 1153
[snip]
Original comment by jamie.l...@gmail.com
on 15 Dec 2011 at 7:52
also see it in Revision 1159
Original comment by jamie.l...@gmail.com
on 15 Dec 2011 at 8:59
Ok, so here's a patch that vastly improves the transparency of NoneObjects, so
that they can be used as pretty much any object and the result should always be
the NoneObject. Specifically, we can now concatenate with strings and the
error will follow through.
I've also tried to ensure that intel/legacyintel/crash/hibernate all return
NoneObject if there's a problem reading. Please give the patch a go with any
of the images exhibiting exceptions related to this and let me know if there's
any further problems...
Original comment by mike.auty@gmail.com
on 17 Dec 2011 at 2:04
Attachments:
This issue was closed by revision r1164.
Original comment by mike.auty@gmail.com
on 19 Dec 2011 at 11:02
Reopening this issue. Ikelos and I have decided that we'll just have to deal
with structures crossing page boundaries for now (until we have a way of
marking individual members as valid or invalid). However as for strings that
cross page boundaries, Ikelos has come up with the attached patch which I think
is good but need to test more.
Original comment by michael.hale@gmail.com
on 2 Feb 2012 at 2:00
Attachments:
Hey guys,
This is a patch that introduces two new string types to basic.py (placement
here so they're available to all profiles, including linux, osx, etc). The
point is to treat C strings (null terminated) in a manner more consistent with
how OS's actually deal with them.
Currently, our String class requires a length argument. If the string starts
100 bytes from the end of a page, the next page is unavailable, and the
string's length argument is 200 bytes, then the read operation on 200 bytes
will fail and we'll get nothing (obj.NoneObject). This is bad because although
the maximum length is 200 bytes, the string may only contain 5 characters with
a terminating null. Instead of returning those 5 chars, we return
obj.NoneObject.
So the new string classes will not override or replace the existing String,
rather they'll just provide an alternate choice for plugins that may need to
read strings that cross page boundaries. In comment #8 above I showed an
example where a service name was showing up none instead of "xmlprov". By using
CString instead of String now that is fixed and I see "xmlprov" as the OS would
see.
Original comment by michael.hale@gmail.com
on 21 Feb 2012 at 3:12
Attachments:
Just a small comment. If CString is supposed to be null terminated, this is
faster:
result = self.obj_vm.zread(self.obj_offset, i).split("\x00", 1)[0]
It will take up to the first NULL (or the length of the string if it has no
NULLs. I think zread will null pad invalid pages already, which will cause this
to only read up to the invalid page.
I think this is a good patch, but should be applied to String and
UnicodeString. Its generally impossible to predict in advance if you are likely
to run into this situation just so you can use a special variant of String for
this purpose.
Its almost always the case that reading into invalid pages is surprising, so it
is better to just truncate the read (which this will do). Optionally a log
message might help if you are really concerned.
Original comment by scude...@gmail.com
on 21 Feb 2012 at 3:25
I agree with scudette, zread is definitely the better option, and could be
applied to the standard String class. Before that can happen, however, we need
to ensure that BaseAddressSpace defines zread, and that it's then populated for
all existing address spaces.
Also, we should ensure there's a "\x00" at the end of the read-in string, so
that we guarantee the split will work...
Original comment by mike.auty@gmail.com
on 21 Feb 2012 at 9:49
You do not need to add a null to the end. The split will work even without it.
Original comment by scude...@gmail.com
on 21 Feb 2012 at 10:01
Fair enough
Original comment by mike.auty@gmail.com
on 21 Feb 2012 at 10:05
Hey guys,
Here's patch draft #2. Things that have changed:
* No more separate classes for CString and UnicodeCString. The changes have
been applied to String so that anything already using String types will
automatically benefit.
* Changed the name of UnicodeCString to just UnicodeString. Now the
_UNICODE_STRING type instantiates UnicodeString instead of String.
* The old check in String.__str__ to make sure a string was null terminated has
been removed, since the String.v() now ensures null termination.
* As you will see in the code, the one remaining issue is when UnicodeString is
used for LPWSTRs and not _UNICODE_STRINGs. The difference is that
_UNICODE_STRING.Length specifies the exact number of bytes (not including null
termination) pointed to by the Buffer. Thus before we do the split("\x00\x00",
1)[0] the string is already the right length. In LPWSTRs where we just have a
maximum string length, and the least significant byte is zero (like it will be
in English strings), then that least significant byte is chopped off when we do
split("\x00\x00", 1)[0]. That causes the .decode().encode() statement in
__str__ to not properly recognize the last character, and ultimately return a
string with the last char missing.
So I had to do this, which is obviously quite ugly:
## Hack to deal with English LPWSTR
if not result.endswith("\x00"):
result += "\x00"
Anyone have an idea how to handle this better? Any other problems with the
patch?
Original comment by michael.hale@gmail.com
on 23 Feb 2012 at 3:35
Attachments:
So this patch may work for the common cases, but it will still require the
AddressSpace API adding the zread function. At the moment many address spaces
implement it, but they are not forced to, and it's entirely possible to get an
address space that doesn't (which will then exception out if these are ever
called against it). We need to add a suitable default function to
BaseAddressSpace, and then check that all of our address space implementations
supply a concrete version of it.
As to the tail ending thing, if LPWSTRs are always in byte pairs, then don't do
a find on the string, but split it up in a list of byte-pairs, and check if any
of them match "\x00\x00". That will then ensure we terminate on what we're
supposed to, rather than mistaking "\xFF\x00\x00\xFF\x00\x00" as terminating
early.
The UnicodeString class seems a little overkill for what amounts to checking
for "\x00\x00" rather than "\x00". Would it be possible to refactor the String
class to take a terminator, and then simply override that?
I'll have a think about it more when I'm not as sleepy, please poke me on the
weekend if I haven't said anything more by then...
Original comment by mike.auty@gmail.com
on 23 Feb 2012 at 3:53
Hi Michael,
A couple of comments:
data = self.v()
if not data:
return ""
return data
Can be simplified to:
data = self.v() or ""
return data
Second issue is that this handling of unicode is incorrect on a number of
levels. First this is using the backslashreplace error handler (in multiple
places) which is lossy - in other words you _will not_ be able to get back to
unicode from the output of this function. See for example
http://bytes.com/topic/python/answers/702688-reverse-encode-backslashreplace
IMHO you need to encode using the utf8 codec rather than ascii.
The second issue is that you are splitting on a double null in the UTF16
encoded data. The string is a unicode string, so you need to decode it first,
then split on a single null:
self.obj_vm.zread(self.obj_offset, self.length).decode("utf16-le",
"ignore").split("\x00", 1)[0]
This will solve your hack regarding the english text. You will also return a
unicode string from .v() which is what should happen - we should always be
returning the correct types from .v() (e.g. we return an int from Interger.v()
etc).
Original comment by scude...@gmail.com
on 23 Feb 2012 at 4:01
Ok, so here's a patch that adds zread as a required function to
BaseAddressSpace, and includes basic implementations for any of the address
spaces that didn't include one.
I couldn't come up with a suitable base implementation for zread that would
generally apply, so it's currently just a virtual method that needs populating
in BaseAddressSpace...
Original comment by mike.auty@gmail.com
on 24 Feb 2012 at 8:36
Attachments:
Seriously? Monkeypatching? Why dont we just delegate zread to read for the base
class. Note that we have been using zread forever in the prodexedump etc
commands. Also zread really only makes sense in the virtual address spaces
because in the physical ones it doesnt really make sense to read outside the
mapped range (it would mean outside the file).
I think its enough to simply delegate zread to read in the base class, for
safety but in practice it will never be used.
Original comment by scude...@gmail.com
on 24 Feb 2012 at 8:41
Hey guys,
Thanks for pursuing this issue with me ;-)
OK so a few things...
Thanks for the data = self.v() or "" tip, I will incorporate that into the new
patch. This line is also very useful:
self.obj_vm.zread(self.obj_offset, self.length).decode("utf16-le",
"ignore").split("\x00", 1)[0]
I had to use "utf16" instead because it said "utf16-le" isn't a recognized
decoder on my machine. The reason I used ascii/backslashreplace is because
that's what windows._UNICODE_STRING was doing (see
http://code.google.com/p/volatility/source/browse/trunk/volatility/plugins/overl
ays/windows/windows.py#162). We've known for a while that we need to fix some
things for better unicode support, but I don't want to try and solve all
problems at once here (although that'd be awesome ;-))
So the latest patch has some improvements - may not be perfect yet but should
get us a little closer. The main difference is that UnicodeString.v() returns a
Unicode string and UnicodeString.__str__() returns an ascii backslash string.
This directly affects _UNICODE_STRING.v() and _UNICODE_STRING.__str__() in the
same way. Note: the filescan/mutantscan plugins that just print FileName.v()
will now output strings like u'filename' to the terminal. If we want the
ascii/backslash, it should be changed to str(FileName) or str(FileName.v())
At least in this state, we've fixed the crossing page boundaries issue and can
then focus on what needs to be done for better unicode encode/decode support.
Ikelos, thanks for picking up the zread part of this issue. Without applying
your volatility-zread.patch I would get this:
$ python vol.py -f ~/idt_sneaky.vmem printkey
Volatile Systems Volatility Framework 2.1_alpha
Legend: (S) = Stable (V) = Volatile
----------------------------
Registry: \Device\HarddiskVolume1\Documents and Settings\LocalService\Local
Settings\Application Data\Microsoft\Windows\UsrClass.dat
Traceback (most recent call last):
File "vol.py", line 135, in <module>
main()
File "vol.py", line 126, in main
command.execute()
File "/Users/Michael/volatility_trunk/volatility/commands.py", line 101, in execute
func(outfd, data)
File "/Users/Michael/volatility_trunk/volatility/plugins/registry/printkey.py", line 97, in render_text
outfd.write("Key name: {0} {1:3s}\n".format(key.Name, self.voltext(key)))
File "/Users/Michael/volatility_trunk/volatility/plugins/overlays/basic.py", line 63, in __format__
return format(self.__str__(), formatspec)
File "/Users/Michael/volatility_trunk/volatility/plugins/overlays/basic.py", line 59, in __str__
data = self.v() or ""
File "/Users/Michael/volatility_trunk/volatility/plugins/overlays/basic.py", line 51, in v
result = self.obj_vm.zread(self.obj_offset, self.length).split("\x00", 1)[0]
AttributeError: 'HiveAddressSpace' object has no attribute 'zread'
If all goes well, perhaps we could get the zread patch applied first (whether
the existing patch or a new one that delegates zread to read) and the string
patch second...which I think would be good to close this issue. Then we'll have
to pursue the long awaited unicode encode/decode support problem.
Original comment by michael.hale@gmail.com
on 24 Feb 2012 at 3:03
Attachments:
Hiya, yep, no probs.
I can commit the zread patch pretty soon. I'm happy to expand out the "zread =
read" lines into the longer winded "def zread(self, addr, length): return
self.read(addr, length)", but they're effectively identical. I've no idea why
scudette thinks that's monkey patching, it's the defining the zread function to
be identical to read, not changing the definition of existing code at run-time.
However, I'll go with whichever one stops people complaining at me...
I'm not going to delegate zread to read on the base class because it'll make it
far too easy to write a new address space that doesn't fulfill the semantics
it's supposed to. If we wanted a free zread in the base class, I'd write a
read-byte-by-byte that's guaranteed to return the correct value, but it'd be so
inefficient that everyone would need to override it anyway. Better to require
AS writers to think about the problem and then delegate to read if it turns out
that's ok.
Finally, if the UnicodeString is pretty much identical to String save for the
decoding, can we not have that as a parameter, and then have the UnicodeString
define a different decoding? Is it possible to decode an ascii string and have
it returned as ascii rather than unicode?
Original comment by mike.auty@gmail.com
on 24 Feb 2012 at 9:39
Nice work on getting the zread patch in Ikelos. So yeah I think its possible to
have a String(encoding="utf16') parameter or something. I'd need probably a
while longer to work up the patch and test it out - unless someone else wants
to ;-)
Original comment by michael.hale@gmail.com
on 25 Feb 2012 at 4:33
So, were you thinking about something like this? 5:)
Original comment by mike.auty@gmail.com
on 25 Feb 2012 at 5:06
Attachments:
What would we use for a C like String (i.e. a char array)? Would we be
then forced to define some kind of special CString class? I thought
the original intention was to define a String as a char array and a
UnicodeString for actual text representation. Also another idea is to
take the default encoding from the profile, then the windows Strings
will be utf16 while the linux ones will be utf8.
Michael.
Original comment by scude...@gmail.com
on 25 Feb 2012 at 5:13
This patch will corrupt C strings with high bit chars (i.e. not ascii due to
the encode. We would need to introduce a CString to handle those cases where we
really want a char array. Also this will fail if self.v() returns non ascii
data (i.e. for other encoding like utf8):
+ return str(self.v())
Original comment by scude...@gmail.com
on 25 Feb 2012 at 5:15
Surely if we really want a char array, we'd instantiate it as an Array of char
(or just read bytes from the AS)?
_UNICODE_STRING is currently defined in the windows profile, so you'd need to
define something different for linux anyway.
Yeah, for some reason I thought the str would just automagically convert it,
but I guess not, that will need some work, but I'm off out soon. We should
probably make a decision on how we expect output of a unicode string to appear
after str()...
Original comment by mike.auty@gmail.com
on 25 Feb 2012 at 5:34
Yeah but the semantic meaning of C str is an array of char and we
expect to get from .v() "foo" rather than ["f", "o", "o"].
_UNICODE_STRING is a windows structure which contains a unicode string
and a length. The UnicodeString class is actually shared between all
operating systems, and represents a text string encoded in someway in
memory. IMHO String() and UnicodeString() should be different, maybe
name the first ByteString() or just Bytes().
Something like this should be used everywhere you need to output a
unicode object:
def SmartStr(string):
"""Enforce a string into a byte string."""
if isinstance(string, unicode):
return string.encode("utf8")
else:
return str(string)
then
def __str__(self):
return SmartStr(self.v())
So in the unicode case we just get utf8 encoded output. Maybe we can
make the output encoding a flag or something to ensure this works well
on windows too. Its also possible to sniff the output encoding by
checking sys.stdout.encoding - if the program is connected to a
terminal of some sort it should be set correctly, otherwise we need to
default to something sensible.
Michael.
Original comment by scude...@gmail.com
on 25 Feb 2012 at 5:50
Original comment by mike.auty@gmail.com
on 10 Mar 2012 at 7:30
Original comment by mike.auty@gmail.com
on 10 Mar 2012 at 11:37
Ok, well between r1500 for the scudette branch and r1563 for trunk, I'm going
to mark this issue as fixed.
Original comment by mike.auty@gmail.com
on 21 Mar 2012 at 11:42
Original issue reported on code.google.com by
michael.hale@gmail.com
on 7 Jan 2011 at 2:07