Inode notation insufficient to uniquely identify data streams

Velocidex / go-ntfs

An NTFS file parser in Go

Apache License 2.0


The attached disk image charlie.zip demonstrates this problem via a handcrafted MFT entry that simulates the same fragmented behaviour by adding an ATTRIBUTE_LIST and storing some $DATA attributes in other MFT entries. The file /Nine.txt has the following streams:

| Filename | Info |
| --- | --- |
| Nine.txt | Default $DATA stream, Non-resident |
| Nine.txt:111 | Non-resident Alternate $DATA stream |
| Nine.txt:222 | Resident Alternate $DATA stream |
| Nine.txt:333 | Non-resident Alternate $DATA stream |

Bug demo

Below is a listing of the contents and file sizes of the streams in Nine.txt.

| Filename | Size | Content |
| --- | --- | --- |
| Nine.txt | 5000 | 9999999999999999...\<snipped> |
| Nine.txt:111 | 5005 | "111111111111111111...\<snipped> |
| Nine.txt:222 | 56 | "222222222222222...\<snipped> |
| Nine.txt:333 | 6005 | "33333333333333...\<snipped> |

To demonstrate the issue, we run the compiled exe against this image on the entry Nine.txt.

C:\go-ntfs>ntfs.exe cat "C:\temp\vr\tests\images\charlie_edited.dd" Nine.txt

"111111111111111111...\<snipped>

The output is the content of the Nine.txt:111 stream, not of the default stream.

Thanks for this detailed report! I identified the part in TSK that assigns an ID to the attribute:

https://github.com/sleuthkit/sleuthkit/blob/820b18589f1d86de6f33affd935cabe88b94580f/tsk/fs/ntfs.c#L1899

It looks like it just makes up an ID and stores it in a map to ensure the ID is unique. We could do the same thing to fix this issue.
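A minimal sketch of that idea in Go, to make the trade-off concrete. This is not TSK's code and not part of go-ntfs: the `attrLocation` and `idAllocator` types are hypothetical, and the MFT entry numbers in `main` are made up. It keeps the on-disk ID when it is free and otherwise picks the next unused value, remembering the choice in a map.

```go
package main

import "fmt"

// attrLocation is a hypothetical key naming where an attribute physically lives.
type attrLocation struct {
	MFTEntry uint64 // MFT entry holding the attribute record (may differ from the base entry)
	AttrType uint32 // attribute type, e.g. 128 for $DATA
	DiskID   uint32 // attribute ID as stored in that entry
}

// idAllocator assigns IDs that are unique per attribute type within one file.
type idAllocator struct {
	assigned map[attrLocation]uint32 // location -> ID handed out
	used     map[[2]uint32]bool      // (type, ID) pairs already taken
}

func newIDAllocator() *idAllocator {
	return &idAllocator{
		assigned: make(map[attrLocation]uint32),
		used:     make(map[[2]uint32]bool),
	}
}

// Assign keeps the on-disk ID when it is still free for that type and
// otherwise moves to the next unused value. Repeated calls for the same
// location return the same ID.
func (a *idAllocator) Assign(loc attrLocation) uint32 {
	if id, ok := a.assigned[loc]; ok {
		return id
	}
	id := loc.DiskID
	for a.used[[2]uint32{loc.AttrType, id}] {
		id++
	}
	a.used[[2]uint32{loc.AttrType, id}] = true
	a.assigned[loc] = id
	return id
}

func main() {
	alloc := newIDAllocator()
	// Entry numbers 38 and 39 are made up for illustration.
	fmt.Println(alloc.Assign(attrLocation{38, 128, 0})) // 0
	fmt.Println(alloc.Assign(attrLocation{39, 128, 0})) // 1: ID 0 is already taken
}
```

The second ID no longer matches the bytes on disk, and it depends on the order in which attributes are visited.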

There are two options:

Expand the API as you did to include the stream name:
- pro: The ID indicated is what the disk actually says. This is more forensically sound: since the TSK ID can be randomly assigned, someone who goes back to a hex editor might be surprised.
- con: More complex API. This also leaks into the VQL because now we need to include the stream name in the inode description.

Emulate the way TSK does it:
- pro: Keeps a simpler API, maybe compatible with the randomly assigned IDs that TSK uses.
- con: Since the attribute ID is randomly assigned, it is not consistent with the disk bytes, which can be surprising.

Unlike in TSK, the VQL "inode" notation is actually a free-form string, and I think we really only care about the way the VQL interacts with the library. We have no external API stability requirement, so it may not be terrible to extend the API as needed, as long as we pass the "inode" string back with sufficient information to uniquely identify the stream.

In this PR we extend the API to include the stream name, so we could return an inode of the form 38-128-0-111 or 38-128-0-333. The problem with this approach is that we now have filename encoding issues in the inode string, e.g. 38-128-0-this is a long name. Maybe it is not a big deal?
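One possible way to sidestep the encoding question, sketched in Go. The `inode` type and `parseInode` function are hypothetical and do not reflect the PR's actual API. The assumption is that the stream name is percent-escaped with the standard `net/url` package and that the string is split on the first three hyphens only, so names containing spaces or hyphens still round-trip.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// inode is a hypothetical parsed form of "mft-type-id[-stream]".
type inode struct {
	MFT      uint64
	AttrType uint32
	AttrID   uint32
	Stream   string // empty for the default stream
}

// String renders the inode, percent-escaping the stream name.
func (i inode) String() string {
	base := fmt.Sprintf("%d-%d-%d", i.MFT, i.AttrType, i.AttrID)
	if i.Stream == "" {
		return base
	}
	return base + "-" + url.PathEscape(i.Stream)
}

// parseInode splits on the first three hyphens only, so everything after
// the third hyphen is treated as the (escaped) stream name.
func parseInode(s string) (inode, error) {
	parts := strings.SplitN(s, "-", 4)
	if len(parts) < 3 {
		return inode{}, fmt.Errorf("inode %q: want at least mft-type-id", s)
	}
	var nums [3]uint64
	for i := range nums {
		bits := 32
		if i == 0 {
			bits = 64 // the MFT entry number is the only 64-bit field here
		}
		n, err := strconv.ParseUint(parts[i], 10, bits)
		if err != nil {
			return inode{}, fmt.Errorf("inode %q: %w", s, err)
		}
		nums[i] = n
	}
	out := inode{MFT: nums[0], AttrType: uint32(nums[1]), AttrID: uint32(nums[2])}
	if len(parts) == 4 {
		name, err := url.PathUnescape(parts[3])
		if err != nil {
			return inode{}, fmt.Errorf("inode %q: %w", s, err)
		}
		out.Stream = name
	}
	return out, nil
}

func main() {
	in := inode{MFT: 38, AttrType: 128, AttrID: 0, Stream: "this is a long name"}
	fmt.Println(in) // 38-128-0-this%20is%20a%20long%20name
	back, err := parseInode(in.String())
	fmt.Println(back.Stream, err)
}
```

Plain names such as 111 are left unchanged by the escaping, so 38-128-0-111 still reads as before.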

Alternatively, we can do what TSK does and come up with a constant identifier (I would tend to use the stream offset or count rather than a randomly assigned number), so something like 38-128-0-345, making it clear that it is a different id 0 stream from 38-128-0-232, for example.
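As a rough illustration of the "count" variant, here is a Go sketch that appends each attribute's position in enumeration order as a fourth component. The `attr` type and `inodesByOrdinal` function are hypothetical, and the assumption that the parser always visits attributes in the same order is mine, not something the library guarantees.

```go
package main

import "fmt"

// attr is a hypothetical, minimal view of one attribute of a file.
type attr struct {
	Type uint32 // e.g. 128 for $DATA
	ID   uint32 // attribute ID as stored on disk
	Name string // stream name, kept only for display
}

// inodesByOrdinal returns one inode string per attribute, in input order.
// The fourth component is the attribute's index in the slice, so two
// attributes sharing the same type and on-disk ID still get distinct strings.
func inodesByOrdinal(mft uint64, attrs []attr) []string {
	out := make([]string, 0, len(attrs))
	for i, a := range attrs {
		out = append(out, fmt.Sprintf("%d-%d-%d-%d", mft, a.Type, a.ID, i))
	}
	return out
}

func main() {
	// Two $DATA attributes that both carry on-disk ID 0 (illustrative values).
	attrs := []attr{
		{Type: 128, ID: 0, Name: "111"},
		{Type: 128, ID: 0, Name: "333"},
	}
	for i, s := range inodesByOrdinal(38, attrs) {
		fmt.Println(attrs[i].Name, s)
	}
}
```

The on-disk ID stays visible in the third component, while the fourth is derived from position alone and needs no name encoding.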


Inode notation insufficient to uniquely identify data streams #78
