JPCERTCC / Windows-Symbol-Tables

Windows symbol tables for Volatility 3

Update of old (pre volatility 2.4.0) Symbol Tables #3

Closed mischw closed 3 months ago

mischw commented 3 months ago

I just came across an issue within volatility 3 that produced bad symbol tables with versions before 2.4.0 (they had the wrong age value, if I understand correctly). It is not really critical, since volatility just discards older symbol tables and tries to re-download and generate new ones. But in offline environments you can't really be sure that previously generated symbol tables will work. I think one way to prevent this is to regenerate the older symbol files or manually fix the contained JSON. Are there any plans to do so?

ikelos commented 3 months ago

We don't have any plans, and we're quite short on time. If you're happy to correct the age values in the existing files, that would be really useful. Just let us know and we'll find a way to get the files, review them and then update the pack... Thanks! 5:)

mischw commented 3 months ago

According to the filenames, all the symbol tables in the repository have an age value of "1". Is it safe to assume this is correct? If so, I would write a short script which updates all symbol tables to reflect that value in the JSON data too. As an alternative, we could just regenerate all the symbol tables regardless of their status.

As of now only about 60% reflect an "age" of 1:

grep '"age": .*' windows/*/* | awk '{print $3}' | sort | uniq -c | sort -n -r
    182 1,
     58 2,
     46 5,
     11 3,
       1 4,

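The "short script" proposed above could look roughly like the following. This is only a sketch under assumptions that the thread does not confirm: that every file is named `<GUID>-<age>.json.xz`, and that the age is stored at `metadata.windows.pdb.age` in the JSON. The function names are hypothetical.

```python
import json
import lzma
import re
from pathlib import Path

# Assumed filename layout: <32 hex digit GUID>-<age>.json.xz
NAME_RE = re.compile(r"^(?P<guid>[0-9A-Fa-f]{32})-(?P<age>\d+)\.json\.xz$")


def age_from_filename(path: Path) -> int:
    """Extract the age encoded in the symbol table's filename."""
    match = NAME_RE.match(path.name)
    if match is None:
        raise ValueError(f"unexpected symbol table name: {path.name}")
    return int(match.group("age"))


def fix_age(path: Path) -> bool:
    """Make the JSON age agree with the filename; return True if the file changed."""
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        table = json.load(handle)
    pdb = table["metadata"]["windows"]["pdb"]  # assumed key path
    wanted = age_from_filename(path)
    if pdb.get("age") == wanted:
        return False
    pdb["age"] = wanted
    with lzma.open(path, "wt", encoding="utf-8") as handle:
        json.dump(table, handle)
    return True
```

Running `fix_age` over every file under `windows/` would only touch the tables whose stored age disagrees with the filename. It trusts the filename, which is exactly the assumption being asked about here.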
ikelos commented 3 months ago

Ideally it would be best to regenerate them (and be able to regenerate them again if the need arose), but after regenerating them we'd want to flag any whose number didn't match the original for investigation. In a worst case we can take just the age from the new one, but it would also show us if there's a trend by MS to change/remove symbols from older files (which I suspect but have never been able to prove)...

mischw commented 3 months ago

Ok, I'm not sure if I understood correctly, but as a first step I wrote a script which checks all the symbol files:

Found 298 symbol tables in Windows-Symbol-Tables/symbols/windows
[...many lines...]
616A94E33A4827B451B0E19C14C03792-1 should be regenerated: 1.2.0<2.4.0=True, 1≠2=True
2E6DEB6CFD444100AC0E803337A56E8E-1 should be regenerated: 1.2.0<2.4.0=True, 1≠2=True
76133B7D5E53E8EF3783A68665142583-1 should be regenerated: 2.0.0<2.4.0=True, 1≠2=True
Of 298 total symbol tables, 215 may need to be regenerated because they were generated with a volatility version older than 2.4.0
Of 298 total symbol tables, 116 may need to be regenerated because the 'age' in the filename does not match the 'age' in the json metadata
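The two checks reported in this output can be sketched as follows. The script used in the thread was not posted, so this is an illustration only; it assumes the producer version lives at `metadata.producer.version` and the age at `metadata.windows.pdb.age`, and the function names are hypothetical.

```python
import re
from typing import Tuple

MIN_VERSION = (2, 4, 0)  # tables generated before this version had the wrong age
# Assumed filename layout: <32 hex digit GUID>-<age>.json.xz
NAME_RE = re.compile(r"^(?P<guid>[0-9A-Fa-f]{32})-(?P<age>\d+)\.json\.xz$")


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_regeneration(filename: str, metadata: dict) -> Tuple[bool, bool]:
    """Return (generated_by_old_version, age_mismatch) for one symbol table."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected symbol table name: {filename}")
    name_age = int(match.group("age"))
    too_old = parse_version(metadata["producer"]["version"]) < MIN_VERSION
    age_mismatch = metadata["windows"]["pdb"]["age"] != name_age
    return too_old, age_mismatch
```

Comparing versions as integer tuples rather than strings avoids a lexical pitfall: as strings, "2.10.0" would sort before "2.4.0".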

If we regenerate the symbol files based on the volatility version, that would amount to ~72% of symbol files.
If we regenerate the symbol files based on the mismatch of "age", that would amount to ~39% of symbol files.
Third option: regenerate all of them and hope that this does not take away any information, maybe with a quick diff to see if things changed heavily.

What do you think?

ikelos commented 3 months ago

Well, what I'm saying is that "regenerating them" may result in different files (not just different age values). So if you've got something that can figure out the stats, then comparing the output of isfinfo (which lists the number of types and symbols) for the originals and the generated files would be good. If it doesn't generate as much information, then we just want to update the age correctly, otherwise we can use the completely regenerated file. Hopefully that makes more sense?
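The comparison suggested here can also be approximated without isfinfo by counting entries in the loaded JSON directly. The sketch below assumes the ISF files keep their data under the top-level keys `base_types`, `user_types`, `symbols` and `enums`; that layout and the function names are assumptions, not something stated in this thread.

```python
from typing import Dict

# Assumed top-level sections of a symbol table JSON
SECTIONS = ("base_types", "user_types", "symbols", "enums")


def section_counts(table: dict) -> Dict[str, int]:
    """Count the entries in each section; a missing section counts as 0."""
    return {name: len(table.get(name, {})) for name in SECTIONS}


def lost_information(original: dict, regenerated: dict) -> Dict[str, int]:
    """Map each section that shrank to the number of entries it lost."""
    old = section_counts(original)
    new = section_counts(regenerated)
    return {
        name: old[name] - new[name]
        for name in SECTIONS
        if new[name] < old[name]
    }
```

An empty result means the regenerated file can replace the original outright; a non-empty one suggests keeping the original and correcting only its age.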

mischw commented 3 months ago

Thanks, that makes sense :) I have now regenerated all of the symbol files. Comparing them with diff, there does not seem to be any difference besides the obvious changes in the metadata block (volatility version, generation datetime and, as expected, the age value). Just to make sure, I also ran isfinfo on both sets of symbol tables and the output is the same (I had to sort the output and cut off the wrong age value for the old symbol tables). I also confirmed with a memory image that volatility now correctly uses the already present symbol table instead of discarding it and trying to re-download.

I attached the output of my script, where you can see the differences in the newly generated files: stfix.log

If you want to check the isfinfo output as well: isfinfo_old.txt isfinfo_new.txt

Since we do not lose any information by regenerating them, I would say it is safe to just replace them all.
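A check equivalent to the diff described in this comment, ignoring the metadata block, could be sketched like this. It assumes the only expected changes sit under a top-level `metadata` key; the helper names are hypothetical.

```python
import json
import lzma
from pathlib import Path


def load_table(path: Path) -> dict:
    """Load an xz-compressed JSON symbol table."""
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def without_metadata(table: dict) -> dict:
    """Copy the table, leaving out the (assumed) top-level metadata block."""
    return {key: value for key, value in table.items() if key != "metadata"}


def same_apart_from_metadata(old: dict, new: dict) -> bool:
    """True when the two tables differ only inside their metadata blocks."""
    return without_metadata(old) == without_metadata(new)
```

Comparing parsed dictionaries rather than raw text means key order and whitespace differences between generator versions do not show up as false mismatches.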

ikelos commented 3 months ago

Ok, that's quite strange. For example, in the windows.zip pack we have a working PDB for ntkrnlpa.pdb/BD8F451F3E754ED8A34B50560CEB08E31, but when I asked volatility to regenerate it from Microsoft, the one it produces does not have base_types, types or enums, only symbols:

File                                             Valid           base_types  types  symbols  enums
good-BD8F451F3E754ED8A34B50560CEB08E3-1.json.xz  True (cached)   15          561    9418     49
vbad-BD8F451F3E754ED8A34B50560CEB08E3-1.json.xz  True (cached)   0           0      10412    0

I can't find that hash in your generated data, but it does live in https://downloads.volatilityfoundation.org/volatility3/symbols/windows.zip. Where did you get your original files from?

mischw commented 3 months ago

I hope I didn't confuse you, since I mentioned this issue in the volatility repo too, but I am specifically using the symbols from this repo (JPCERTCC). I would like to check the symbol tables from the volatility homepage too, but I started with the ones from this repo to get my feet wet before moving on to the bigger package from volatility. I can try to check the ones from volatility in the coming days.

ikelos commented 3 months ago

Oh! Ok, sure! Then yeah, if there's no difference between them, you might as well go for the full regeneration. For the windows.zip from volatility there will likely be discrepancies... 5:)

mischw commented 3 months ago

Great. I will open a new issue on the volatility repo once I have made some progress with the larger package of symbol tables.

For the symbol tables from this repository, I'll ping @shu-tom to ask whether he is interested in a pull request. I could also upload the script so someone else can regenerate them, if that is the preferred way.

shu-tom commented 3 months ago

Sorry for the late reply. And thanks for the great discussion. If you would like to give us a pull request for this repository, we would welcome it!

mischw commented 3 months ago

Done in #4. Thanks everyone!