DinoTools / dionaea

Home of the dionaea honeypot
https://dionaea.readthedocs.io/
GNU General Public License v2.0

DoublePulsar emulation is broken; creates new, corrupted variants of WannaCry #240

Closed (bontchev closed this issue 5 years ago)

bontchev commented 6 years ago
ISSUE TYPE
DIONAEA VERSION
Dionaea Version 0.8.0-28-g83eb99a
Compiled on Linux/x86_64 at Oct  5 2018 15:18:21 with gcc 5.4.0 20160609
Started on yoda running Linux/x86_64 release 4.15.0-36-generic

(We have made some modifications to make it save the data into a MySQL database, instead of into an SQLite database, but haven't touched anything in the rest of the code.)

CONFIGURATION

We have enabled only the SMB service.

OS / ENVIRONMENT
SUMMARY

The captured samples of the WannaCry worm are totally corrupted.

STEPS TO REPRODUCE

We have set up an isolated network with only two virtual machines. One of them is a Windows machine on which the original WannaCry worm is started (and which gets encrypted by the ransomware component as a result). The other is an Ubuntu Linux machine running Dionaea. The network is configured so that any IP address generated by WannaCry resolves to the address of the machine running Dionaea. The rest of the Internet is not reachable, for safety reasons.

EXPECTED RESULTS

The copies of WannaCry sent to the honeypot must be received correctly, with the proper file size and without corruption.

ACTUAL RESULTS

The original WannaCry worm is 3,723,264 bytes large. The sample captured by Dionaea is 5,267,459 bytes and is badly corrupted - it will not execute, resulting in a message from Windows that this is not a valid Win32 application.

We also have a real (not a VM) Dionaea connected to the real Internet and it gets literally hundreds of such corrupted 5 MB samples. They have all been sent to VirusTotal by other people running the Dionaea honeypot, creating new variants (albeit non-working ones) and forcing the anti-virus companies to implement detection of a problem that is essentially created by Dionaea.

Folks, this has to stop. A fix is needed urgently. Clearly, something is broken in the DoublePulsar emulation. We do not know yet if this corruption always happens when a file is sent the way WannaCry is sending itself, or if it happens only with large files, or if it happens only with WannaCry. Unfortunately, the other SMB worm, NotPetya, does not work in virtual environments, so we cannot conduct tests with it. We'll try to send some other file using the EternalBlue/DoublePulsar implementation of Metasploit.

numbers90807060 commented 5 years ago

I've also experienced this issue. I ran Dionaea with SMB on Ubuntu 16.04 for approximately two hours yesterday and captured several malware samples. Almost all samples were the same file size, all corrupt, and all had been submitted to VirusTotal with comments indicating that other Dionaea honeypots had captured and uploaded the same sample.

I am happy to offer my time to help test anything or provide further information if it will assist in the identification and remediation of this problem. Dionaea is a fantastic research tool but this issue is currently rendering it useless for our needs.

bontchev commented 5 years ago

Currently, all captured WannaCry samples are corrupted by Dionaea. It's not a connection issue or damaged variants being out there - it is 100% Dionaea that corrupts them every time, even on a LAN.

We also experimented with sending an executable via the EternalBlue exploit implementation of Metasploit. Corruption occurs there too, but a very small one - 3 bytes of zeroes are added to the end of the executable. Compared to that, the WannaCry corruption makes a total mess of the executable.

My colleague claims to have figured out a way to fix the problem - "not very elegant but just 5 lines of code", she said. We're in the process of testing it "live" on the honeypot (as opposed to in a test environment). If it indeed works properly and captures real samples, we'll submit a pull request.

bontchev commented 5 years ago

Hmm... Having reviewed my colleague's solution, I'm not happy with it. Oh, it works. WannaCry samples are captured successfully. But her code explicitly detects WannaCry transfers by looking for the hard-coded "192.168.56.20" and "172.16.99.5" strings in them and processes them differently from normal EternalBlue SMB file transfers. This is unacceptable, since these strings can easily be changed and then the samples will be broken again. Furthermore, she truncates the zero bytes at the end of the received samples, which is also unacceptable - what if somebody is sending a file that needs to have zeroes at the end? We'll keep digging...
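To illustrate the concern about truncating trailing zeroes, here is a minimal sketch. The function name and the sample bytes are hypothetical and are not taken from the colleague's patch or from Dionaea; the point is only that blind stripping cannot tell padding apart from zero bytes that belong to the file.

```python
def strip_trailing_zeros(data: bytes) -> bytes:
    """Naive clean-up: drop every zero byte at the end of the data."""
    return data.rstrip(b"\x00")


# Hypothetical file that legitimately ends with two zero bytes.
original = b"MZ\x90\x00data\x00\x00"
# The same file as captured, with three padding zeroes appended in transit.
received = original + b"\x00" * 3

cleaned = strip_trailing_zeros(received)
print(len(original), len(received), len(cleaned))
print(cleaned == original)
```

The cleaned buffer is shorter than the original, so the padding is gone but so are the two zero bytes that were part of the file.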

In any case, if you are reading this, please do not send captured samples - not to VirusTotal, nor to anyone else. Dionaea is corrupting them.

fe7ch commented 5 years ago

@gento Could you please look into it?

bontchev commented 5 years ago

Having watched with Wireshark what kind of packets are received during this communication, I have a somewhat better understanding of why the WannaCry samples are corrupted.

When Dionaea captures the SMB packets, it tries to locate the beginning of the (XOR-encoded) EXE file that is being sent. In order to do that, it searches the buffer for 'MZ\x90'. (Strictly speaking, that's wrong - the 0x90 byte is not part of the EXE header signature, but all compiled Windows executables have it there, so no big deal, I guess.) Unfortunately, this signature is found four times in the WannaCry sample.

This is because the worm (an EXE file) carries the ransomware (another EXE file) in a resource. The ransomware itself carries a ZIP archive in a resource and this archive itself contains (among other things) two EXE files. (Although the archive is encrypted, so not sure why any part of the header would be visible.) In any case, the multiple MZ signatures confuse Dionaea and it does not construct correctly the file from the contents of the packets.
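A minimal sketch of the effect described above, assuming the capture logic does a plain byte search for the signature. The helper name and the sample stream are made up for illustration and are not Dionaea's actual code.

```python
def find_signature_offsets(buffer: bytes, signature: bytes = b"MZ\x90"):
    """Return every offset at which the signature occurs in the buffer."""
    offsets = []
    start = buffer.find(signature)
    while start != -1:
        offsets.append(start)
        start = buffer.find(signature, start + 1)
    return offsets


# Hypothetical stream: 8 bytes of shellcode, an outer EXE, an embedded EXE.
stream = b"\xcc" * 8 + b"MZ\x90\x00outer" + b"\x00" * 4 + b"MZ\x90\x00inner"
print(find_signature_offsets(stream))
```

With more than one hit, a scanner has to decide which offset marks the start of the transferred file, and the signature alone does not answer that.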

This is specific to WannaCry (or anything else that contains multiple MZ signatures in the file). The fact that unnecessary zeroes are added at the end of the file (just 3 when sending with Metasploit and nearly 2.5 MB of them when WannaCry is sending itself) is a separate problem. I don't know yet what is causing it but the SMB packets do contain them.

Unfortunately, I am not sufficiently familiar with the SMB protocol (and the way EternalBlue/DoublePulsar abuse it) to figure out what is happening and, in particular, why everything works fine with a real system but samples are corrupted when Dionaea is emulating it. Somewhere there ought to be information about where the transferred file actually begins (so that we don't need to scan for the 'MZ' header) and how long it is (so that we truncate the right amount of garbage from the end).

We'll keep investigating.

bontchev commented 5 years ago

No, I am wrong. The problem isn't the structure of WannaCry's EXE file. The problem comes from the way DoublePulsar is used - and I'm starting to think that the problem is not solvable, alas.

You see, when sending a file to a machine on which DoublePulsar is active, you send a shellcode with the file appended to it. DoublePulsar injects one of the system Windows processes with it and the shellcode takes care to do whatever it needs to do with the appended stuff.

Problem is, there is no standard protocol for this. Different attacks could use different shellcodes. That's why Dionaea is looking for the "MZ" header instead of going to the beginning of the EXE file directly - because it can't know in advance how long the shellcode is. I haven't checked, but probably the Metasploit module uses a different shellcode, and I wouldn't be surprised if the NotPetya worm uses a third one.

But, in the case of WannaCry, what follows after the shellcode is not just one EXE file. It is at least two, concatenated together. The first looks like some kind of DLL, the second one is the actual worm. This is too specific. I don't see a generic way of dealing with this, and if we use a WannaCry-specific solution, there is no guarantee that a new attack won't use yet another method/structure of the stuff appended after the shellcode.

We'll keep thinking, folks, but I'm starting to lose hope.

gento commented 5 years ago

Thanks @fe7ch for the heads-up.

Thanks @bontchev for sharing your observations. Love your detailed feedback!

Just some quick context about how the DoublePulsar emulation works: we try to emulate a Windows machine that has been infected and has an active DoublePulsar implant. Dionaea is designed to understand a few DoublePulsar commands, e.g. 0x23=ping, 0xc8=exec and 0x77=kill.
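The command set listed above can be sketched as a simple lookup table. This is only an illustration: the names below are hypothetical, and how the opcode is extracted from the SMB traffic is not described in this thread, so it is left out.

```python
# Opcode values as listed in the comment above; the names are informal labels.
DOUBLEPULSAR_COMMANDS = {
    0x23: "ping",
    0xC8: "exec",
    0x77: "kill",
}


def describe_command(opcode: int) -> str:
    """Return the informal name of a DoublePulsar opcode, or 'unknown'."""
    return DOUBLEPULSAR_COMMANDS.get(opcode & 0xFF, "unknown")


for opcode in (0x23, 0xC8, 0x77, 0x01):
    print(hex(opcode), describe_command(opcode))
```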

When the DoublePulsar 'exec' command (0xc8 = exec) is received, the incoming traffic usually starts with some sort of shellcode, followed by the designated "payload". The purpose of the shellcode is simply to load the designated "payload" - the DLL file that the attackers want to inject.

The designated "payload" can be a DLL file carrying different malware such as WannaCry, a Metasploit reverse shell, a Trojan/RAT, etc., depending on how the attacker likes to use it.

You are correct, @bontchev. Dionaea starts reading the traffic (the shellcode with the "payload" appended to it). As we were not sure about the length of the shellcode last year during the WannaCry outbreak, we decided to search directly for the "payload" by identifying the first "MZ header" and to write the data stream that follows to disk, for collection purposes. This is how the commonly seen 5MB file in Dionaea is created.
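The approach described above can be sketched roughly as follows. This is a simplified illustration under the assumption that the stream is already reassembled and decoded; the function name is hypothetical and this is not the code in Dionaea's SMB module.

```python
def carve_from_first_mz(stream: bytes) -> bytes:
    """Return everything from the first 'MZ' marker to the end of the stream.

    Returns an empty bytes object when no marker is present.
    """
    offset = stream.find(b"MZ")
    if offset == -1:
        return b""
    return stream[offset:]


# Hypothetical exec payload: shellcode of unknown length, then the file,
# then a few padding bytes that were part of the transfer.
stream = b"\x90" * 5 + b"MZpayload" + b"\x00" * 3
print(carve_from_first_mz(stream))
```

Because nothing marks the end of the file, whatever follows the header in the stream, padding included, ends up in the saved sample.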

As an example, a bulky 5MB file (MD5 9e8163f86d1d334e0e1dc08ea6d4b78f):

It was collected by Dionaea yesterday. It is a DLL file. Executing a DLL file directly from the Windows command prompt produces the familiar message that this is not a valid Win32 application. However, we can run the DLL file through its export named PlayGame, with a command such as "rundll32.exe ,PlayGame"; this kick-starts the infection and drops the smaller WannaCry loader / worm components, e.g. mssecsvc.exe and tasksche.exe. This is the execution result for this bulky 5MB file in a publicly available sandbox (including the DNS resolution of the kill-switch domain):

https://www.hybrid-analysis.com/sample/74b8b9b1ee9f573b1a8589f89410685fe2fcfda34758ffe194c747a741493885?environmentId=110

gento commented 5 years ago

Also, I set up a brand new Dionaea sensor and this is what it collected in the past 12 hours:

Samples with 5MB file size:
3553aeb71299e94c2549f1b34f6c1a43 5.0M
414a3594e4a822cfb97a4326e185f620 5.0M
59b5090fad3d62f05572470f0c79c9a4 5.0M
95ae8e32eb8635e7eabe14ffbfaa777b 5.0M
9e8163f86d1d334e0e1dc08ea6d4b78f 5.0M
a55b9addb2447db1882a3ae995a70151 5.0M
ae12bb54af31227017feffd9598a6f5e 5.0M
b722bacc798d9fb62975688db86871b3 5.0M
ce223b231f2862124386c585e9b95ca1 5.0M
cf4f46336abeec03630297f846d17482 5.0M

Samples with non-5MB file size:
8a4e9f688c6d0effd0fa17461352ed3e 70K
dede6d1500af444a9f4d67bf9fcc6088 89K
ca71f8a79f8ed255bf03679504813c6a 82K

It looks like all these 5MB samples were previously analyzed and run in the Hybrid-Analysis sandbox, where they triggered the DNS resolution request for the kill-switch domain. I will consider all of them to be WannaCry.

From time to time, though, the initial approach of 'search for the payload directly by identifying the first MZ header' returns other interesting files. There is always a small number of samples that are not 5MB in size, and they are likely the interesting ones that fly under the radar, e.g. Trojan/RAT/DDoSer/Meterpreter and other evils delivered via the DoublePulsar implant.

Hopefully this is helpful.

Thanks for the feedback!

fe7ch commented 5 years ago

"There is always a small number of samples that are not 5MB in size, and they are likely the interesting ones that fly under the radar, e.g. Trojan/RAT/DDoSer/Meterpreter and other evils delivered via the DoublePulsar implant."

Yep, last time I checked there were some downloaders, miners and even ransomware (spora) delivered via DoublePulsar.

Fathonizep commented 5 years ago

Hi @fe7ch, I have a problem. I tried to attack the Dionaea honeypot using Metasploit with an SMB vulnerability, but Dionaea always logs the message "[09102018 03:37:01] SMB /dionaea/smb/smb.py:790: Attempt to register ec04e01a-e50f-412c-8f8c-2daa4b572bb9 failed, UUID does not exist or is not implemented", each time with a different UUID. I am using the latest version of Dionaea. Why don't I get the binaries, and why do I always get this message?

fe7ch commented 5 years ago

@Fathonizep To be honest I have no idea. I'm just a user of dionaea, not a developer/contributor.

Fathonizep commented 5 years ago

@fe7ch Can you tell me the test scenario that you used to get the binaries? Are you using a VPS for the environment? I'm using a VM and attacking manually.

fe7ch commented 5 years ago

@Fathonizep I have dionaea installed on a server available from the Internet. I don't manually attack it, I'm only monitoring real attacks.

Fathonizep commented 5 years ago

@fe7ch Can I have your email? I have many things to ask.

fe7ch commented 5 years ago

@Fathonizep you can contact me via twitter (the same handle)

Fathonizep commented 5 years ago

@fe7ch Thanks.

bontchev commented 5 years ago

OK, I think I understand now what is happening. Here is a detailed write-up.

TL;DR: Dionaea is doing the best it can and isn't really corrupting WannaCry. What it does is still not good enough, but the problem cannot be solved adequately.

As I mentioned above, the problem with emulating DoublePulsar is that there is no standard protocol for using it; transferring a file with it relies on the execution of the shellcode that precedes the file, which DoublePulsar injects into a Windows system process to be executed. (The original FuzzBunch injects into lsass.exe; Metasploit injects into spoolsv.exe.)

This shellcode is not something constant and standard, either. WannaCry uses a 3611-byte shellcode; Metasploit uses a 4869-byte shellcode. I can't easily test with NotPetya (the other SMB worm) because it doesn't work in virtual environments, but I'm pretty sure that it uses a third one.

Since the size of the shellcode cannot be known beforehand, Dionaea simply looks for the "MZ" header of the EXE file that follows it. This kinda works but is unreliable. It is perfectly possible for this EXE file to be additionally encoded (e.g., by XOR 0xAA) and have the shellcode decode it before injecting it. Then Dionaea won't find anything.
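A small sketch of why such an extra encoding layer defeats the signature search. The key 0xAA is the example from the paragraph above; the helper name and the sample bytes are hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of the data with a single-byte key."""
    return bytes(b ^ key for b in data)


# Hypothetical start of an EXE file, padded with zero bytes.
payload = b"MZ\x90\x00" + b"\x00" * 12
encoded = xor_bytes(payload, 0xAA)

print(b"MZ" in payload)                     # marker visible in the plain file
print(b"MZ" in encoded)                     # marker hidden after encoding
print(xor_bytes(encoded, 0xAA) == payload)  # shellcode can undo the encoding
```

A scanner that only looks for "MZ" in the raw stream sees nothing to carve, while the shellcode on a real victim restores the file by applying the same XOR again.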

But why does it seemingly corrupt WannaCry? We all know that the worm is a 3,723,264-byte EXE file. Instead, Dionaea captures a 5,267,459-byte file, which does have an EXE header, but cannot be executed (it produces an error message). Initial inspection showed that there is some additional EXE code before it and about 2 MB of trailing zeroes at the end. Yet, the worm clearly does replicate on a real system, so why does Dionaea's capturing produce this seemingly corrupted file?

As it turns out, the problem here isn't really Dionaea's method of capturing but the way that WannaCry replicates and the incompleteness of most technical descriptions of it.

You see, WannaCry does not replicate as a 3,723,264-byte EXE file. Instead, what is sent via SMB, is a 5,267,459-byte DLL file. DLL files have an EXE header starting with "MZ" but they are not directly executable; they have to be loaded.
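One way to tell a DLL from an EXE without running it is to read the Characteristics field of the PE file header and test the IMAGE_FILE_DLL flag (0x2000). The sketch below is an illustration only: the helper names are hypothetical, and the test input is a synthetic header rather than a real sample.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag defined by the PE/COFF format


def is_pe_dll(data: bytes) -> bool:
    """Return True if the data starts with a PE image whose DLL flag is set."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return False
    if len(data) < pe_offset + 24:
        return False
    # Characteristics is the last 2 bytes of the 20-byte COFF file header.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    return bool(characteristics & IMAGE_FILE_DLL)


def make_minimal_header(characteristics: int) -> bytes:
    """Build a synthetic DOS + PE + COFF header for demonstration only."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\x00\x00" + coff


print(is_pe_dll(make_minimal_header(0x2102)))  # DLL flag set
print(is_pe_dll(make_minimal_header(0x0102)))  # plain executable image
```

Running a check like this on a captured sample before trying to execute it would show whether the "not a valid Win32 application" error is simply the result of launching a DLL as if it were an EXE.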

This DLL file has a resource named "W", which contains an EXE file. Well, not exactly. As you can see, there seems to be some "garbage" before the start of the EXE file header in the resource:

[Screenshot: hex view of the "W" resource, showing the bytes that precede the EXE file header]

In reality, this "garbage" is a DWORD containing the actual length of that EXE file (0x0038D000 == 3,723,264). When the exported function of this DLL is executed, it locates this resource, extracts that many bytes from offset 4 of it, and writes them to the file C:\WINDOWS\mssecsvc.exe:

.text:10001016                         ExtractResource proc near               ; CODE XREF: PlayGame+1Dp
.text:10001016
.text:10001016                         NumberOfBytesWritten= dword ptr -4
.text:10001016
.text:10001016 51                                      push    ecx             ; Save ECX, ESI, & EDI
.text:10001017 56                                      push    esi
.text:10001018 57                                      push    edi
.text:10001019 68 10 30 00 10                          push    offset resourceName ; "W"
.text:1000101E 6A 65                                   push    65h
.text:10001020 FF 35 3C 31 00 10                       push    hModule         ; hModule
.text:10001026 FF 15 18 20 00 10                       call    ds:FindResourceA
.text:1000102C 8B F8                                   mov     edi, eax
.text:1000102E 85 FF                                   test    edi, edi        ; Resource found?
.text:10001030 74 2F                                   jz      short errorExit ; Exit if not
.text:10001032 57                                      push    edi             ; hResInfo
.text:10001033 FF 35 3C 31 00 10                       push    hModule         ; hModule
.text:10001039 FF 15 14 20 00 10                       call    ds:LoadResource ; Load the resource (handle returned in EAX)
.text:1000103F 85 C0                                   test    eax, eax        ; Loading OK?
.text:10001041 74 1E                                   jz      short errorExit ; Exit if not
.text:10001043 50                                      push    eax             ; hResData
.text:10001044 FF 15 10 20 00 10                       call    ds:LockResource
.text:1000104A 8B F0                                   mov     esi, eax        ; ESI now points at buffer start
.text:1000104C 85 F6                                   test    esi, esi        ; Lock successful?
.text:1000104E 74 11                                   jz      short errorExit ; Exit if not
.text:10001050 57                                      push    edi             ; hResInfo
.text:10001051 FF 35 3C 31 00 10                       push    hModule         ; hModule
.text:10001057 FF 15 0C 20 00 10                       call    ds:SizeofResource
.text:1000105D 85 C0                                   test    eax, eax
.text:1000105F 75 04                                   jnz     short skip
.text:10001061
.text:10001061                         errorExit:                              ; CODE XREF: ExtractResource+1Aj
.text:10001061                                                                 ; ExtractResource+2Bj ...
.text:10001061 33 C0                                   xor     eax, eax
.text:10001063 EB 42                                   jmp     short returnAddress
.text:10001065                         ; ---------------------------------------------------------------------------
.text:10001065
.text:10001065                         skip:                                   ; CODE XREF: ExtractResource+49j
.text:10001065 53                                      push    ebx             ; Save EBX
.text:10001066 8B 1E                                   mov     ebx, [esi]      ; Load 1st DWORD of buffer into EBX
.text:10001068 6A 00                                   push    0               ; hTemplateFile
.text:1000106A 6A 04                                   push    4               ; dwFlagsAndAttributes
.text:1000106C 6A 02                                   push    2               ; dwCreationDisposition
.text:1000106E 6A 00                                   push    0               ; lpSecurityAttributes
.text:10001070 6A 02                                   push    2               ; dwShareMode
.text:10001072 68 00 00 00 40                          push    40000000h       ; dwDesiredAccess
.text:10001077 68 38 30 00 10                          push    offset fileName ; lpFileName
.text:1000107C FF 15 08 20 00 10                       call    ds:CreateFileA
.text:10001082 8B F8                                   mov     edi, eax
.text:10001084 83 FF FF                                cmp     edi, 0FFFFFFFFh ; File creation failed?
.text:10001087 74 1A                                   jz      short errorExit2 ; Exit if so
.text:10001089 8D 44 24 0C                             lea     eax, [esp+10h+NumberOfBytesWritten]
.text:1000108D 6A 00                                   push    0               ; lpOverlapped
.text:1000108F 50                                      push    eax             ; lpNumberOfBytesWritten
.text:10001090 83 C6 04                                add     esi, 4          ; Skip first 4 bytes of buffer
.text:10001093 53                                      push    ebx             ; nNumberOfBytesToWrite
.text:10001094 56                                      push    esi             ; lpBuffer
.text:10001095 57                                      push    edi             ; hFile
.text:10001096 FF 15 04 20 00 10                       call    ds:WriteFile
.text:1000109C 57                                      push    edi             ; hObject
.text:1000109D FF 15 00 20 00 10                       call    ds:CloseHandle
.text:100010A3
.text:100010A3                         errorExit2:                             ; CODE XREF: ExtractResource+71j
.text:100010A3 6A 01                                   push    1
.text:100010A5 58                                      pop     eax             ; Set EAX := 1
.text:100010A6 5B                                      pop     ebx             ; Restore EBX
.text:100010A7
.text:100010A7                         returnAddress:                          ; CODE XREF: ExtractResource+4Dj
.text:100010A7 5F                                      pop     edi             ; Restore EDI, ESI, and ECX
.text:100010A8 5E                                      pop     esi
.text:100010A9 59                                      pop     ecx
.text:100010AA C3                                      retn
.text:100010AA                         ExtractResource endp
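The extraction logic in the listing above (read a DWORD length, skip 4 bytes, write that many bytes) can be sketched in Python as follows. The function name and the fake resource are hypothetical. Unlike the disassembled routine, this sketch also checks that the resource really contains as many bytes as the prefix claims.

```python
import struct


def extract_length_prefixed(resource: bytes) -> bytes:
    """Return the body of a resource laid out as <DWORD length><body>."""
    if len(resource) < 4:
        raise ValueError("resource too short to hold a length prefix")
    (length,) = struct.unpack_from("<I", resource, 0)  # little-endian DWORD
    body = resource[4:4 + length]
    if len(body) != length:
        raise ValueError("resource is shorter than its length prefix claims")
    return body


# Hypothetical resource: length prefix, a fake EXE body, then trailing padding.
fake_exe = b"MZ\x90\x00" + b"\x11" * 12
resource = struct.pack("<I", len(fake_exe)) + fake_exe + b"\x00" * 8
print(extract_length_prefixed(resource) == fake_exe)

# The prefix value quoted above, as it would be stored in the file.
print(struct.unpack("<I", bytes.fromhex("00d03800"))[0])
```

The second print decodes the bytes 00 D0 38 00 as a little-endian DWORD, which is 0x0038D000, i.e. the 3,723,264 bytes of the dropped EXE.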

This file is then launched. This is the worm file that actually ends up on the infected PCs (the initial DLL is only in memory), which is why almost everybody's description starts with it. But it isn't what spreads via SMB "as is".

I am going to close this issue now, because the executables aren't really corrupted. They are just DLLs that have to be loaded, and their exported function executed, in order to drop the right EXE file.

However, I must re-iterate that Dionaea's emulation of DoublePulsar is rather naive and can be easily bypassed by other malware in a future attack. A proper solution would require emulating the shellcode - but that's too hard to be worth the effort.

gento commented 5 years ago

Thanks @bontchev for your detailed write-up on this! Good stuff! This is very good for knowledge and for a better understanding of the internal mechanism.

Totally agree that Dionaea's emulation is not perfect. There may be ways to bypass the emulation, especially since the honeypot source code is publicly available. It is challenging for honeypot researchers; well, this is always a game of hide and seek.

Thanks again for your feedback!