Cisco-Talos / clamav

ClamAV - Documentation is here: https://docs.clamav.net
https://www.clamav.net/
GNU General Public License v2.0

False positive Heuristics.Encrypted.RAR for s390 binary #1143

Open candrews opened 8 months ago

candrews commented 8 months ago

ClamAV reports Heuristics.Encrypted.RAR for s390 binaries.

To reproduce this issue using https://www.npmjs.com/package/@esbuild/linux-s390x/v/0.19.11 as an example:

curl -s -L https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.19.11.tgz | tar xzvf - && docker run -v "$(pwd):/scandir:Z" -it clamav/clamav:1.2.1@sha256:d584c29eefc29e138eb14f243abef2f6712cffecac52194626a2b2f6bb3ec2c7 clamscan /scandir/package/bin/esbuild --alert-encrypted=yes --alert-encrypted-archive=yes

Expected result: No infections reported

Actual result:

/scandir/package/bin/esbuild: Heuristics.Encrypted.RAR FOUND

----------- SCAN SUMMARY -----------
Known viruses: 8680421
Engine version: 1.2.1
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 10.18 MB
Data read: 9.56 MB (ratio 1.06:1)
Time: 16.305 sec (0 m 16 s)
Start Date: 2024:01:16 20:28:58
End Date:   2024:01:16 20:29:14

candrews commented 8 months ago

I also tested with clamav/clamav:unstable and got the same result.

micahsnyder commented 8 months ago

@candrews do you know of other files that have this false positive or just the esbuild file?

The log output from clamscan with --gen-json --debug has:

LibClamAV debug: ELF: ELF class 2 (64-bit)
LibClamAV debug: ELF: Number of program headers: 7
LibClamAV debug: ELF: Number of sections: 14
LibClamAV debug: Matched signature for file type HTML data
LibClamAV debug: Matched signature for file type ZIP-SFX at 9699976
LibClamAV debug: Matched signature for file type RAR-SFX at 9700376
LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
LibClamAV debug: CL_TYPE_ZIPSFX signature found at 9699976
LibClamAV debug: in cli_unzip_single
LibClamAV debug: cli_basename: Provided path does not include a file name.
LibClamAV debug: cli_unzip: local header - ZMDNAME:1::4294903808:65535:ffff0a0d:29798:0:1
LibClamAV debug: CDBNAME:CL_TYPE_ZIP:65535::65535:4294903808:1:0:4294904333:(nil)
LibClamAV debug: cli_unzip: local header - header has got unusable masked data
LibClamAV debug: CL_TYPE_RARSFX signature found at 9700376
LibClamAV debug: fmap_dump_to_file: dumping fmap not backed by file...
LibClamAV debug: in scanrar()
unrar_open: Failed to open archive: /tmp/20240116_155709-scantemp.746e62b7d5/clamav-fd3583e4045bb5b8eb2db99cab373e1e.tmp
unrar_retcode: Encrypted file header found in archive.
LibClamAV debug: RAR: Encrypted main header
LibClamAV debug: FP SIGNATURE: 65cdddfc6d5eb9462c8a8779bfcea790:326632:Heuristics.Encrypted.RAR  # Name: n/a, Type: CL_TYPE_RAR
LibClamAV debug: FP SIGNATURE: 74e7b2d64832250115f6e0b94996ebe0:10027008:Heuristics.Encrypted.RAR  # Name: esbuild, Type: CL_TYPE_ELF
LibClamAV debug: RAR: Exit code: 0
LibClamAV debug: Descriptor[3]: Continuing after file scan resulted with: No viruses detected
LibClamAV debug: Running bytecode hook
LibClamAV debug: Bytecode executing hook id 261 (0 hooks)
LibClamAV debug: Bytecode: no logical signature matched, no bytecode executed
LibClamAV debug: Finished running bytecode hook
LibClamAV debug: Descriptor[3]: Continuing after file scan resulted with: No viruses detected
LibClamAV debug: cli_magic_scan: returning 0  at line 5037
LibClamAV debug: {
  "Magic":"CLAMJSONv0",
  "RootFileType":"CL_TYPE_ELF",
  "FileName":"esbuild",
  "FileType":"CL_TYPE_ELF",
  "FileSize":10027008,
  "FileMD5":"74e7b2d64832250115f6e0b94996ebe0",
  "EmbeddedObjects":[
    {
      "FileType":"CL_TYPE_ZIPSFX",
      "Offset":9699976
    },
    {
      "FileType":"CL_TYPE_RARSFX",
      "Offset":9700376,
      "Viruses":[
        "Heuristics.Encrypted.RAR"
      ]
    }
  ]
}

Clamscan thinks it found a ZIP file at offset 9699976 and a RAR at offset 9700376 in the file. Parsing the ZIP failed (probably a false positive because of the failed extraction).

But parsing the RAR stopped when the header looked like it was encrypted.
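For anyone triaging similar reports, the embedded objects in the JSON metadata above can be listed with a few lines of Python. This is only a sketch: the `flagged_objects` helper is a hypothetical name, and the metadata is copied (with whitespace condensed) from the `--gen-json` output shown in this comment.

```python
import json

# Metadata copied from the clamscan --gen-json output above (whitespace condensed).
METADATA = """
{
  "Magic":"CLAMJSONv0",
  "RootFileType":"CL_TYPE_ELF",
  "FileName":"esbuild",
  "FileType":"CL_TYPE_ELF",
  "FileSize":10027008,
  "FileMD5":"74e7b2d64832250115f6e0b94996ebe0",
  "EmbeddedObjects":[
    {"FileType":"CL_TYPE_ZIPSFX", "Offset":9699976},
    {"FileType":"CL_TYPE_RARSFX", "Offset":9700376,
     "Viruses":["Heuristics.Encrypted.RAR"]}
  ]
}
"""


def flagged_objects(metadata_text):
    """Return (file type, offset, alerts) for each embedded object that has alerts."""
    doc = json.loads(metadata_text)
    return [
        (obj["FileType"], obj["Offset"], obj["Viruses"])
        for obj in doc.get("EmbeddedObjects", [])
        if obj.get("Viruses")  # objects without a "Viruses" key are skipped
    ]


if __name__ == "__main__":
    for file_type, offset, alerts in flagged_objects(METADATA):
        print(f"{file_type} at offset {offset}: {', '.join(alerts)}")
```

Only the RAR-SFX object carries an alert; the ZIP-SFX object was detected but produced no finding.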

When I pop open a hex editor and look at that offset, I find this: [image]

It appears this file contains the magic bytes for the start portions of a bunch of different file types and HTML tags.

In short, I don't believe this is a false positive -- this file is bundling at least part of an encrypted RAR archive.

candrews commented 8 months ago

I reported this finding to the esbuild project, here's their response: https://github.com/evanw/esbuild/issues/3599#issuecomment-1894585562

It appears that the Go standard library contains these bytes at https://github.com/golang/go/blob/b44f6378233ada888f0dc79e0ac56def4673d9ed/src/net/http/sniff.go#L183-L190 which is what's being picked up by ClamAV here. I feel like this means that ClamAV will probably pick up Go binaries often.
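To illustrate why a content-sniffing table inside a binary can look like embedded archives, here is a rough Python sketch that searches a byte buffer for archive magic bytes. It is not how ClamAV's file-type matcher works; the signature values are the well-known ZIP and RAR markers, and `SAMPLE` is a made-up buffer standing in for a binary's read-only data.

```python
# Well-known archive magic bytes (the RAR markers are those documented by RARLAB).
SIGNATURES = {
    "ZIP local file header": b"PK\x03\x04",
    "RAR v1.5-v4.0 marker": b"Rar!\x1a\x07\x00",
    "RAR v5+ marker": b"Rar!\x1a\x07\x01\x00",
}

# Hypothetical stand-in for a binary that stores a signature table as plain data.
SAMPLE = (
    b"\x7fELF" + b"\x00" * 16
    + b"PK\x03\x04" + b"junk"
    + b"Rar!\x1a\x07\x00"
    + b"Rar!\x1a\x07\x01\x00"
)


def find_signatures(data):
    """Return a sorted list of (offset, signature name) for every match in data."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = 0
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1  # keep searching after this match
    return sorted(hits)


if __name__ == "__main__":
    for offset, name in find_signatures(SAMPLE):
        print(f"{offset}: {name}")
```

A scanner that treats every such match as the possible start of a self-extracting archive will then try to parse whatever bytes follow, which is the situation described in this issue.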

Perhaps ClamAV's heuristic should be adjusted?

micahsnyder commented 7 months ago

Golang chose to embed a (very tiny) RAR archive in Go software and ClamAV is correctly identifying it. You've chosen to make ClamAV alert if it finds encrypted archives. This is not standard behavior and is not an indication of malware.

That said, I can appreciate that you would want to use the "alert if encrypted" feature, and wouldn't want it to alert on all Golang binaries.

ClamAV relies on libunrar (made by RARLAB) to determine if the archive is encrypted.

I imagine we could find a way to differentiate Golang EXE's from other EXE's and then ignore RAR archives attached to Golang EXE's. Then again, that opens the door to malware hiding stuff in RAR's attached to Golang EXE's. So... I don't really want to do this. Or perhaps we should just disable the "alert if encrypted" feature specifically when scanning Golang EXE's? Something to think about.

This isn't a particularly high priority to me. I don't plan to work on it any time soon, or to ask someone else from my team if they can work on it. If someone from the community wants to come up with a good solution, feel free to spitball some ideas and/or submit some possible solutions as pull requests.

fawind commented 1 month ago

I'm running into the same issue here https://github.com/Cisco-Talos/clamav/issues/1147#issuecomment-2262683054, where ClamAV also identifies the RAR header in the Go binary but then reads incorrect archive size headers causing a MaxFileSize warning.

> Golang chose to embed a (very tiny) RAR archive in Go software and ClamAV is correctly identifying it.

I don't know the RAR archive format well, but I don't think I would consider this an actual archive embedded in Go software? The binaries contain the main RAR signature, but none of the other RAR headers that would make it an archive.

I'm wondering if we can improve ClamAV's archive detection mechanism here by checking for additional RAR header fields (e.g. checking if the HEAD_TYPE field is present)? But I'm aware that I'm likely missing nuances here!
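To make the idea concrete, here is a minimal Python sketch of such a check for the RAR v1.5-v4.0 marker. It assumes, based on my reading of the RAR 4.x format notes, that the 7-byte marker is followed by a main archive header laid out as HEAD_CRC (2 bytes), HEAD_TYPE (1 byte, value 0x73), HEAD_FLAGS (2 bytes), HEAD_SIZE (2 bytes, at least 13). The function name and constants are hypothetical and none of this is taken from ClamAV or libunrar.

```python
import struct

RAR4_MARKER = b"Rar!\x1a\x07\x00"
MAIN_HEAD_TYPE = 0x73    # assumed HEAD_TYPE of the main archive header
MAIN_HEAD_MIN_SIZE = 13  # assumed minimum HEAD_SIZE of that header


def looks_like_rar4_archive(data, offset):
    """Return True if a RAR4 marker at offset is followed by a plausible main header."""
    if data[offset:offset + len(RAR4_MARKER)] != RAR4_MARKER:
        return False
    header_start = offset + len(RAR4_MARKER)
    if len(data) < header_start + 7:
        return False  # not enough bytes left for CRC, type, flags and size
    # Little-endian: HEAD_CRC (H), HEAD_TYPE (B), HEAD_FLAGS (H), HEAD_SIZE (H).
    _crc, head_type, _flags, head_size = struct.unpack_from("<HBHH", data, header_start)
    return head_type == MAIN_HEAD_TYPE and head_size >= MAIN_HEAD_MIN_SIZE


if __name__ == "__main__":
    # A marker followed by another signature string, as in a sniffing table.
    table_like = RAR4_MARKER + b"Rar!\x1a\x07\x01\x00" + b"\x00" * 8
    # A marker followed by a header-shaped block (the CRC is not verified here).
    archive_like = RAR4_MARKER + struct.pack("<HBHHHI", 0x90CF, 0x73, 0, 13, 0, 0)
    print(looks_like_rar4_archive(table_like, 0))
    print(looks_like_rar4_archive(archive_like, 0))
```

A check like this would only reduce accidental matches. It does not validate the header CRC, and it does not address the concern raised earlier in the thread about malware deliberately hiding archives in Go binaries.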

Currently, any golang binary compiled with this part of the standard library might run into "Encrypted" or "MaxFileSize" warnings based on whatever bytes come after that signature offset. It's hard for us to generally ignore the MaxFileSize warning for golang binaries given there might be binaries that are >2GB and not scannable by ClamAV.