chainguard-dev / malcontent

a new take on #malware #detection
Apache License 2.0

diff broken: considers two files or directories as delete+add rather than modify #565

tstromberg closed this issue 12 hours ago

tstromberg commented 3 days ago

I'm not sure when this happened (some time before v1.0.0), but I noticed that mal diff is basically broken. Given two directories:

/tmp/old
/tmp/old/lottie-player.min.js
/tmp/new
/tmp/new/lottie-player.min.js

If I run mal diff /tmp/old /tmp/new, it sees it as one file deleted and another one added, rather than a single file that changed:

% m diff /tmp/old /tmp/new

Deleted: ../../private/tmp/old/lottie-player.min.js [⚠️ MEDIUM]
-------------------------------------------------------------------------------------------------------------------------------------
RISK  KEY                             DESCRIPTION                             EVIDENCE
-------------------------------------------------------------------------------------------------------------------------------------
-LOW  data/encoding/json/decode       Decodes JSON messages                   JSON.parse
-LOW  data/encoding/json/encode       encodes JSON                            JSON.stringify
-LOW  impact/words/plugin             references a 'plugin'                   function installPlugin
                                                                              getExpressionsPlugin
                                                                              plugins
                                                                              return expressionsPlugin
                                                                              setExpressionsPlugin
-LOW  net/url/embedded                contains embedded HTTPS URLs            https://www.jsdelivr.com/using-sri-with-dynamic-files
-LOW  net/url/parse                   Handles URL strings                     new URL
-MED  exec/remote_commands/code_eval  evaluate code dynamically using eval()  eval("
-MED  net/download/download           download files                          download_
-MED  os/time/clock/sleep             uses setInterval to wait                setInterval(
-------------------------------------------------------------------------------------------------------------------------------------

Added: ../../private/tmp/new/lottie-player.min.js [🚨 CRITICAL]
-----------------------------------------------------------------------------------------------------------------------------------------
RISK   KEY                               DESCRIPTION                                            EVIDENCE
-----------------------------------------------------------------------------------------------------------------------------------------
+LOW   c2/addr/url/unusual               Contains HTTP hostname with unusual top-level domain   https://api.mantlescan.xyz/
                                                                                                https://mantlescan.xyz/
                                                                                                https://openchain.xyz/
+LOW   credential/ssl/private_key        References private keys                                privateKey
+LOW   crypto/aes                        Supports AES (Advanced Encryption Standard)            AES
+LOW   crypto/ed25519                    Elliptic curve algorithm used by TLS and SSH           ed25519
+LOW   data/encoding/base64              Supports base64 encoded strings                        base64
+LOW   data/encoding/json/decode         Decodes JSON messages                                  JSON.parse
+LOW   data/encoding/json/encode         encodes JSON                                           JSON.stringify
+LOW   fs/file/open                      opens files                                            open(
+LOW   fs/mount                          mounts file systems                                    -o
                                                                                                mount
+LOW   impact/words/password             references a 'password'                                PasswordBasedCipher
                                                                                                to countless passwords
+LOW   impact/words/plugin               references a 'plugin'                                  plugin_relativeTime
                                                                                                plugin_updateLocale
                                                                                                plugins
+LOW   net/resolve/hostport/parse        Network address and service translation                getaddrinfo
+LOW   net/socket/socket/listen          listen on a socket                                     accept
                                                                                                socket
+LOW   net/socket/socket/send            send a message to a socket                             _send
+LOW   net/url/embedded                  contains embedded HTTPS URLs                           https://abitype.dev
                                                                                                https://andromeda-explorer.metis.io/api
                                                                                                https://andromeda.metis.io/?owner=1088
                                                                                                https://api-era.zksync.network/api
                                                                                                https://api-moonbeam.moonscan.io/api
                                                                                                https://api-moonriver.moonscan.io/api
                                                                                                https://api-optimistic.etherscan.io/api
                                                                                                https://api-zkevm.polygonscan.com/api
                                                                                                …
+LOW   net/url/parse                     Handles URL strings                                    new URL
+LOW   os/env/get                        Retrieve environment variable values                   env.DEBUG
                                                                                                env.MODE
                                                                                                env.NEXT
                                                                                                env.NODE
+LOW   os/fd/read                        reads from a file handle                               e.read()
+LOW   os/fd/write                       writes to a file handle                                a.write(o)
                                                                                                decoder.write(n)
                                                                                                decoder.write(t)
                                                                                                e.write(t)
                                                                                                i.write(e)
                                                                                                t.write(o)
                                                                                                this.write(e)
+MED   anti/static/obfuscation/generic/  converts hex data to ASCII                             toString("hex");
       hex_conversion
+MED   c2/addr/ip                        hardcoded IP address                                   114.243.154.69
                                                                                                13.182.181.343
                                                                                                13.23.32.42
                                                                                                14.22.33.243
                                                                                                14.52.54.92
                                                                                                146.288.257.686
                                                                                                15.15.34.34
                                                                                                15.21.28.36
                                                                                                …
+MED   credential/keychain/keychain      May access the macOS keychain                          keychain
+MED   data/embedded/embedded/base64/    Contains base64 url                                    odHRwOi8v::$http
       url
+MED   discover/system/platform          get system identification                              process.platform
                                                                                                process.versions
+MED   exec/remote_commands/code_eval    evaluate code dynamically using exec()                 exec(e))return
                                                                                                exec(e),e
                                                                                                exec(h)
                                                                                                exec(l),null
                                                                                                exec(o))
                                                                                                exec(r))
                                                                                                exec(t)
+MED   exfil/stealer/browser             Uses HTTP, archives, and references multiple browsers  .config
                                                                                                Brave
                                                                                                Chrome
                                                                                                Discord
                                                                                                Firefox
                                                                                                Opera
                                                                                                POST
                                                                                                Safari
                                                                                                …
+MED   fs/path/relative                  references and possibly executes relative path         ./aes
                                                                                                ./blowfish
                                                                                                ./cipher-core
                                                                                                ./core
                                                                                                ./evpkdf
                                                                                                ./format-hex
                                                                                                ./hmac
                                                                                                ./lib-typedarrays
                                                                                                …
+MED   impact/words/agent                references an 'agent'                                  useragent
+MED   impact/words/heartbeat            references a 'heartbeat'                               heartBeatTimeout
                                                                                                heartbeat_pulse
                                                                                                lastHeartbeatResponse
                                                                                                updateLastHeartbeat
+MED   net/download/download             download files                                         Downloads
                                                                                                downloads-view
                                                                                                mobile-download-links
+MED   net/http/http/form/upload         upload content via HTTP form                           POST
                                                                                                application/json
                                                                                                application/x-www-form-urlencoded
+MED   net/http/http/post                submits content to websites                            Content-Type
                                                                                                HTTP
                                                                                                POST
                                                                                                http
+MED   net/http/websocket                supports web sockets                                   WalletLinkWebSocket
                                                                                                WebSocket:gV
                                                                                                WebSocket:typeof
                                                                                                WebSocketClass:h
                                                                                                WebSocketClass:l
                                                                                                clearWebSocket
                                                                                                webSocket:e
                                                                                                webSocket:r
                                                                                                …
+MED   net/url/encode                    encodes URL, likely to pass GET variables              urlencode
+MED   net/url/request                   requests resources via URL                             requests.get(e)
+CRIT  exfil/stealer/wallet              makes HTTPS connections and references multiple        BraveWallet
                                         wallets by name                                        Coinbas
                                                                                                Ronin
                                                                                                http
-----------------------------------------------------------------------------------------------------------------------------------------

The same happens if I specify the files by path name:

mal diff /tmp/old/lottie-player.min.js /tmp/new/lottie-player.min.js
Deleted: ../../../private/tmp/old/lottie-player.min.js [⚠️ MEDIUM]
-------------------------------------------------------------------------------------------------------------------------------------
RISK  KEY                             DESCRIPTION                             EVIDENCE
-------------------------------------------------------------------------------------------------------------------------------------
-LOW  data/encoding/json/decode       Decodes JSON messages                   JSON.parse
-LOW  data/encoding/json/encode       encodes JSON                            JSON.stringify
-LOW  impact/words/plugin             references a 'plugin'                   function installPlugin
                                                                              getExpressionsPlugin
                                                                              plugins
                                                                              return expressionsPlugin
                                                                              setExpressionsPlugin
-LOW  net/url/embedded                contains embedded HTTPS URLs            https://www.jsdelivr.com/using-sri-with-dynamic-files
-LOW  net/url/parse                   Handles URL strings                     new URL
-MED  exec/remote_commands/code_eval  evaluate code dynamically using eval()  eval("
-MED  net/download/download           download files                          download_
-MED  os/time/clock/sleep             uses setInterval to wait                setInterval(
-------------------------------------------------------------------------------------------------------------------------------------

Added: ../../../private/tmp/new/lottie-player.min.js [🚨 CRITICAL]
--------------------------------------------------------------------------

Here is the output of v0.10.0, showing the expected behavior (except that the filename is "."):

go run . --diff /tmp/old/lottie-player.min.js /tmp/new/lottie-player.min.js
Changed: . [⚠️ MEDIUM → 🚨 CRITICAL]

+++ ADDED: 24 behavior(s) +++
----------------------------------------------------------------------------------------------------------------------------
RISK   KEY                       DESCRIPTION                                            EVIDENCE
----------------------------------------------------------------------------------------------------------------------------
+LOW   crypto/aes                Supports AES (Advanced Encryption Standard)            AES
+LOW   crypto/ed25519            Elliptic curve algorithm used by TLS and SSH           ed25519
+LOW   encoding/base64           Supports base64 encoded strings                        base64
+LOW   env/get                   Retrieve environment variable values                   env.DEBUG
                                                                                        env.MODE
                                                                                        env.NEXT
                                                                                        env.NODE
+LOW   fs/mount                  mounts file systems                                    -o
                                                                                        mount
+LOW   net/hostport/parse        Network address and service translation                getaddrinfo
+LOW   net/socket/listen         listen on a socket                                     accept
                                                                                        socket
+LOW   net/socket/send           send a message to a socket                             _send
+LOW   ref/site/url/unusual      Contains HTTP hostname with unusual top-level domain   https://api.mantlescan.xyz/
                                                                                        https://mantlescan.xyz/
                                                                                        https://openchain.xyz/
+LOW   ref/words/password        references a 'password'                                PasswordBasedCipher
                                                                                        to countless passwords
+LOW   secrets/private_key       References private keys                                privateKey
+MED   combo/stealer/browser     Uses HTTP, archives, and references multiple browsers  .config
                                                                                        Brave
                                                                                        Chrome
                                                                                        Firefox
                                                                                        POST
                                                                                        Safari
                                                                                        http
                                                                                        zip
                                                                                        …
+MED   data/embedded/base64/url  Contains base64 url                                    odHRwOi8v::$http
+MED   kernel/uname/get          get system identification                              process.platform
                                                                                        process.versions
+MED   net/http/form/upload      upload content via HTTP form                           "application/x-www-form-urlencoded
+MED   net/http/post             Able to submit content via HTTP POST                   HTTP
                                                                                        POST
                                                                                        http
+MED   net/url/encode            encodes URL, likely to pass GET variables              urlencode
+MED   net/url/request           requests resources via URL                             requests.get(e)
+MED   ref/ip                    hardcoded IP address                                   114.243.154.69
                                                                                        13.182.181.343
                                                                                        13.23.32.42
                                                                                        14.22.33.243
                                                                                        14.52.54.92
                                                                                        146.288.257.686
                                                                                        15.15.34.34
                                                                                        15.21.28.36
                                                                                        …
+MED   ref/path/relative         references and possibly executes relative path         ./aes
                                                                                        ./blowfish
                                                                                        ./cipher-core
                                                                                        ./core
                                                                                        ./evpkdf
                                                                                        ./format-hex
                                                                                        ./hmac
                                                                                        ./lib-typedarrays
                                                                                        …
+MED   ref/words/agent           references an 'agent'                                  useragent
+MED   secrets/keychain          May access the macOS keychain                          keychain
+HIGH  ref/site/unusual          unusual http hostname                                  https://api.mantlescan.xyz/
                                                                                        https://mantlescan.xyz/
                                                                                        https://openchain.xyz/
+CRIT  combo/stealer/wallet      makes HTTPS connections and references multiple        BraveWallet
                                 wallets                                                Coinbas
                                                                                        Ronin
                                                                                        http
----------------------------------------------------------------------------------------------------------------------------

However we fix this, we need to add a regression test, as our diff code is fragile and really difficult to understand.
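
To make the expected behavior concrete, here is a minimal sketch of pairing files by their path relative to each scan root, so that old/lottie-player.min.js and new/lottie-player.min.js count as one changed file. This is not malcontent's actual diff code: relIndex and classify are hypothetical names invented for illustration, and the sketch only pairs paths without comparing behaviors.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// relIndex (hypothetical helper) maps each file's path relative to root
// onto its full path.
func relIndex(root string, files []string) (map[string]string, error) {
	idx := make(map[string]string, len(files))
	for _, f := range files {
		rel, err := filepath.Rel(root, f)
		if err != nil {
			return nil, fmt.Errorf("rel %q under %q: %w", f, root, err)
		}
		idx[rel] = f
	}
	return idx, nil
}

// classify (hypothetical) pairs files from two scan roots by relative path.
// A key present on both sides is a modification candidate; a key present on
// only one side is a deletion or an addition.
func classify(oldRoot string, oldFiles []string, newRoot string, newFiles []string) (changed, deleted, added []string, err error) {
	oldIdx, err := relIndex(oldRoot, oldFiles)
	if err != nil {
		return nil, nil, nil, err
	}
	newIdx, err := relIndex(newRoot, newFiles)
	if err != nil {
		return nil, nil, nil, err
	}
	for rel := range oldIdx {
		if _, ok := newIdx[rel]; ok {
			changed = append(changed, rel)
		} else {
			deleted = append(deleted, rel)
		}
	}
	for rel := range newIdx {
		if _, ok := oldIdx[rel]; !ok {
			added = append(added, rel)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(changed)
	sort.Strings(deleted)
	sort.Strings(added)
	return changed, deleted, added, nil
}

func main() {
	changed, deleted, added, err := classify(
		"/tmp/old", []string{"/tmp/old/lottie-player.min.js"},
		"/tmp/new", []string{"/tmp/new/lottie-player.min.js"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("changed:", changed, "deleted:", deleted, "added:", added)
}
```

A regression test could assert that the directory case above yields exactly one changed entry and no added or deleted ones. Note that when both arguments are single files, filepath.Rel of a path against itself returns ".", so the file-to-file case would need its own rule, for example pairing the two arguments directly.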

tstromberg commented 3 days ago

@egibs - any chance you can help with this? I'm confident you can fix this far better and faster than I can.

tstromberg commented 3 days ago

It looks like there is at least one example where diff gets things right:


m diff ../bincapz-samples/linux/clean/ls.x86_64 ../bincapz-samples/macOS/clean/ls                                                                            695ms  Sat Nov  2 10:45:37 2024
Changed: ../bincapz-samples/macOS/clean/ls [⚠️ MEDIUM → ✅ LOW]

+++ ADDED: 1 behavior(s) +++
---------------------------------------------------------------------------
RISK  KEY                    DESCRIPTION                    EVIDENCE
---------------------------------------------------------------------------
+LOW  fs/directory/traverse  traverse filesystem hierarchy  _fts_children
                                                            _fts_close
                                                            _fts_open
                                                            _fts_read
                                                            _fts_set
---------------------------------------------------------------------------

--- REMOVED: 3 behavior(s) ---
-------------------------------------------------------------------------------------------------------------------------------
RISK  KEY                           DESCRIPTION                          EVIDENCE
-------------------------------------------------------------------------------------------------------------------------------
-LOW  discover/system/hostname/get  get computer host name               gethostname
-LOW  net/url/embedded              contains embedded HTTPS URLs         https://gnu.org/licenses/gpl.html
                                                                         https://translationproject.org/team/
                                                                         https://wiki.xiph.org/MIME_Types_and_File_Extensions
                                                                         https://www.gnu.org/software/coreutils/
-MED  process/name/set              get or set the current process name  __progname
-------------------------------------------------------------------------------------------------------------------------------

tstromberg commented 2 days ago

Some weirdness: if I use a relative path, diff works:

% cd /tmp
% mal diff old new
├─ 🛑 Changed: new/lottie-player.min.js [MEDIUM → CRITICAL]
│     ▲ anti-static [NONE → MEDIUM]
++       🟡 obfuscation/generic/hex_conversion — converts hex data to ASCII: toString("hex");
│     ▲ command & control [NONE → MEDIUM]
++       🟡 addr/ip — hardcoded IP address:
++           114.243.154.69, 13.182.181.343, 13.23.32.42, 14.22.33.243, 14.52.54.92, 146.288.257.686, 15.15.34.34, 15.21.28.36, …

If I specify absolute paths, it reverts to the deleted+added bug:

% cd /tmp
% mal diff /tmp/old /tmp/new
├─ 🟡 Deleted: ../../private/tmp/old/lottie-player.min.js [MEDIUM]
│     ≡ data [LOW]
│       🟢 encoding/json_decode — Decodes JSON messages: JSON.parse
│       🟢 encoding/json_encode — encodes JSON: JSON.stringify
│     ≡ execution [MEDIUM]
│       🟢 plugin — references a 'plugin':
│           function installPlugin, getExpressionsPlugin, plugins, return expressionsPlugin, setExpressionsPlugin
│       🟡 remote_commands/code_eval — evaluate code dynamically using eval(): eval("
│     ≡ networking [MEDIUM]
│       🟡 download — download files: download_
│       🟢 url/embedded — contains embedded HTTPS URLs: https://www.jsdelivr.com/using-sri-with-dynamic-files
│       🟢 url/parse — Handles URL strings: new URL
│     ≡ operating-system [MEDIUM]
│       🟡 time/clock_sleep — uses setInterval to wait: setInterval(
│
├─ 🛑 Added: ../../private/tmp/new/lottie-player.min.js [CRITICAL]

egibs commented 2 days ago

Interesting. I'll look into this first thing tomorrow.