Neo23x0 / munin

Online hash checker for Virustotal and other services
Apache License 2.0

Munin incorrectly writes semicolon-delimited files instead of CSVs #55

Closed: graememeyer closed this issue 9 months ago

graememeyer commented 2 years ago

It appears the latest version of munin is writing output files delimited by semicolon characters (;) rather than commas (,), even when the -o option is applied.

Example:

PS > python .\munin\munin.py -f .\munin\munin-demo.txt -o test.csv
   _________   _    _   ______  _____  ______
  | | | | | \ | |  | | | |  \ \  | |  | |  \ \     /.)
  | | | | | | | |  | | | |  | |  | |  | |  | |    /)\|
  |_| |_| |_| \_|__|_| |_|  |_| _|_|_ |_|  |_|   // /
                                                /'" "
  Online Hash Checker for Virustotal and Other Services
  Florian Roth - 0.21.0 June 2021

[+] 51611 cache entries read from cache database: vt-hash-db.json
[+] You can interrupt the process by pressing CTRL+C without losing the already gathered information
[+] Writing results to new file: test.csv
[+] Processing 22 lines ...

 1 / 22 > Clean
HASH: 1093B3F7D016C0E03CD0DB36D74BA09673A7BB03 COMMENT: bravo.wav
TYPE: WAV SIZE: 7.4 KB FILENAMES: bravo.wav, kogesrtg9.dll, 1s2rwn5t7.dll, file-5582314_wav
FIRST: 2013-06-12 17:27:52 LAST: 2016-07-28 16:50:34 SUBMISSIONS: 2 REPUTATION: 0
COMMENTS: 0 USERS: - TAGS: WAV KNOWN-DISTRIBUTOR
RESULT: 0 / 54

...

 22 / 22 > Clean
HASH: 61b6f3b3407dad1e10ee80684e945e28d21adbeec002548bcaba9a3bc6ffd244 COMMENT: EXE_Susp_Cmds /subfile
TYPE: Win32 EXE SIZE: 9.1 MB FILENAMES: MaypleHD Player, MaypleMp4Installer.exe, MaypleMp4Installer-5.2.0.2.exe
SIGNER: (); Thawte Code Signing CA - G2; thawte COPYRIGHT: Yozii Inc. All rights reserved. DESCRIPTION: MaypleHD Player Install Program
FIRST: 2016-10-19 08:03:48 LAST: 2018-02-27 08:02:11 SUBMISSIONS: 6 REPUTATION: -48
COMMENTS: 1 USERS: dviz TAGS: PEEXE OVERLAY REVOKED-CERT SIGNED NSIS INVALID-SIGNATURE
RESULT: 0 / 67

[+] Results written to file test.csv

[+] Saving 51633 cache entries to file vt-hash-db.json

Output:

PS > Get-Content .\test.csv -First 3
Lookup Hash;Rating;Comment;Positives;Virus;File Names;First Submitted;Last Submitted;File Type;MD5;SHA1;SHA256;Imphash;Matching Rule;Harmless;Revoked;Expired;Trusted;Signed;Signer;Hybrid Analysis Sample;MalShare Sample;VirusBay Sample;MISP;MISP Events;URLhaus;AnyRun;CAPE;VALHALLA;User Comments;Microsoft;Kaspersky;McAfee;CrowdStrike;TrendMicro;ESET-NOD32;Symantec;F-Secure;Sophos;GData;
1093B3F7D016C0E03CD0DB36D74BA09673A7BB03;clean;bravo.wav;0;-;bravo.wav, kogesrtg9.dll, 1s2rwn5t7.dll, file-5582314_wav;2013-06-12 17:27:52;2016-07-28 16:50:34;WAV;deb660600362263bf2cbd8975d23f3c5;1093b3f7d016c0e03cd0db36d74ba09673a7bb03;8dc215954c3f54574aacaa26981e26dfcf4c03de65bbd4bc9e37eb3265289087;-;False;False;False;False;False;False;-;False;False;False;False;;False;False;False;[];['-'];-;-;-;-;-;-;-;-;-;-;
13AEF2CCC4E45B7B8F440F0FDB7D3FBC;clean;ttf;0;-;LinBiolinum_Rah.ttf;2013-10-20 06:23:10;2018-12-24 08:40:26;TrueType Font;13aef2ccc4e45b7b8f440f0fdb7d3fbc;73119c2f63274fd0825c53ec639511ae2f1601ce;f7140084369db686c71e522f0e8de148f0f3f429310376d5f52325a9f0955ba5;-;False;False;False;False;False;False;-;False;False;False;False;;False;False;False;[];['-'];-;-;-;-;-;-;-;-;-;-;
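Because the output is semicolon-delimited, a consumer has to pass that separator to its parser explicitly. Here is a minimal sketch using Python's standard csv module on a shortened copy of the header and first data row above (only the first six columns). With delimiter=";" the comma-separated file-name list stays in a single field. The parse_munin helper is a hypothetical name, and the UTF-8 encoding in the comment is an assumption:

```python
import csv
import io

# Shortened copy of munin's output: first six columns only, trailing ';' kept.
MUNIN_OUTPUT = (
    "Lookup Hash;Rating;Comment;Positives;Virus;File Names;\n"
    "1093B3F7D016C0E03CD0DB36D74BA09673A7BB03;clean;bravo.wav;0;-;"
    "bravo.wav, kogesrtg9.dll, 1s2rwn5t7.dll, file-5582314_wav;\n"
)


def parse_munin(text):
    """Return one dict per result row, keyed by munin's column names (hypothetical helper)."""
    # For a real file: open(path, newline="", encoding="utf-8")  (encoding assumed)
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


rows = parse_munin(MUNIN_OUTPUT)
print(rows[0]["File Names"])  # bravo.wav, kogesrtg9.dll, 1s2rwn5t7.dll, file-5582314_wav
```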

I feel like this issue is too obvious to have gone unnoticed, so perhaps it's intentional? If so, the documentation should be updated to reflect this, and ideally an actual CSV option added. I am happy to contribute this if you can confirm my findings and the intentionality of the issue.

Neo23x0 commented 2 years ago

It's a semicolon, as intended. The README doesn't say that the CSV uses a comma as the separator.

[Screenshot, 2022-02-26]

You could add an option that allows a user to define a separator of his choice.
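Here is a minimal sketch of such an option, assuming argparse-style argument handling. The --csv-delimiter flag and the build_parser/write_rows helpers are hypothetical names, not existing munin code. Keeping ";" as the default would leave the current output unchanged for existing parsers:

```python
import argparse
import csv
import io


def build_parser():
    parser = argparse.ArgumentParser(description="Delimiter option sketch")
    parser.add_argument("-o", dest="output", help="Output file")
    # Hypothetical option, not an existing munin flag.
    parser.add_argument(
        "--csv-delimiter",
        default=";",  # keep the current behaviour as the default
        help="Single character used to separate columns (default: ';')",
    )
    return parser


def write_rows(stream, rows, delimiter):
    """Write rows with the chosen one-character delimiter."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)


args = build_parser().parse_args(["-o", "test.csv", "--csv-delimiter", ","])
buf = io.StringIO()
write_rows(
    buf,
    [["Lookup Hash", "Rating"], ["13AEF2CCC4E45B7B8F440F0FDB7D3FBC", "clean"]],
    args.csv_delimiter,
)
print(buf.getvalue(), end="")
```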

graememeyer commented 2 years ago

@Neo23x0 I've been playing around with the code a bit. Is it intended functionality to end every line with the delimiter? (I'm not aware that that's part of any common CSV definition either.)

Currently the header line for example comes out:

Lookup Hash;Rating;Comment;Positives;Virus;File Names;First Submitted;Last Submitted;File Type;MD5;SHA1;SHA256;Imphash;Matching Rule;Harmless;Revoked;Expired;Trusted;Signed;Signer;Hybrid Analysis Sample;MalShare Sample;VirusBay Sample;MISP;MISP Events;URLhaus;AnyRun;CAPE;VALHALLA;User Comments;Microsoft;Kaspersky;McAfee;CrowdStrike;TrendMicro;ESET-NOD32;Symantec;F-Secure;Sophos;GData;
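The trailing semicolon makes a standard parser see one more column than the header actually names. This quick check uses Python's csv module on the header line quoted above. The columns name is only illustrative:

```python
import csv
import io

# munin's header line as quoted in this issue (note the trailing ';').
HEADER = (
    "Lookup Hash;Rating;Comment;Positives;Virus;File Names;First Submitted;"
    "Last Submitted;File Type;MD5;SHA1;SHA256;Imphash;Matching Rule;Harmless;"
    "Revoked;Expired;Trusted;Signed;Signer;Hybrid Analysis Sample;MalShare Sample;"
    "VirusBay Sample;MISP;MISP Events;URLhaus;AnyRun;CAPE;VALHALLA;User Comments;"
    "Microsoft;Kaspersky;McAfee;CrowdStrike;TrendMicro;ESET-NOD32;Symantec;"
    "F-Secure;Sophos;GData;"
)

fields = next(csv.reader(io.StringIO(HEADER), delimiter=";"))
print(len(fields))       # 41: 40 real column names plus one empty string
print(repr(fields[-1]))  # ''

# A tolerant reader can drop the empty trailing field before using the names.
columns = fields[:-1] if fields and fields[-1] == "" else fields
print(len(columns))      # 40
```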

Edit: I ask because I would "fix" this in a PR that adds CSV functionality (where C = comma), but if the trailing delimiter is intended, that fix could break some existing parsers.

I'm also considering adding an option for quoted CSVs (Excel-compatible, with character escaping). Is this something you're interested in?
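Here is a minimal sketch of what a quoted, comma-separated option could produce, using the standard library's csv.writer with the "excel" dialect and QUOTE_ALL. Embedded double quotes are escaped by doubling them. The write_quoted_csv helper is hypothetical. The file-name and signer values come from the console output above, while the comment value is made up purely to show escaping:

```python
import csv
import io


def write_quoted_csv(stream, header, rows):
    """Write comma-separated rows with every field quoted (Excel dialect)."""
    writer = csv.writer(stream, dialect="excel", quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    writer.writerows(rows)


header = ["Comment", "File Names", "Signer"]
row = [
    'made-up "quoted" comment',  # illustrative value to show quote escaping
    "MaypleHD Player, MaypleMp4Installer.exe",
    "(); Thawte Code Signing CA - G2; thawte",
]

buf = io.StringIO()
write_quoted_csv(buf, header, [row])
print(buf.getvalue())
```

Because every field is wrapped in quotes, commas and semicolons inside values no longer shift the columns, and a standard reader recovers the original values.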

Edit 2: What I'm thinking is:

anotherbridge commented 1 year ago

@graememeyer @Neo23x0 Because the delimiter is a semicolon, the CSV is not valid: some column values contain semicolons themselves, so those rows appear to have more columns than they should.
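The miscount can be reproduced with the signer value from the console output above, which contains two semicolons. This sketch compares a plain ";".join with the standard library's csv.writer, which quotes any field that contains the delimiter. It illustrates one possible fix and is not necessarily what PR #66 or PR #59 implement:

```python
import csv
import io

# Two columns from one result: lookup hash and signer (values from the output above).
row = [
    "61b6f3b3407dad1e10ee80684e945e28d21adbeec002548bcaba9a3bc6ffd244",
    "(); Thawte Code Signing CA - G2; thawte",
]

# Naive: join with ';' without quoting, then parse the line back.
naive_line = ";".join(row)
naive_fields = next(csv.reader(io.StringIO(naive_line), delimiter=";"))
print(len(naive_fields))  # 4: the signer value was split into three fields

# csv.writer quotes fields containing the delimiter, so the value survives.
buf = io.StringIO()
csv.writer(buf, delimiter=";", lineterminator="\n").writerow(row)
quoted_fields = next(csv.reader(io.StringIO(buf.getvalue()), delimiter=";"))
print(len(quoted_fields))  # 2
```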

I fixed this in PR #66. Since PR #59 is also tackling this problem, I did not quote the column.

Further, I assume the trailing delimiters arise because each element is added to the file with a separate write. Since you don't know at that point which element is the last one, you have to append the ; after every element to avoid any issues.
I handled this by appending each column value as a string to a list that holds the content of one line. Instead of writing each element individually, the list is joined with the delimiter at the end and then written to the file.
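The two write patterns described above can be sketched as follows. write_line_per_element and write_line_joined are hypothetical helpers for illustration, not munin's actual functions:

```python
import io


def write_line_per_element(stream, values, delimiter=";"):
    """Per-element writes: every value is followed by the delimiter, including the last."""
    for value in values:
        stream.write(str(value) + delimiter)
    stream.write("\n")


def write_line_joined(stream, values, delimiter=";"):
    """Collect the line's values as strings first, then join and write once."""
    line = [str(value) for value in values]
    stream.write(delimiter.join(line) + "\n")


values = ["13AEF2CCC4E45B7B8F440F0FDB7D3FBC", "clean", "ttf", 0]
old, new = io.StringIO(), io.StringIO()
write_line_per_element(old, values)
write_line_joined(new, values)
print(old.getvalue(), end="")  # 13AEF2CCC4E45B7B8F440F0FDB7D3FBC;clean;ttf;0;
print(new.getvalue(), end="")  # 13AEF2CCC4E45B7B8F440F0FDB7D3FBC;clean;ttf;0
```

Joining removes the trailing delimiter, but it does not quote anything. A value that itself contains the delimiter would still need csv.writer or a comparable quoting step.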