alexfru / SmallerC

Simple C compiler
BSD 2-Clause "Simplified" License
1.4k stars, 156 forks

Virus/trojan false positives: SMLRC has many false positives on Windows #26

Open ghost opened 5 years ago

ghost commented 5 years ago

https://www.virustotal.com/gui/file/e96b3b77fe77d643ba8c3eea4f1a1b35ffe532cab6bfd65bb4b9b75264640554/detection

alexfru commented 5 years ago

How do you fix stupid?

ghost commented 5 years ago

Ouch!

alexfru commented 5 years ago

Right, stupid antiviri. I don't know how to fix them.

ghost commented 5 years ago

Let's keep this issue open and see if anyone knows how.

ghost commented 5 years ago

Anyways, did you mean to insult me on purpose?

alexfru commented 5 years ago

@Shineonline I'm very sorry, stupid was not directed at you at all. Stupid is how those AVs are made and work.

ghost commented 5 years ago

It's OK, I take mild insults like paper balls to the chest anyways

alexfru commented 5 years ago

So, here's the problem as I understand it...

Figuring out what a program does, or whether it's somehow malicious, is a problem that does not have a cheap solution (in general it's undecidable, per Rice's theorem, not merely expensive). It's generally impossible to tell if a program even terminates or loops forever (see the halting problem).

Instead, those AVs rely mostly on finding code/data patterns that have already been seen in malicious or infected software. This is a much cheaper "solution". Those (sub)patterns, however, can also occur in benign programs, hence the false positives and the quotation marks. The thinking goes that as long as popular software like Windows itself isn't affected by false positives, all is fine. While this is helpful on the large scale of things, this approach can't work satisfactorily with arbitrary, never-before-seen software.
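As a rough illustration of the pattern matching described above, here is a minimal sketch. It is not how any real AV engine is implemented; the signature names and byte patterns are made up for the example.

```python
# Hypothetical signature database: name -> byte pattern.
# These entries are invented for illustration, not real AV signatures.
SIGNATURES = {
    "Demo.Trojan.A": bytes.fromhex("60e800000000"),  # x86: pushad; call $+5
    "Demo.Packer.B": b"UPX!",
}

def scan(data: bytes) -> list[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)

# A harmless program that merely happens to contain a listed byte sequence.
benign = b"\x90\x90" + bytes.fromhex("60e800000000") + b"\xc3"
# A program that contains none of the listed patterns.
clean = b"\x90\x90\xc3"

print(scan(benign))
print(scan(clean))
```

The scanner never asks what the bytes do, only whether they have been seen before, so `benign` is flagged just like a real sample containing the same sequence would be.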

I have played with the PE file structure a bit, but so far haven't found a reliable way to generate executables without false positives per virustotal. Even the simplest hello-world-like programs written in assembly (e.g. FASM demos) are flagged for no good reason.

Unless someone figures out the precise pattern that the virustotal minions dislike, this issue isn't going to be resolved. Also, there may be something in the generated code and not the file format/structure. I'm definitely not going to write a vastly different code generator just to fix that.

alexfru commented 5 years ago

There's one clue, the "Rich header", which I haven't tried. OTOH, not every compiler generates it, since it's officially undocumented and isn't functionally required.
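For anyone who wants to check whether a given executable carries a Rich header, here is a minimal sketch. It assumes the commonly described layout: the header sits between the DOS header and the PE header, ends with the ASCII marker "Rich" followed by a 4-byte XOR key, and starts with the marker "DanS" XORed with that key. The function name is made up for the example.

```python
import struct

def find_rich_header(data: bytes):
    """Return (start, end) offsets of the Rich header, or None if absent."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C holds the file offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    region = data[:e_lfanew]
    end = region.find(b"Rich")
    if end < 0 or end + 8 > len(region):
        return None
    key = region[end + 4:end + 8]
    # The header begins with "DanS" XORed with the same 4-byte key.
    marker = bytes(a ^ b for a, b in zip(b"DanS", key))
    start = region.find(marker, 0x40, end)
    if start < 0:
        return None
    return start, end + 8
```

Calling `find_rich_header(open(path, "rb").read())` on binaries from different toolchains shows which of them emit the header at all.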

alexfru commented 4 years ago

With the latest changes in .EXE generation there are some notable improvements:

Before improvements, as of commit 4f1ab17b313f415f4bdc374f26e308751ed8dfb6, false positives from virustotal.com:

n2f.exe 12/66
smlrc.exe 34/67
smlrl.exe 24/69
smlrpp.exe 28/71
smlrcc.exe 24/67

After improvements, as of commit 177650342af56c873155817e48d3385672931b90, false positives from virustotal.com:

n2f.exe 16/69
smlrc.exe 8/66
smlrl.exe 6/67
smlrpp.exe 23/69
smlrcc.exe 7/67
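To compare the two runs, the ratios can be turned into detection rates. The sketch below uses only the figures quoted above; the helper names are made up for the example.

```python
# Detections / engines, copied from the two virustotal.com runs quoted above.
BEFORE = {"n2f.exe": (12, 66), "smlrc.exe": (34, 67), "smlrl.exe": (24, 69),
          "smlrpp.exe": (28, 71), "smlrcc.exe": (24, 67)}
AFTER = {"n2f.exe": (16, 69), "smlrc.exe": (8, 66), "smlrl.exe": (6, 67),
         "smlrpp.exe": (23, 69), "smlrcc.exe": (7, 67)}

def rate(hits_total):
    """Convert a (detections, engines) pair into a fraction."""
    hits, total = hits_total
    return hits / total

def regressions(before, after):
    """Return the files whose detection rate went up between the two runs."""
    return [name for name in before if rate(after[name]) > rate(before[name])]

# Print each file's rate before and after as percentages.
for name in BEFORE:
    print(f"{name:12} {rate(BEFORE[name]):6.1%} -> {rate(AFTER[name]):6.1%}")
```

Four of the five tools improve; `regressions(BEFORE, AFTER)` returns only n2f.exe, which went from 12/66 to 16/69.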

ghost commented 4 years ago

You should still keep the issue open, but that's an improvement

alexfru commented 3 years ago

virustotal seems to be fncking with us. At first, after another round of improvements in PE generation, it shows a nicely low number of false positives. Then, 3 days later, it magically finds another 10-15 false positives that it didn't show the first time. WTF.

o0101 commented 5 months ago

I think the idea is then to not make these online virus counters a metric of quality for your product. Ignore them? :)

Developing independently of them seems like a good idea. As you first said, "How do you fix stupid?"

alexfru commented 5 months ago

The problem is that based on these false positives, other software doesn't let one download and use the binaries. Also some people probably shy away from the binaries because they can't know any better and can only trust the AVs.

o0101 commented 5 months ago

I know, but then it's just like "propagation of stupid". You could create a parallel track. Otherwise, your development could be sidetracked by trying to meet these spurious, ever-changing criteria.

I just think it's the wrong thing to optimize. Obviously it's not my project, it's up to you. Just wanted to contribute my attempt to help - the fear is that you'll start down this road and you'll never see the end of it.

Anyway, I like what you're doing with SmallerC. I hope it succeeds!!! :)