ecraven / r7rs-benchmarks

Benchmarks for various Scheme implementations. Taken with kind permission from the Larceny project, based on the Gabriel and Gambit benchmarks.

Avoid unsafe optimizations #43

Closed wingo closed 3 years ago

wingo commented 5 years ago

In PR #15, a change was made to ensure all implementations were run in "safe mode". However this was reverted in b196000a0498655db058dbb6aad551849b56d094. Currently the benchmarks compare safe and unsafe implementations. What's the goal here?

My expectation would be that all Schemes should be compiled in such a way that they don't use unsafe optimizations.

wingo commented 5 years ago

If you agree, the following changes would be made: change Chez to optimization level 2 (the default, I think); consider if -O6 for bigloo is safe; remove (declare (not safe)) from gerbil; remove the (not safe) from the Gambit prelude.
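For readers unfamiliar with these knobs, a rough sketch of what the proposed settings look like (where exactly they live in this repo's run scripts is an assumption on my part; the declarations themselves are standard Chez and Gambit/Gerbil usage):

```scheme
;; Chez Scheme: optimization is controlled by the optimize-level
;; parameter. Levels 0-2 are safe; level 3 enables unsafe primitives.
(optimize-level 2)               ; safe, instead of (optimize-level 3)

;; Gambit / Gerbil: safety is controlled by declarations, typically
;; placed in a prelude at the top of the file.
;; (declare (not safe))          ; <- the line the proposal would remove
(declare (safe))                 ; or state the safe default explicitly
```

Bigloo is the odd one out: `-O6` is a command-line optimization flag rather than a source declaration, and whether it implies unsafe behaviour is exactly the open question above.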

bjoli commented 5 years ago

This is a good point. We don't want a race to the bottom towards "which Scheme lets you write C".

vyzo commented 4 years ago

I think you are missing the point of unsafe optimizations. You want to measure performance as a Scheme is actually used in production, not against some ideal standard of "safety". And it is quite typical in Gerbil and Gambit to run production code compiled with (declare (not safe)).

wingo commented 4 years ago

Just goes to show that not all users want the same thing :)

To me it's clear that these benchmarks are becoming less useful from a Guile point of view. When the performance difference was greater, they were useful. But with the upcoming 3.0 release the differences are smaller, and where they differ, I want to know what the maximum possible speed is without comparing against unsafe optimizations, as that's simply a non-goal of Guile; I am not interested in comparing against an implementation that will have wrapping overflow for fixnums, for example.

So it's fine if these benchmarks keep doing whatever they want. Guile will run its own benchmarks against the high-performance Schemes, compiled without unsafe optimizations. I thought the goal of these benchmarks matched Guile's goals, but it would seem that's not the case, and that's OK!

belmarca commented 4 years ago

Allow me to suggest a possible resolution:

Let each implementer tune their software however they want. If that means using unsafe declarations, so be it. If the implementer wants to keep defaults, so be it.

The interest of the benchmarks is not to show "which Scheme is fastest". With enough dedication, any program can be optimized, including the compilers themselves. So "who is the fastest" may reflect which compiler writer made the best design choices, or is the smartest, or even has the most resources. In the end, this can become a perpetual "arms race".

What users should get out of benchmarks is a sense of what kind of performance can be expected when writing idiomatic code for the specific problems measured by the benchmarks.

If one implementer chooses to make arcane incantations to deliver insane performance, while producing completely unreliable and unreadable code, that is their problem. And if another wants to promote safety, all the better.

I think one way to solve this issue is to present the data differently in such a way that makes implementers' choices more obvious. How exactly to do that, I don't know.

belmarca commented 4 years ago

@wingo btw I don't think these benchmarks should be "useful from a Guile point of view", or for any other Scheme for that matter. That is not at all how I see them. They should be useful from the user's point of view, in the very limited context of the actual programs tested, with all the caveats that apply.

kunabi commented 4 years ago

We should also disable JIT on those implementations that use them unfairly. According to this they are unsafe! https://wingolog.org/archives/2011/06/21/security-implications-of-jit-compilation

wingo commented 4 years ago

Benchmarks shouldn't be useful to scheme implementors? That's a unique take :)

vyzo commented 4 years ago

They are very useful for scheme implementors indeed, but they are also useful for users.

wingo commented 4 years ago

I have never met a user that wanted unsafe optimizations. I don't doubt they exist, but it just goes to show that not everyone wants the same thing :)

bjoli commented 4 years ago

I, as a user, want my software to fail gracefully if I somehow manage not to validate the constraints of whatever operations I am doing. If I have well-tested code that cannot be made to run fast with safe ops, the option of using unsafe ops is good, but I would never use it as a default.

As it is now, we are comparing implementations that validate constraints and promise never to leave the program in an invalid state against implementations that do not. If anything, that gives users less information.


vyzo commented 4 years ago

Note that you don't have to use unsafe forms for the entire program, and especially not during testing. You can apply (declare (not safe)) selectively, e.g. in performance-critical code that has been well debugged.
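A sketch of what that selective use looks like in Gambit, where declarations scope lexically (the procedure here is a made-up example, not from the benchmarks):

```scheme
;; File-level declarations: the module as a whole stays safe.
(declare (standard-bindings) (safe))

;; One hot, well-tested procedure opts out: within its body, type
;; checks are omitted and arithmetic is assumed to be fixnum-only.
(define (sum-vector v)
  (declare (not safe) (fixnum))
  (let loop ((i 0) (acc 0))
    (if (< i (vector-length v))
        (loop (+ i 1) (+ acc (vector-ref v i)))
        acc)))
```

The rest of the program keeps its bounds and type checks; only the declared region trades them away for speed.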

Also note that unsafe optimizations exploit undefined behaviour. Whenever you see "it is an error" in the standard, it is an opportunity for unsafe optimization. Optimizing those cases to go fast (but unleashing dragons if you make a mistake) is not out of scope in Scheme. When you want to measure the performance limit, you have to take this into account. So the point about comparing implementations that do or do not exploit undefined behaviour is moot.
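To make the "it is an error" point concrete: R7RS says it is an error to apply car to a non-pair, which licenses exactly this divergence (the unsafe-mode behaviour described below is typical, not guaranteed):

```scheme
;; Safe build: the implementation checks the type tag at runtime and
;; the wrong-type call is caught as a raised condition.
(guard (e (#t 'caught))
  (car 1))

;; Unsafe build: "it is an error" means no check is required. The
;; compiler may treat 1 as a pair pointer and dereference it, yielding
;; garbage, a crash, or heap corruption -- the dragons in question.
```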

wingo commented 4 years ago

I get what you are getting at @vyzo, but I don't really share a community with people who want (car 1) to do anything other than throw an error; the goals of users who want anything other than an error on such a form aren't shared with Guile or supported by Guile, and so I find no utility in comparing benchmark performance in those contexts.

Concretely: when I benchmark against Gambit, I am not interested in comparing to Gambit-with-unsafe-optimizations. If @ecraven's benchmarks don't share those goals -- as apparently they don't -- Guile will stop using them. Not a problem of course; different strokes for different folks.

vyzo commented 4 years ago

You can always make a fork and use that, with your "no unsafe opts" patches applied.

ecraven commented 3 years ago

Sorry for taking so long. I've been thinking about this for a long time, and have come to the conclusion that I'd prefer to compare the safe versions of everything. I'll write a large disclaimer that this is the case with the next run.