Improve Fortran parse_int benchmark

LKedward commented 4 years ago

The poor performance of the Fortran parse_int benchmark is attributable to the use of an inefficient internal write statement to generate the hex string for parsing, i.e. you are mostly benchmarking the write statement currently. While functional the internal write statement is not a suitable choice for this task; I have instead added a specific routine for generating the hex string, as done in some of the other language benchmarks, which on my machine takes almost 70% off the benchmark time for -O3. I have also replaced the branching logic in the parse_int with a static table which is more representative of how this kind of character parsing is performed.

PallHaraldsson commented 4 years ago

Thanks, it's not my call, but I believe the benchmarks were meant to test some obvious code, not optimized. I didn't review in detail, it seems to me your making up for a deficiency in the Fortran standard library.

E.g. this line with unexplained constants was unclear to me:

[(char(i),i=48,57),(char(i),i=65,71)]

I looked into what's done for some other languages, e.g. Julia and Go. Yes, calls hex (or similar function for 1.0) for Julia.

I was surprised to see rng in the Go code?!

I'm not saying your code is bad, it probably is good, and should be added to Fortran's standard library. Also here?

StefanKarpinski commented 4 years ago

Changing the slow write statement to something else seems entirely fair. Changing the core algorithm to something table-based does not as it means that Fortran is using a fundamentally different algorithm than all the other implementations, whereas the point of these benchmarks is that all the languages use substantially similar algorithms.

LKedward commented 4 years ago

I believe the benchmarks were meant to test some obvious code, not optimized

This is a fair point, the second commit falls into this category of optimised code.

Changing the core algorithm to something table-based does not as it means that Fortran is using a fundamentally different algorithm than all the other implementations, whereas the point of these benchmarks is that all the languages use substantially similar algorithms.

Thanks, yes I agree.

LKedward commented 4 years ago

E.g. this line with unexplained constants was unclear to me:

[(char(i),i=48,57),(char(i),i=65,71)]

I've now updated this to remove the hard-code values and be more explicit.

it seems to me your making up for a deficiency in the Fortran standard library.

Fortran simply doesn't have an intrinsic for this task like other languages; my solution is implemented in pure Fortran and is clear to follow.

StefanKarpinski commented 4 years ago

Given that Fortran doesn't have this primitive, it seems like it should probably cleave to the simple C code version as closely as possible:

long parse_int(const char *s, long base) {
    long n = 0;
    for (; *s; ++s) {
        char c = *s;
        long d = 0;
        if (c >= '0' && c <= '9') d = c-'0';
        else if (c >= 'A' && c <= 'Z') d = c-'A' + (int) 10;
        else if (c >= 'a' && c <= 'z') d = c-'a' + (int) 10;
        else exit(-1);

        if (base <= d) exit(-1);
        n = n*base + d;
    }
    return n;
}

Prominent features:

no table lookups, just byte comparisons and arithmetic
does basic error checking for invalid characters

Arguably, we should just implement this simple algorithm in each language instead of calling potentially super-optimized primitives in each one. The point, after all, is to test how good each language is at this kind of simple byte-by-byte parsing code.

PallHaraldsson commented 4 years ago

Is ichar('G') a typo? It's both F+1 (so I thought possibly intentional, but see "9"), but also the letter next to F on the keyboard.

LKedward commented 4 years ago

Is ichar('G') a typo? It's both F+1 (so I thought possibly intentional, but see "9"), but also the letter next to F on the keyboard.

Sorry yes you're right this is a typo

LKedward commented 4 years ago

Given that Fortran doesn't have this primitive, it seems like it should probably cleave to the simple C code version as closely as possible

Yes I agree; I've reverted my optimisation of the parse_int routine so that the only modification in this PR is the replacement of the internal write with a specific routine for generating the hex string. The original parse_int routine closely resembles the c version.

StefanKarpinski commented 4 years ago

Any maintainers know enough FORTRAN to review?

johnfgibson commented 4 years ago

I don't know much Fortran, but I can give it a look, do a comparison to the C, run it and verify its behavior. It's time Iupdated all my language installations and reran the whole test suite anyway.

StefanKarpinski commented 4 years ago

Thank you soooo much, @johnfgibson! I cannot express how much it's appreciated.

JuliaLang / Microbenchmarks

Improve Fortran parse_int benchmark #41