PlummersSoftwareLLC / Primes

Prime Number Projects in C#/C++/Python
https://plummerssoftwarellc.github.io/PrimeView/
2.46k stars 572 forks source link

Discuss how to report on alternate algorithms #177

Closed mike-barber closed 3 years ago

mike-barber commented 3 years ago

I think we need to have a discussion to figure out how to deal with reporting for different algorithms, as this adds an extra dimension of fun.

Currently, all of the implementations (except for the C ones) are very similar to the "reference" C++ version, where we consider all odd numbers as potential primes.

The C implementations from @mckoss and @danielspaangberg are really interesting additions, and make reference to this wikipedia section. The same applies to the C# implementation by @Kinematics. Thanks for these! The important thing to note here is that these are algorithmically quite different from all other implementations as they avoid needing to check higher known primes factors besides just 2: variants include 3,5,7, and upwards.

However, this does raise some issues:

In essence, I think we just need to work out what do to on the reporting side. My initial thoughts are along the lines of this:

I'm not sure if this is the right approach, or how to deal with tagging specific lines in the output as "different algorithm". It'll be great to get everyone's input! I know you're working on reporting currently @marghidanu and it's looking pretty cool so far.

marghidanu commented 3 years ago

Hello! We do include a label in the standard output which should account for the solution "category". Right now this label is not taken into account but as we progress with multiple solution types we will include that as a dimension in the report.

Kinematics commented 3 years ago

I was experimenting with breaking up the sieves into components, and run things as an arbitrary combination of those components. The components in question were:

The parallel stuff was set aside as it was getting overly complicated, but even the storage/algorithm pairings had issues of adding a lot of overhead when trying to componentize things.

I still have the partially completed work on a separate branch, but I wasn't sure it was worth trying to build them for the purpose of this repo. At least, not as a part of my existing solution. I'd need to build an entirely new solution to make it work, and I don't know yet how it would perform. It would still feel nice to be able to run it as something like primes.exe --2of6 --arraypool instead of some bastardized parameter name that I had to make up for each combo.

danielspaangberg commented 3 years ago

I also experimented a bit with running things more in parallel, but it is very difficult to get it to run faster if we're only doing the primes up to 1'000'000. I found this nice site that got me a bit inspired (although my code is very different from the java on that page): http://compoasso.free.fr/primelistweb/page/prime/eratosthene_en.php. So I have a working segmented prime implementation using the 5760of30030 map, which works fine for larger number of primes than the ones below 1'000'000. Doesn't use much memory, but since it needs more operations, it is still slower, since the bitmap for the sieve below 1'000'000 fits in L2 cache.

rbergen commented 3 years ago

I think there is indeed a case to be made to separate the basic comparison of languages (which is what @davepl originally created the repo for) from the comparison between algorithms/alternate solutions (in part across languages, by now). I do not believe reusing the free-form "username_optionaltag" field to make the distinction is the way to go, though.

At the moment, my proposal would be to add a category field to the "drag-race" output format, with the following defined values:

Of course, we would need to do some work to update existing base_faithful and base_unfaithful solutions so that they add the right label to their output.

In all this, the definition of the "base" algorithm is of course relevant. I would propose to define it as follows:

mike-barber commented 3 years ago

Thanks @rbergen!

As a pragmatic suggestion, we could append a new (optional) column to the output for the category label. The rationale here is that the vast majority of existing implementations are just base_faithful translations of Dave's C++ code into new languages, and this could be inferred if the column is missing.

Solutions that have interesting work done on algorithms could simply implement the final column with the label, with some runs like 1of2 marked as base_faithful and other more interesting ones labelled appropriately. This would allow us to progress without altering a whole bunch of existing stuff :)

I'm not sure what base_unfaithful means in practice, but I do think that alternate would be useful in this case. It's also easier to decide between just two categories and should result in a lot less confusion around subtle implementation details.

I agree that the definition of the base algorithm hasn't been buttoned down, but I haven't personally noticed anything too strange implemented yet luckily. I need some clarification on some of the points, though, as I haven't quite understood them. Maybe they'll make sense after some more coffee :)

As the previous points indicate, no bit masks are used for seeking or clearing.

To be honest, I think it will need some clarification. In my understanding, all solutions that are using bits for storage will need to use bit masks as bits aren't addressable on most hardware. But you might be trying to make a different point that I haven't picked up on :)

Changing the order of the two mentioned loops is permissible.

I'm not sure what this would permit people to do. I suspect that any solution that exchanges inner/outer loops in some way would be different enough to the original algorithm that it wouldn't be comparable with it, and would need to be labelled appropriately. But I also might have misunderstood what you mean by it.

rbergen commented 3 years ago

As a pragmatic suggestion, we could append a new (optional) column to the output for the category label. The rationale here is that the vast majority of existing implementations are just base_faithful translations of Dave's C++ code into new languages, and this could be inferred if the column is missing.

That's basically what I was thinking too, although my suggestion would be to make alternate the default; this makes the "base-faithful" claim an explicit one, and I do think that's appropriate at a conceptual level. And yes, that does mean that a large number of existing implementations would need to be changed, but that's basically (just) work for me. As I've gone over the majority of the solutions once already to make them compliant with the guidelines, I've become comfortable with the exercise.

I'm not sure what base_unfaithful means in practice, but I do think that alternate would be useful in this case. It's also easier to decide between just two categories and should result in a lot less confusion around subtle implementation details.

There are examples of languages where the nature of the language is such that its "natural form" is not faithful, and there are also languages that just don't include the requirements for a faithful implementation. This is true because for a faithful implementation, some form of "true/false" array, some form of class/object/structure, and means to explicitly choose to create a new instance of the latter are required. If the algorithm is the base algorithm but the implementation does not include things like recreation and initialization of the sieve state then it is not "fair" to compare them with other, faithful, implementations. At the same time, I think it is also not fair to put them in the same category as completely different algorithms. If people are confused or unsure, they can always choose alternate. Also, I and others are here to help make the distinction, if/where needed.

As the previous points indicate, no bit masks are used for seeking or clearing.

To be honest, I think it will need some clarification. In my understanding, all solutions that are using bits for storage will need to use bit masks as bits aren't addressable on most hardware. But you might be trying to make a different point that I haven't picked up on :)

I think I agree with you. In fact, I think this whole point can be dropped, as what it's trying to express is already covered by other points.

Changing the order of the two mentioned loops is permissible.

I'm not sure what this would permit people to do.

It would allow people to run the clearing loop first with 3 as a factor, and then seek for the first (next) factor. I know by experience that for some languages, this makes a real difference for how "language-like" the implementation is, while it actually doesn't change the algorithm, as such. What is missing in my proposed phrasing is the word "inner", as in "Changing the order of the two mentioned inner loops is permissible."

danielspaangberg commented 3 years ago

As a pragmatic suggestion, we could append a new (optional) column to the output for the category label. The rationale here is that the vast majority of existing implementations are just base_faithful translations of Dave's C++ code into new languages, and this could be inferred if the column is missing.

That's basically what I was thinking too, although my suggestion would be to make alternate the default; this makes the "base-faithful" claim an explicit one, and I do think that's appropriate at a conceptual level. And yes, that does mean that a large number of existing implementations would need to be changed, but that's basically (just) work for me. As I've gone over the majority of the solutions once already to make them compliant with the guidelines, I've become comfortable with the exercise.

I'm not sure what base_unfaithful means in practice, but I do think that alternate would be useful in this case. It's also easier to decide between just two categories and should result in a lot less confusion around subtle implementation details.

There are examples of languages where the nature of the language is such that its "natural form" is not faithful, and there are also languages that just don't include the requirements for a faithful implementation. This is true because for a faithful implementation, some form of "true/false" array, some form of class/object/structure, and means to explicitly choose to create a new instance of the latter are required. If the algorithm is the base algorithm but the implementation does not include things like recreation and initialization of the sieve state then it is not "fair" to compare them with other, faithful, implementations. At the same time, I think it is also not fair to put them in the same category as completely different algorithms. If people are confused or unsure, they can always choose alternate. Also, I and others are here to help make the distinction, if/where needed.

As the previous points indicate, no bit masks are used for seeking or clearing.

To be honest, I think it will need some clarification. In my understanding, all solutions that are using bits for storage will need to use bit masks as bits aren't addressable on most hardware. But you might be trying to make a different point that I haven't picked up on :)

I think I agree with you. In fact, I think this whole point can be dropped, as what it's trying to express is already covered by other points.

Changing the order of the two mentioned loops is permissible.

I'm not sure what this would permit people to do.

It would allow people to run the clearing loop first with 3 as a factor, and then seek for the first (next) factor. I know by experience that for some languages, this makes a real difference for how "language-like" the implementation is, while it actually doesn't change the algorithm, as such. What is missing in my proposed phrasing is the word "inner", as in "Changing the order of the two mentioned inner loops is permissible."

I agree with what both of you are saying. When I submitted the various algorithms in C I was surprised that faithful just stated "uses sieve of Eratosthenes" and "does not use external dependencies", so they all were included into faithful. As long as you drop the point about the bit masks, which it seems you are already saying, it all makes sense to me. If bit masks are not allowed it is not possible to express smaller than byte storage, unless one concocts code with divisions and modulus just to avoid the "bit masks not allowed" rule.

marghidanu commented 3 years ago

I see this as two separate problems, one refers to the output format for the current solutions and the other is coming up with a proper ruleset for classifying the solution types.

Problem 1

I propose adding another column that can store multiple labels with values. Something like label1=value1;label2=value2... should give us enough flexibility to support any additional tags that might appear in the future

Problem 2

Using the proposed format above should remove the single classification problem since we can now support any classification based on tags.

rbergen commented 3 years ago

That's an interesting idea, indeed. A few things cross my mind thinking about it:

marghidanu commented 3 years ago

Maybe we should just switch to JSON :)

mike-barber commented 3 years ago

Wow there's been a lot of great discussion!

@marghidanu, I'd avoid probably json unless it's really necessary; while we're used to it in the higher-level language world, it's not part of the standard library of many other languages and will require external dependencies. Or you could just hard-code the string I guess, although it'll get pretty ugly potentially.

I think the bigger question is whether we're going to benefit from having the flexibility of arbitrary key=value pairs. Definitely it's more flexible. But I'm concerned it will just add confusion unless we're very opinionated about what the keys are that we're going to accept. So we'd need to do some thinking about the things we could support:

I don't really have a clear idea of what kind of reporting we want to do, or what attributes we want to extract from it. But if you've got a pretty good use case in mind, it'll help guide us on what sort of keys/etc you'd be happy to support. I think this echoes what @rbergen has said above too.

Also consider the option of just having comma-separated tags instead of key=value pairs; these work much like the way you'd tag a github issue or something, and just represent membership of a set. It's maybe a bit simpler, and I like simple things when coordinating between a large number of different developers :)

Talking to some of the above comments too.

@danielspaangberg

I agree with what both of you are saying. When I submitted the various algorithms in C I was surprised that faithful just stated "uses sieve of Eratosthenes" and "does not use external dependencies", so they all were included into faithful. As long as you drop the point about the bit masks, which it seems you are already saying, it all makes sense to me. If bit masks are not allowed it is not possible to express smaller than byte storage, unless one concocts code with divisions and modulus just to avoid the "bit masks not allowed" rule.

This makes sense :) We'll get the fine-grained reporting we need by having labelling each "run" appropriately rather than marking the whole solution as faithful/otherwise. It's great having different implementations in the same solution as it's a lot easier for readers to understand and compare.

I think we're all in agreement on the bit masking as it's essential for bit-level storage. Even C++ does this internally behind vector<bool> even if it isn't immediately obvious :) The only potential loophole here is that one could mask off multiple bits in a single word and apply the mask once. This definitely would skew results. But for single bit mask/apply, it's ident to the C++ reference implementation.

@rbergen

There are examples of languages where the nature of the language is such that its "natural form" is not faithful, and there are also languages that just don't include the requirements for a faithful implementation. This is true because for a faithful implementation, some form of "true/false" array, some form of class/object/structure, and means to explicitly choose to create a new instance of the latter are required. If the algorithm is the base algorithm but the implementation does not include things like recreation and initialization of the sieve state then it is not "fair" to compare them with other, faithful, implementations. At the same time, I think it is also not fair to put them in the same category as completely different algorithms. If people are confused or unsure, they can always choose alternate. Also, I and others are here to help make the distinction, if/where needed.

This is an interesting one, but I'm not sure I can think of an easy example. I can imagine fun stuff with functional/recursive code, but that's not my area of expertise. In my head, you'd have to allocate space for the sieve and initialise it to apply Erastothene's sieve algorithm to it. Do you have something in mind you can point me at?

Changing the order of the two mentioned loops is permissible. It would allow people to run the clearing loop first with 3 as a factor, and then seek for the first (next) factor. I know by experience that for some languages, this makes a real difference for how "language-like" the implementation is, while it actually doesn't change the algorithm, as such. What is missing in my proposed phrasing is the word "inner", as in "Changing the order of the two mentioned inner loops is permissible."

Ah, thanks! I see what you mean, and I definitely agree. It won't have much impact on performance. We'll need to be careful with the phrasing as it's quite hard to convey clearly.

Maybe something along the lines of: there are two operations within the outer loop: 1) searching for the next prime from in the sieve, and 2) clearing this prime's multiples in the sieve. In order to allow for idiomatic code, it is permissible to perform these steps in either order, provided the results are correct.

marghidanu commented 3 years ago

@mike-barber That’s why I suggested storing the tags in an arbitrary format, it’s impossible to think of all dimensions right now. We need something that is extensible. As for the output of the implementations, I can tell you that all output will eventually end up in JSON since I’m writing the report generator :) Because the parser actually will convert any output to JSON for generating the HTML files.

Kinematics commented 3 years ago

I think we're all in agreement on the bit masking as it's essential for bit-level storage. Even C++ does this internally behind vector<bool> even if it isn't immediately obvious :) The only potential loophole here is that one could mask off multiple bits in a single word and apply the mask once. This definitely would skew results. But for single bit mask/apply, it's ident to the C++ reference implementation.

To me, 'bitmasking' is different from 'bit manipulation'. Bit manipulation is setting a single bit at a time in a way that is specific to a single operation, whereas bitmasking is setting multiple values at a time in a way that can be repeatedly applied (a rubberstamping effect). So bitmasking is a parallelization tool, whereas bit manipulation is merely a way of saving memory.

Bit manipulation may use a mask/bitmask (noun) to apply its effect, because the languages and CPU's don't allow direct addressing of bits, but it's different from bitmasking (verb) as an operation concept.

It's basically what you describe, but with separate terms describing the two methods, rather than overloading a single term to mean both types of operations (and confusing people).

rbergen commented 3 years ago

@mike-barber

This is an interesting one, but I'm not sure I can think of an easy example. I can imagine fun stuff with functional/recursive code, but that's not my area of expertise. In my head, you'd have to allocate space for the sieve and initialise it to apply Erastothene's sieve algorithm to it. Do you have something in mind you can point me at?

Assembly is a prime example, in my view. I've now coded 4 implementations in various incarnations of that language, 3 of which are now in the drag-race branch (the arm64 version is on its way). In my opinion/experience, the "natural flow" of assembly is to just thunder through the algorithm, and not spend coding time/bytes/processing cycles on things like subroutine calls and heap allocations unless absolutely necessary. In fact, for a proper "faithful" comparison, one basically needs to step out to libc function calls. In C and other languages these are very much "native" to the language, but from the perspective of assembly, one could argue they are "external".

rbergen commented 3 years ago

@marghidanu As I now understand what we're discussing, we would end up with a definition of "official" labels, similar to this:

Label Values
algorithm base for the algorithm used in the original C#/C++ implementations <insert reference/code snippets>
faithful yes or no <insert reference to definitions>
... ...

These would then be combined with the existing standardized output format, to create something like this:

rbergen;2567;5.012;1;algorithm=base,faithful=no

Would this be in line with your thinking?

mike-barber commented 3 years ago

Assembly is a prime example, in my view. I've now coded 4 implementations in various incarnations of that language, 3 of which are now in the drag-race branch (the arm64 version is on its way). In my opinion/experience, the "natural flow" of assembly is to just thunder through the algorithm, and not spend coding time/bytes/processing cycles on things like subroutine calls and heap allocations unless absolutely necessary. In fact, for a proper "faithful" comparison, one basically needs to step out to libc function calls. In C and other languages these are very much "native" to the language, but from the perspective of assembly, one could argue they are "external".

Thanks, that makes a lot of sense! I wasn't thinking about assembly :) And I see what you mean -- you've basically got ~100 extra lines to deal with the housekeeping to handle the setup and memory allocation. Where the "slightly unfaithful" one just has reserved space in the data section.

Nice work, by the way!

My understanding is that believe heap allocation is always external in some form or another. At some point, you've still got to do a syscall to get some pages from the OS. I wouldn't worry about that being external too much :)

rbergen commented 3 years ago

Nice work, by the way!

Thank you! After tripping over myself a few times at the beginning, I did enjoy coding both the x64 and the arm64 implementations. I did find out that documentation on arm64 assembly is quite a bit behind that on x64. Particularly finding out how to execute a square root with conversions from and to "integer" registers was quite a bigger puzzle than I expected.

My understanding is that believe heap allocation is always external in some form or another. At some point, you've still got to do a syscall to get some pages from the OS. I wouldn't worry about that being external too much :)

Indeed, which is why I had the audacity to still label the implementations using those calls as faithful. :)

mike-barber commented 3 years ago

I did find out that documentation on arm64 assembly is quite a bit behind that on x64. Particularly finding out how to execute a square root with conversions from and to "integer" registers was quite a bigger puzzle than I expected.

I know what you mean! I had the same experience doing some simd stuff on arm64 before. Risc is fun though :)

Indeed, which is why I had the audacity to still label the implementations using those calls as faithful. :)

I think it is, on the basis that it's doing the same work and allocating memory and so on.

More relevant to the general thread, though, I think we're in a good place on the ideas for labels and so on. Thanks for fixing the format in all the solutions, including mine! Happy to help with any other changes needed.

The next thorny issue to work out is how to do the benchmark runs outside of the default Azure runners that GitHub provides -- results are all over the place between runs. But I think that'll be a subject for a different thread :)

rbergen commented 3 years ago

I know what you mean! I had the same experience doing some simd stuff on arm64 before. Risc is fun though :)

RISC, or the arm64 implementation of it, is no-nonsense, clean and consistent. I really do like that. The only instructions I found oddly absent at a conceptual level were push and pop. (I know, the 16-byte SP alignment requirement makes it practically impossible to implement them, but still...)

More relevant to the general thread, though, I think we're in a good place on the ideas for labels and so on. Thanks for fixing the format in all the solutions, including mine! Happy to help with any other changes needed.

Well, now that we have a proposal for a structured distinction between implementations, we will at some point have to implement it. I say that in part because @davepl has been very supportive of doing so in recent e-mail communication I've had with him. So, it may not be too long before we indeed need to go over the whole set of solutions, again.

The next thorny issue to work out is how to do the benchmark runs outside of the default Azure runners that GitHub provides -- results are all over the place between runs. But I think that'll be a subject for a different thread :)

We may be closer to an actual solution for that issue than one might think. It's still work in progress, but in my opinion some good progress on externalizing the benchmark runs is being made. If you have any systems laying around that could act as benchmark runners, I think it's not too soon to start dusting them off. :)

mike-barber commented 3 years ago

We may be closer to an actual solution for that issue than one might think. It's still work in progress, but in my opinion some good progress on externalizing the benchmark runs is being made. If you have any systems laying around that could act as benchmark runners, I think it's not too soon to start dusting them off. :)

Awesome! At work, we use self-hosted runners for a bunch of things, so my initial instinct was to do that. I can't reconcile how to make it secure enough on a public repo though: It's a bit scary running code you don't control. Even if you put the runner itself in a Docker container, it still needs network access to pull dependencies etc. I haven't thought of a good way to solve this yet, but definitely happy to help if I can.

marghidanu commented 3 years ago

Runners on a public repo are not really an option for us. We are now testing the bit of code that allows any of us to run the benchmark on any machine externally. Additionally, we will provide a way to publish the local results to a web service and gather the data into a web app so anybody could see it. More to come in the following few days.

mckoss commented 3 years ago

The problem with the original definition of "faithful" is that it did not exclude other sieve-algorithms that employed further optimizations. So, those algorithms, like mine, were no longer apples-to-apples comparisons of languages and language features, but rather of algorithmic optimizations.

I think both are interesting.

To my mind, we should have a category of "direct ports" of the original algorithm to other languages, used for language comparisons - perhaps clarifying "faithful" as using Dave's original algorithm in all respects.

And then, more interesting to me are other algorithmic optimizations - still using a sieve, but applying different techniques to optimize cpu or memory. Actually, any algorithm that could produce the list of primes to 1M by any method would be interesting even if not using a sieve (though I don't know if there are more efficient algorithms for producing bulk primes within a range).

A tag like "algorithmic" could be applied to these entries.

We don't gather statistics like memory use - though I think that could be interesting.

rbergen commented 3 years ago

In line with what was discussed in earlier comments, I think one-dimensional tagging is going to be too limited. I therefore still think that working with key-value pairs is the way to go. For one, from a perspective of comparing algorithms, I think it would be very useful for the benchmark reports to not only indicate that solutions are intended for an algorithmic comparison, but also what algorithm each solution implements. To phrase it this way: you're going to want to make comparisons between (the best performers in) the range of algorithms that are in the set of solutions, as well as compare the various implementations of "8 of 30" with each other, to name one example.

In that context, I think that once we start identifying implementations that stick to @davepl's original algorithm as "base", then the exact meaning of faithful vs unfaithful will become somewhat less relevant (but not meaningless). It will then make a statement on the (technical) implementation style, instead of defining the implementation as a whole.