Closed AaronKutch closed 5 years ago
Also I should note that it should only be faster if the cpu this is run on has 64 bit division capability.
Hey, wow this is awesome! I am very time constraint today but will definitely look into this tomorrow and give you feedback about it. :)
Thank you very much and keep up the good work!!
I just found out some more problems with my algorithm. I will fix it and clean it up today
Okay tell me when you are done and I will review the cleaned up code. :)
I edited my initial comment with the new code. The rand
benchmark unfortunately regressed from about 5x to 2.7x improvement due to needing some extra computation to prevent the off-by-one error, but everything else looks good. I had another regression that I fixed by adding the carrying_mul
part and I might look at the generated assembly a bit and improve some things.
I have downloaded the code and benchmarked it. Added some minor tests, especially for some edge cases but couldn't find any bugs. So this is the long-division algorithm as stated in the comments, right? Can this implementation of it simply be generalized to work with bit widths greater than 128 bits?
Feedback
carrying_mul
- very future driven development! :)a < b => return 0
that should improve performance in some cases for large numbers.largest integer that can still be divided fast on the hardware
stuff.That's what I got so far.
And not to forget: Thank you for the hard work as always! :)
The reason the macro was made was because it can potentially be used for a better general apint
implementation. The original general algorithm involves dividing a DoubleDigit
by a Digit
followed by many Digit
multiplies, but the 128 bit division provided by rust was so slow that it was faster to halve all of the parts and use a 64 bit division (at least in many cases). I still need to test if my 128 bit division algorithm is fast enough to go back to using full Digit
s, but either way everything is sped up a little. Unit tests are not very useful for an algorithm with many branches, so instead I might make a formal proof that all inputs work.
Do you think this has a good chance at replacing Rust's current division algorithm? I made it a macro because it should make a faster 64 bit division algorithm for cpus without hardware 64 bit division, and so on. I know that Rust has a way of building for specific cpus (e.g. the actual bit width of usize
can change), and it can switch between algorithms depending on that.
I understand the reason for the macro in the case you want to file this algorithm for the standard library. I can imagine that you see performance regressions towards the standard lib since your benchmark hardware might be too old to support 64-bit and 128-bit hardware-division and thus it is a soft-division as is yours. You could at least try to show this algorithm to some people that have knowledge about this stuff. I think the best plattform for this would be a dedicated IRC channel. :) Try it!
I improved on it some more but now the function, test, and bench marking code is over 400 lines so I won't post it here. I think I should make a crate for it and some other functions, so I can point at it and others can look at it and use it. What should I call it?
An external crate for this might be a good idea if the algorithms are in a very general form and are customizable while remaining to be efficient for general and special purposes. A name could be simply the techical name of the algorithms you implemented, e.g. long_division
.
I am almost ready to publish my crate specialized-div-rem
. I think I will add a dependency to apint
that depends on this crate. We can add a feature that turns this on and use the standard library 128 bit division otherwise.
It is nice that you have found a general enough way to implement this feature as a standalone crate. :) I think feature opt-in as you have described would be the preferred option here. :) Thank you again for the hard work!
I have it on crates.io and need to fix some things, but how do I get it onto Github? Do I just create a repository and push to that?
Yeah you have to create a new repo at GitHub. GitHub displays very concrete and nice steps how to continue if you already have a repo on your hard disk. :)
Could you glance over everything to check if anything is wrong? https://crates.io/search?q=specialized-div-rem. The only problem I see is very long lines. I can't find what rust recommends for maximum line length. I ran rustfmt but it does not fix the comment lines
I found out that when I add a dependency to the crate, cargo gives an error when trying to compile: error: no matching package named specialized-div-rem
found. I think I configured something wrong. Also, do I need to put some kind of licensing thing in the README.md like you did for apint
?
I have reviewed your crate and have the following feedback for you.
README
should be formatted properly with markdown code segments, especially for the benchmark statistics.#[inline(always)]
variants are a huge gain for the library and I am also confused by the _long
suffix since all functions have it. In my honest opinion you should also even avoid having u128_div
and u128_div_rem
and only provide the API that covers more functionality: u128_div_rem
as well as u64_div_rem
and u32_div_rem
.Another question: What is with i32
, i64
and i128
? Why is it not supported as well to provide a proper coverage for all primitive types?
So to be precise the proposed API by me is:
fn u128_div_rem(divident: u128, divisor: u128) -> (u128, u128);
fn u64_div_rem(divident: u64, divisor: u64) -> (u64, u64);
fn u32_div_rem(divident: u32, divisor: u32) -> (u32, u32);
All in all it is a really good job you have done here but needs some refinement. ;)
I am finally implementing an efficient division implementation for your crate. Because division algorithms for computing the quotient also give the remainder as an (almost) free side effect, I am adding a function called aarons_algorithm
(which is pub(crate)
) which makes the quotient and remainder. All of the division and remainder functions become wrappers around this function. I have an #[inline(always)]
attribute on just the aarons_algorithm
(sometimes the compiler does not inline when it really should be). The compiler will then throw away as much code as it can when it knows that only the quotient or remainder is used, except for a wrapping_udiv_urem_assign
function which I added.
When writing the algorithm I discovered that I could avoid allocations in more places than normal if I made the function signature for aarons_algorithm
and wrapping_udiv_urem_assign
into (lhs: &mut ApInt, rhs: &mut ApInt)
, and assigned the quotient to lhs
and the remainder to rhs
. This means that I have to do some cloning when wrapping aarons_algorithm
with a function that assigns to only lhs
, but the same allocation would still have to happen anyway if I did not.
To allow this, I think I should rename zip_access_data_mut
to zip_access_data_mut_self
and add a zip_access_data_mut_both
.
Finally, do you think it is a good idea to inline(always)
most things that are pub(crate)
or private, especially the zip_access_data_mut
related functions which I think will help with my strategy above?
Just to clarify, I can avoid more allocations by both not needing to initialize an ApInt
for the remainder, and because I can use rhs
as a holder for intermediate computations before the remainder is assigned to it.
I was almost done with the final part of the algorithm and I realized that it was more efficient to have lhs
become the remainder and rhs
become the quotient. This is due to the fact that lhs
is subtracted from until it becomes the remainder. I did not realize this with the other algorithms and my work on specialized-div-rem
because on the simpler cases there was room for shifting around stuff to make the problem mostly disappear. I started out with div
before rem
because we usually think of "divisor and remainder" instead of "remainder and divisor", but now it is obvious in hindsight to me that it is more natural to use the other. I think it will slightly benefit most of my quotient and remainder algorithms performance wise if I changed from a div_rem
signature to a rem_div
signature. Do you think it is worth it to change all these functions?
EDIT: I figured out that I need both a div_rem
and a rem_div
function
We should hide this performance aspect of your implementation detail under the surface of our interface and should continue to call it div_rem
as it is done in other APIs. So internally you could simply perform the swap which should be inlined away by the compiler anyway.
I am sorry that I do not respond so quickly at the moment. But my free time is very constraint at the moment. This will be better starting in July.
Please just continue with your work and it will be merged in the end! :) Thanks a lot for all the work!
Do you have any comments about my comment from 3 days ago?
I did a fresh clone of apint
and I noticed a warning that from_hi_lo
is not being used (but my future commits will use it). Can I modify that function to be from_lo_hi
to make it little endian?
I really want to expose a wrapping_udiv_urem_assign
, a wrapping_urem_udiv_assign
function (I have 2 specialized strategies for allocations now), and their signed versions. Because both arguments are modified, I don't feel that calling them with the usual way, apint0.wrapping_udiv_urem_assign(apint1)
, is appropriate and instead of self
and other
the signature uses lhs
and rhs
and the way to call the function becomes ApInt::wrapping_udiv_urem_assign(apint0,apint1)
.
I am not adding into_
versions of these functions because it raises a lot of complexity, and just cloning instead makes it obvious where and how many allocations are happening.
The only problem I have now is that ZipDataAccessMut
and zip_access_data_mut
does not supply both arguments as mutable.
I think that I should rename zip_access_data_mut
to zip_access_data_mut_self
, ZipDataAccessMut
to ZipDataAccessMutSelf
, and add ZipDataAccessMutLhsRhs
and zip_access_data_mut_lhs_rhs
.
Actually, can I change all the _hi_lo
functions to _lo_hi
?
Right now I have almost no free time. In three or four weeks I will be able to finish my work on apint
in about two weeks.
@Robbepop I haven't heard from you in a month and I haven't received any responses back after I tried to email you.
Hey Aaron,
@Robbepop I haven't heard from you in a month and I haven't received any responses back after I tried to email you.
sorry I was very busy and couldn't focus on my open source projects. Also I haven't gotten any e-mail from you as far as I can see. Maybe it went accidentally to the spam or trash folder?
I am also not available next week but will hopefully and finally find time again for my open source projects at August the 18th. :)
I did a fresh clone of apint and I noticed a warning that from_hi_lo is not being used (but my future commits will use it). Can I modify that function to be from_lo_hi to make it little endian?
I am currently not sure about handling of little and big endianess in apint but I think such a decision is good.
Actually, can I change all the _hi_lo functions to _lo_hi?
You can and I think it is saner.
I really want to expose a wrapping_udiv_urem_assign, a wrapping_urem_udiv_assign function (I have 2 specialized strategies for allocations now), and their signed versions.
Naming them wrapping_udivrem_assign
is more appropriate I think. But that's just taste.
I think that I should rename zip_access_data_mut to zip_access_data_mut_self, ZipDataAccessMut to ZipDataAccessMutSelf, and add ZipDataAccessMutLhsRhs and zip_access_data_mut_lhs_rhs.
I am more with ZipDataAccessMutBoth
. ;)
Sorry for the short answers but as you already know my knowledge and perception about the apint code base is not up to date since I haven't put eyes on it for a longer time. But this will change in about 2 weeks I guess.
I stopped getting comment notifications. I checked my Spam folders and sure enough I have also been missing a bunch of important github activity. I decided a while ago to email you directly but the only email I could find associated with you was robbepop@web.de
from your github.io page. I just published a new specialized-div-rem
version with some algorithm changes I developed when working on the long division implementation for apint
. I will rename the functions according to your suggestions. I am a few days away from finishing my work on apint
.
Also I should mention that arithmetic.rs
is becoming very long and it might be a good idea to split it up in the future. Maybe once Rust 2018 is stabilized the entire library can be refactored some (note that the mod.rs
file will not be needed in Rust 2018).
while writing debug code I found out that println!("{:?}",ApInt::from([0u64,1,0,0]).into_wrapping_sub(&ApInt::from([0u64,1,0,0])));
produces a nonzero answer. I fixed the subtraction function and hopefully I can split up my megacommit into smaller ones before submitting a pull request.
robbepop@web.de is my spam email so I normally do not look into it very much.
The new specialized-div-rem
version sounds promising. :)
Also I should mention that arithmetic.rs is becoming very long and it might be a good idea to split it up in the future. Maybe once Rust 2018 is stabilized the entire library can be refactored some (note that the mod.rs file will not be needed in Rust 2018).
This sounds like a decent idea. I also do not like files that are getting too big and apint
has a lot of really big files - luckily most of the time those files do not contain lots of logic. :)
while writing debug code I found out that println!("{:?}",ApInt::from([0u64,1,0,0]).into_wrapping_sub(&ApInt::from([0u64,1,0,0]))); produces a nonzero answer. I fixed the subtraction function and hopefully I can split up my megacommit into smaller ones before submitting a pull request.
This is terrible (that this bug really existed) and awesome (your fix) at the same time. Thank you very much for that!
The algorithm is finally working. Tomorrow I will try to wrap everything up and push all my progress to my apint
fork.
I just pushed my progress to the apint
fork for you to check. The test code is not complete (and some commented out printlns I will remove in the future), I will optimize the algorithm some more (such as seeing what impact on performance using smallvec
and inlining has), and the signed divisions are not working yet. I will fix any problems you find and fix the commit situation before I submit a pull request.
There will be 12 division functions when I am finished, but I chose to make the signed ones wrappers around the unsigned ones instead of inlining into all of them, so it shouldn't bloat compilation times.
What will the 12 division functions be? I am curious!
I wish you good luck fixing the remaining signed versions. :)
It's true smallvec
can have extreme performance implications. I have experienced several in other crates. Nonetheless, I'd say its an expert feature for performance critical execution paths.
My free time is about to become restricted again. I encountered some corner cases involving ApInts with bitwidths of 1, and I realized too late that my rotation function only works when the ApInts have no excess bits. I am mostly confident however that all the other functions work with no edge cases detected by the regular tests and a fuzzing test.
I will postpone the rotation function for another release.
There is nothing else that I see that should be changed in apint
s functionality from the outside, except for the one bit ApInt problems.
Should I do a pull request with what I have now and you can update dependencies and release 0.3.0?
I would like you to wait for a PR until your implementations are correct. But for now I am okay with you doing some very simple fixes for the affected algorithms even though they might negatively impact performance: Plus tests that cover also the 1-bit width cases for example. Later we can easily improve performance with additional changes. But correctness is of most importance.
As always thank you very much for all the work on apint!! :)
After implementing the fuzz tester I have encountered about 10 off-by-one errors and had to rewrite the core division loop twice lol. ApInts with bitwidth 1 haven't been giving me trouble though (earlier I had some bug with the testing code itself).
The division implementation seems to be really hard!
Good to know that 1-bit width ApInt
instances no longer trouble your implementation!
Do you write regression unit tests for the failing fuzz tests you encounter? These could be very handy in the future if one is about to rewrite those functions.
I hope you work is soon stable enough for usage in the main repo. :)
I just got my algorithm to completely pass the fuzz tester for the first time, and all tests are working. I am now working a little on performance and will hopefully wrap up in the next two weeks. However, I am concerned about the casting::tests
group which are marked as ignored when I run cargo test
. Why is this to begin with and how do I test these?
Sounds awesome to me and I am very much looking forward to your work when everything is done! :)
I have to tell you something very terrible ... there are no tests for the casting operations, yet. There are just stubs that I created once I figured out a way I could test the casting operations in a nice way but I wasn't too happy about it and so it was never done. So yeah ... the entire set of casting operations (besides a few very special ones) are completely untested.
This is bad and I feel bad for it. However, since many operations rely on a working casting operation I think they are indirectly tested at least.
Another thing I realized is that Rust 2018 is closer than we think. What if after I submit the pull request you install the RC1 beta, cargo fix
, cargo test --release
(the fuzz tester is quite expensive), refactor the file locations of the library to split up arithmetic.rs
and mod.rs
, review the library while doing so, and then do the little endian switch?
I am not sure when is the best time to convert crates to the new 2018 edition.
Please let us discuss these changes in another issue.
Besides that I think your recommended changes are decent. The arithmetic.rs
file is really huge.
I have also thought about Digit
and DoubleDigit
and their usefulness. In the num::bigint
create these types are merely type aliases and now I do understand why they chose that design. The Digit
and DoubleDigit
wrappers are nice but add too much friction in the code in their current design - at least to my understanding. But yeah, please let us discuss those future changes in another issue. ;)
I like the Digit
and DoubleDigit
wrappers because they allow me to directly control the way certain methods work. If I decide to put more work into apint
in the future, it will probably involve making more special-purpose methods for them instead of macro and indexing spaghetti. In addition, if somebody wants to make the underlying types different and forks the repository they can relatively easily modify it. I will check the library for hard coded things and replace them with digit::BITS
or other methods like I have my own code. On the other hand, I am not sure about Bit
which I think could be replaced by bool
.
I just realized that fixing up uint
and int
is going to be a pain now and in the future. I have a large amount of experience with macros and doing stuff like
macro_rules! impl {
(
...,
$($add_attr:meta),*;
$($sub_attr:meta),*
) => {
$(
#[$add_attr]
)*
pub fn $add_name(...) -> ...
that way when calling the macro, I can inject doc = "..."
and other things to customize. In cases where signed and unsigned ways of doing things are strictly different, we can also inject token trees and stuff.
Maybe we should just bite the bullet instead and Ctrl-C Ctrl-V away.
I strongly suggest that we convert to Rust 2018 very soon - it will probably take until around stabilization that this crate finally gets relatively stable like it was before I found this crate. I already see things that could benefit from Rust 2018 and I think it would be inefficient to make two file refactoring passes instead of just one with Rust 2018 making certain things easier.
Can I implement a function from_vec
because I am making a corner case tester and it would make it much easier, or is there something else I could do?
There is already a more generic way to create Apint
instances from iterators (and thus vectors).
https://github.com/Robbepop/apint/blob/master/src/apint/constructors.rs#L162
I also dislike the fact that many parts especially in the int
and uint
module are Ctrl-C + Ctrl-V'ed but I am always critical about macro explosion in code since many times this leads to unreadable library code in the end that is hard to maintain. As of now the code in those modules is very easy to maintain but a bit unfortunate to modify. So it is a balance we have to take here.
The crate will surely convert to the Rust 2018 edition as soon as possible and it makes sense to undergo those huge modifications during that process - as you have said.
Maybe you are right about the usefulness of the digit wrappers. I am currently just a bit out-of-sync for the code base and have to work myself into it again some time.
In order to use from_iter
I have to do ApInt::from_iter(vec![32u64,234,23].iter().map(|x| Digit(*x))).unwrap()
and there are two things here that are not public. Making Digit implement From<u64>
would involve breaking the ability to change the type of Digit in the case of u128. Maybe we could have a function with a signature of Vec<Into<Digit>>
but we would still run into problems with u128 and higher. I think there should exist a public From<Vec<u64>>
function that takes a plain Vec<u64>
. In fact, I just remembered one of my problems with bigint
is the lack of an easy way for outside callers of the library to get and set parts of the ints via vectors of machine words. So, we should also have a to_vec_u64
or `Into<Vec
Edit:
Actually, if the length of the Vec is zero then the function has to panic, so instead a from_vec_u64
function that returns a result should be used instead
I was doing some compilation shenanigans, but I needed to pass a --no-default-features
flag and the compiler is giving the error error: cannot find derive macro Serialize in this scope
. It should compile with all flag combinations, right? I presume there is supposed to be
#[cfg_attr(serde_derive, Serialize)]
#[cfg_attr(serde_derive, Deserialize)]
which is what I am typing down.
I should mention that I have significantly simplified the code for the function (there were some obvious duplications and rough edges that I missed) but it doesn't seem to affect performance any, so I will wait on doing a pull request for fixing up the function.
I have been working on an algorithm, and I made a version of it for 128 bit integers. On my computer, the benchmarks are showing it to be about 4.5 times faster than what rust currently uses (except for the
mid
andlo
benchmarks which still show improvement). EDIT I now have the algorithm here https://github.com/AaronKutch/specialized-div-rem/blob/master/src/all_all.rs