codeforscience / webdata

Discussion on improving numeric computing/data science on the web (JavaScript, HTML5)
164 stars 3 forks

64 bit integers #1

Open max-mapper opened 9 years ago

max-mapper commented 9 years ago

The JavaScript Number type can represent integers exactly up to 53 bits of precision, but all bitwise operations (>>, |, etc.) only work on 32-bit integers.

There are many use cases in scientific computing where you need 64-bit integer arithmetic. Currently doing this in JS means using a userland "big number" library like https://github.com/indutny/bn.js. It would be excellent if 64-bit numbers and arithmetic were supported natively by JS so that they could be fast.
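Both limits are easy to demonstrate in a JS console (a minimal sketch, no libraries needed):

```javascript
// Bitwise operators truncate their operands to 32 bits, so 2**32 wraps to 0.
const big = 2 ** 32;        // 4294967296, representable exactly as a double
console.log(big | 0);       // 0 -- the value was truncated to 32 bits

// Number represents integers exactly only up to 2**53.
const max = Number.MAX_SAFE_INTEGER;  // 2**53 - 1 = 9007199254740991
console.log(max + 1 === max + 2);     // true -- precision is lost beyond 2**53
```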

TC39 (the JS standards body) has a proposal for this:

https://gist.github.com/BrendanEich/4294d5c212a6d2254703

But it currently lacks a 'champion', meaning nobody is actively working on refining the proposal into something they can recommend for implementation.

I think it would be great to help them find a champion. The requirements are:

Someone who's (a) part of a company paying Ecma member dues; (b) willing to and preferably experienced in writing a spec.

gaborcsardi commented 9 years ago

Personally I don't think 64-bit integers are essential. Nice to have, sure, but maybe not the first thing you need for data analysis. AFAIK R does not have 64-bit integers, for example.

max-mapper commented 9 years ago

@gaborcsardi Great point. I agree there are many things where 32 bits are plenty :)

On the other hand, I think this is low-hanging fruit and should be pursued, whereas other issues may be more complex or controversial and have a less clear path to a 'win'.

gaborcsardi commented 9 years ago

I think a slightly related and, in my opinion, more important issue is that AFAIK the maximum length of a JS array is also limited to a 32-bit integer. On machines with a largish amount of memory this is becoming a problem nowadays, so R, for example, has been gradually introducing long vectors to handle this.
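For what it's worth, the limit is directly observable: per the ECMAScript spec, an array's length is a 32-bit unsigned integer, so the maximum length is 2**32 - 1 and asking for more throws at construction (a minimal sketch):

```javascript
// Array lengths are 32-bit unsigned integers: 2**32 - 1 is the maximum.
const ok = new Array(2 ** 32 - 1);   // fine: a sparse array at the limit
console.log(ok.length);              // 4294967295

try {
  const tooBig = new Array(2 ** 32); // one past the limit
} catch (e) {
  console.log(e.name);               // "RangeError"
}
```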

Now, I am not even sure what kind of data type you would use for a double vector in JS. Yes, lists are great of course, but they are also super inefficient when you need to operate on long numeric vectors. (But I have no idea if there are optimizations for a long list of numbers.)

But anyway, I think long vectors are needed and large integers not so much, and even if they are, you can just use an arbitrary-precision library, as you yourself mentioned, so there is an "easy" solution. There is no really easy solution for the vector length. Personal experience only, of course. :)

indutny commented 9 years ago

I think that the main use for these proposed 64-bit API methods is for bignum libraries like bn.js. Introduction of such methods won't change the internal representation of big numbers in bn.js, but will rather make mul much faster there (I don't even have an exact number, but I expect it to be around 50%, maybe more).
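For context, a minimal sketch of the limb-based schoolbook multiplication such libraries use (the 26-bit limb width matches bn.js; the function name is made up for illustration). Limbs are kept at 26 bits so that a limb product plus carries stays below 2**53 and remains exact in a double; with native 64-bit integers the limbs could be twice as wide, roughly halving the number of limb products:

```javascript
const LIMB_BITS = 26;
const LIMB_MASK = (1 << LIMB_BITS) - 1;   // low 26 bits
const LIMB_BASE = LIMB_MASK + 1;          // 2**26

// Multiply two little-endian arrays of 26-bit limbs (schoolbook algorithm).
function mulLimbs(a, b) {
  const out = new Array(a.length + b.length).fill(0);
  for (let i = 0; i < a.length; i++) {
    let carry = 0;
    for (let j = 0; j < b.length; j++) {
      // a[i] * b[j] < 2**52, so t stays below 2**53 and is exact.
      const t = a[i] * b[j] + out[i + j] + carry;
      out[i + j] = t & LIMB_MASK;              // keep the low 26 bits
      carry = Math.floor(t / LIMB_BASE);       // propagate the rest
    }
    out[i + b.length] += carry;
  }
  return out;
}
```

For example, `mulLimbs([1, 1], [2])` (i.e. (2**26 + 1) * 2) yields limbs `[2, 2, 0]`.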

kgryte commented 8 years ago

@gaborcsardi 64-bit integers are more useful than you suggest.

  1. Counting. Integers provide useful data structures for known integer values. For signed 32-bit integers the maximum value is 2**31 - 1 = 2147483647, which means counts max out at just over 2 billion, a relatively small number in many numeric computing applications. While R does not have 64-bit integers, it had to include workarounds where, to prevent overflow, a 32-bit integer can be coerced into a floating-point value. This leads to various oddities/issues for memory allocation, etc.
  2. Efficient operations. While compilers are heavily optimized for floating-point arithmetic, the fact remains that integer arithmetic is generally faster: a floating-point operation must handle the sign, exponent, and significand separately, which amounts to several integer operations per floating-point one.
  3. Random number generation. This has obvious applications in cryptography but also more generally in building robust general purpose random number generators, where natively having 64 bits is highly advantageous.
  4. Performance. While, yes, we can use userland libraries like bn.js, these are workarounds which, by their very nature of being implemented in JS itself, are performance constrained.
  5. IDs. Similar to counting, but from an application angle, 32-bit integer IDs are rarely enough. So, to provide a workaround, people create 64-bit IDs using strings, which is memory inefficient compared to just storing the IDs as 64-bit integers. As a general conclusion, we should not underestimate the importance of memory efficiency.
  6. Large arrays. You already touched on this, but the ability to create large arrays would be beneficial. Something which is not possible now without using generic objects.
  7. Currencies. 64-bit integers are widely considered essential for financial applications, and their absence is one of the most frequent criticisms I hear of using JavaScript for numeric computing.
  8. Parity. While R may not have native support for 64-bit integers, many other numeric computing environments do: Java, MATLAB, Python (unlimited precision), Mathematica (arbitrary precision), etc.
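To illustrate the IDs point above (the numeric value is a made-up example): a 64-bit identifier stored as a Number silently loses its low-order digits past 2**53, which is exactly why APIs resort to string IDs:

```javascript
// A 64-bit ID parsed as a Number silently loses precision beyond 2**53.
const id = '9007199254740993';           // 2**53 + 1, a valid 64-bit value
console.log(Number(id));                 // 9007199254740992 -- off by one
console.log(String(Number(id)) === id);  // false -- the round-trip fails
```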

As @maxogden commented, int64 and uint64 are probably lower hanging fruit than other numeric computing features and would provide a significant step forward in extending the scope of what is possible using JavaScript.

bcomnes commented 8 years ago

It sounds like WebAssembly is bringing 64-bit ints to JS: https://youtu.be/gO2tt9x9zBc?t=22m8s