SamJakob / xxh3

A Dart implementation (port) of the XXH3 hashing algorithm from xxHash.
https://pub.dev/packages/xxh3
MIT License

Produce signed int #3

Open Nico04 opened 10 months ago

Nico04 commented 10 months ago

I'm surprised that in my case it produces a negative int, whereas the docs say unsigned int. Is that normal?

// Get the string as UTF-8 bytes.
final bytes = utf8.encode('https://tile.openstreetmap.org/16/32768/32768.png');

// Use XXH3 to hash the byte array (returns an int).
// XXH3 is a 64-bit hash, so the value is returned in the
// form of an unsigned 64-bit integer.
final digest = xxh3(Uint8List.fromList(bytes));
var value = digest.toString();
debugPrint(value);
value = digest.toRadixString(16);
debugPrint(value);

Output:

-7244486220034343490
-648995d58d5c0242

SamJakob commented 10 months ago

This is normal behavior. The value is to be interpreted as an unsigned 64-bit integer: its binary representation is the same as if it were unsigned. However, Dart has no unsigned 64-bit integer type, so the ordinary signed int type is used.

See this StackOverflow answer: https://stackoverflow.com/a/53591126
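
To make the relationship concrete: the signed value and the unsigned digest are two readings of the same 64 bits, so for a negative result they differ by exactly 2^64. Below is a minimal sketch using the hex value from the report above. It assumes a native platform where int is 64-bit, and the helper name `asUnsigned64` is made up for illustration (it is not part of the package).

```dart
/// Hypothetical helper: reinterprets a signed 64-bit int as unsigned.
BigInt asUnsigned64(int value) => BigInt.from(value).toUnsigned(64);

void main() {
  // The hex value printed in the report above, as a signed int literal.
  const signed = -0x648995d58d5c0242;

  final unsigned = asUnsigned64(signed);

  // Same 64 bits read without a sign: signed + 2^64.
  print(unsigned == BigInt.from(signed) + (BigInt.one << 64)); // true
  print(unsigned.toRadixString(16)); // 9b766a2a72a3fdbe
}
```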

Nico04 commented 10 months ago

Thanks for your lightning-fast response :) OK, I see. Indeed, Dart's int type is misleading here. I'm using this to build a deterministic UUID for a filename, so I'm converting this int to a string at the end, possibly to a hex string. But that keeps the sign. Do you have any recommendation?

Regarding this very issue, maybe you could just add an example showing how to handle that case, to avoid noob questions like mine? Thanks

SamJakob commented 10 months ago

No problem! :)

Yes, it's not quite ideal, but I figured that using the built-in int, given that it is 64-bit on native platforms, would be most efficient. I might look at using something like fixnum, which would guarantee 64-bit integer semantics on both native and web platforms. I doubt Dart on the web gives correct results at the moment, because int there is backed by a floating-point number and loses integer precision beyond 2^53.
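
The 2^53 limit can be illustrated without a browser. When Dart is compiled to JavaScript, an int is stored as a 64-bit float, so the sketch below uses a double on a native platform as a stand-in for that behavior. It only demonstrates the precision loss and is not a test of the package on the web; `fitsInDouble` is an illustrative name.

```dart
/// Returns true if [value] survives a round trip through a 64-bit float,
/// which is how integers are stored when Dart is compiled to JavaScript.
bool fitsInDouble(int value) => value.toDouble().toInt() == value;

void main() {
  const limit = 1 << 53; // 9007199254740992

  print(fitsInDouble(limit)); // true
  print(fitsInDouble(limit + 1)); // false: rounds to a neighbouring value

  // The digest from the report above uses more than 53 significant bits.
  print(fitsInDouble(-0x648995d58d5c0242)); // false
}
```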

The main reason I didn't do that is that I didn't want to burden people with the extra conversion and a wrapper class. For most purposes, having the integer, signed or unsigned, is fine (most of the time you would just re-compute a hash value and compare), and even if you parsed the negative integer back into Dart, you'd get the same underlying binary representation.

To convert it to a string as an unsigned value, or similarly to hex, see the following example:
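
A short sketch of that round trip, assuming a native platform with 64-bit int. The helper names are illustrative and not part of the package; the sample value is the one from the report above.

```dart
/// Illustrative helpers (not part of xxh3): store a digest as unsigned
/// decimal text and read it back into Dart's signed int.
String toUnsignedText(int digest) =>
    BigInt.from(digest).toUnsigned(64).toString();

int fromUnsignedText(String text) => BigInt.parse(text).toSigned(64).toInt();

void main() {
  const digest = -7244486220034343490; // value from the report above

  // Signed text round-trips directly through int.parse.
  print(int.parse(digest.toString()) == digest); // true

  // Unsigned text needs BigInt, but recovers the identical int.
  print(fromUnsignedText(toUnsignedText(digest)) == digest); // true
}
```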

import 'dart:convert' show utf8;
import 'dart:typed_data';

import 'package:xxh3/xxh3.dart';

void main() {
  // Get the string as UTF-8 bytes.
  final bytes = utf8.encode("Hello, world!");

  // Use XXH3 to hash the byte array (returns an int).
  // XXH3 is a 64-bit hash, so the value is returned in the
  // form of an unsigned 64-bit integer. This is most efficient if you are
  // storing the value internally only as it just uses Dart's native int type.
  final int digest = xxh3(Uint8List.fromList(bytes));

  // You'll see that if we print this, we get a negative number.
  // -881777603154417559
  // This is because Dart's int type is signed, so when printing it, the value
  // is interpreted as a signed integer. However, the value's byte
  // representation is actually a 64-bit unsigned integer.
  print(digest);

  // If you need to work with the digest as a 64-bit unsigned integer,
  // use the BigInt class with toUnsigned(64). You'll see that converting this
  // to a string and printing it, gives us the unsigned interpretation of the
  // above value:
  // 17564966470555134057
  print(BigInt.from(digest).toUnsigned(64).toString());

  // If you want the hex representation of the digest, you can use the
  // toRadixString() method:
  // f3c34bf11915e869
  print(BigInt.from(digest).toUnsigned(64).toRadixString(16));
}

I might add some functions to do this automatically, such as xxh3Hex() or xxh3String().
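
Until such functions exist, a thin wrapper along these lines might do. This is only a sketch: `digestToUnsignedString` and `digestToHex` are hypothetical names, not the package's API, and they take an already computed digest so the block depends on nothing but dart:core. Padding the hex to 16 characters is an added assumption that suits the filename use case, since toRadixString() drops leading zeros.

```dart
/// Hypothetical helper: unsigned decimal string for a 64-bit digest.
String digestToUnsignedString(int digest) =>
    BigInt.from(digest).toUnsigned(64).toString();

/// Hypothetical helper: fixed-width (16 character) lowercase hex string.
String digestToHex(int digest) =>
    BigInt.from(digest).toUnsigned(64).toRadixString(16).padLeft(16, '0');

void main() {
  // Digest of "Hello, world!" taken from the example above.
  const digest = -881777603154417559;

  print(digestToUnsignedString(digest)); // 17564966470555134057
  print(digestToHex(digest)); // f3c34bf11915e869 (as in the example above)
  print(digestToHex(255)); // 00000000000000ff
}
```

With the package imported, the result of xxh3(...) could be passed straight to either helper, for example to build a filename from digestToHex(...).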

Nico04 commented 9 months ago

Thanks again for your detailed answer! I (finally) had the time to test that. Your last line is perfect for my application, thanks a lot!

Regarding this issue, sure, an xxh3Hex() or xxh3String() method in the package would be perfect :)