Closed GoogleCodeExporter closed 8 years ago
Indeed Brian, the hash formula will fail on any wrong input (unallocated memory
space), null input being one of them.
Null input is detectable, but most other wrong memory values are not. Should
the input pointer points towards a non-authorised memory value, the function
will crash the same way. Same story is also applicable if "length" is wrong and
larger than allocated space. In the end, it is the responsability of the user
to provide a correct input.
As a consequence, does it seem a good idea to add a detection mechanism
dedicated for a single "invalid input" situation, which is null input ?
Original comment by yann.col...@gmail.com
on 7 Apr 2013 at 10:24
Also : note that the value "0" is a valid Hash value. It should not be used as
an error signal, since valid Hash value "0" would be confused with an error.
Original comment by yann.col...@gmail.com
on 9 Apr 2013 at 9:48
I realize there are other errors that can occur, but in my opinion checking for
null input is a basic safeguard that is easy to do.
Regarding a hash value of "0", I don't dispute that. The changes I made should
not affect this.
Original comment by msc...@gmail.com
on 9 Apr 2013 at 4:54
OK, maybe there is a middle ground.
There are 2 branches of the same xxHash algorithm,
the one-step XXH32()
and the more flexible XXH32_*, with *= init, feed or result.
The first one (XXH32()) is a speed oriented version, especially targeted at
small inputs, where every single cycle matter. Additionnally, the function
leaves no room to express an error message, since all possible output values
are valid.
On the other hand, the function XXH32_feed() is already giving up a bit of
speed in favor of flexibility. A single additional comparison should not be
perceptible.
Moreover, the output of this function is an error status, so it's totally
possible to express a null-input error.
Therefore, i would suggest to add the null-input check to XXH32_feed().
The necessity to ensure that input memory must be valid will also be added in
the comments.
Original comment by yann.col...@gmail.com
on 10 Apr 2013 at 8:09
Also :
it's completely unrelated, but i've looked at your npm package, and i kinda
like the naming you selected for the methods.
create - update - digest
looks more suitable than the current
init - feed - result
Is there any other hash (or equivalent) package which use a similar naming
convention ?
Original comment by yann.col...@gmail.com
on 10 Apr 2013 at 12:25
Regarding the node module API, I based it off node's built-in crypto API for
hashing.
Original comment by msc...@gmail.com
on 10 Apr 2013 at 12:56
Looks like a fairly good reference.
I'm currently pondering the idea of modifying the C interface naming scheme.
Since your naming looks clearer, the C interface could benefit from it.
Original comment by yann.col...@gmail.com
on 10 Apr 2013 at 12:59
Here is a proposed release candidate
integrating the changes proposed here.
The null-pointer check is integrated into XXH32_update(),
and generates an XXH_ERROR when detected.
Original comment by yann.col...@gmail.com
on 10 Apr 2013 at 2:16
Attachments:
Partially solved within r7.
Null-pointer check added in update() function.
Comment requiring input validity added in header file.
Original comment by yann.col...@gmail.com
on 12 Apr 2013 at 8:27
I know this issue is "closed and fixed", but there is a trivial way to fix the
segfault in the XXH32 function without adding an additional null check. I don't
think it affects performance, since it's simply rearranging an existing
conditional.
Change:
while (p<=bEnd-4)
To:
while (bEnd-p>=4)
The problem is that bEnd is 0 and bEnd-4 is -4 (0xFFFFFFFC, after conversion
back into a 32-bit pointer), which is greater than p (which is 0).
Original comment by thomaska...@gmail.com
on 26 Apr 2013 at 3:03
It seems like a good idea.
However, that would mean that
the result of XXH32() over a null-pointer
would be the same as a zero size input.
Is such a result acceptable ?
Should it be the same with XXH32_update() ? (since currently, it outputs an
error code?)
Another potential catch is that (bEnd-p) may actually slow down performance.
This should be checked.
Original comment by yann.col...@gmail.com
on 26 Apr 2013 at 12:04
I changed the implementation because of problems I was having using C# interop
(for comparing with my pure managed implementation). What was happening was
that getting a native pointer for an empty managed byte array was returning a
null pointer, so it always passed a null value into it, even though my C# code
explicitly checks for a null array. So my only alternatives were to either fix
it in xxhash or use a dummy array with a single byte in C#.
Obviously it doesn't fix the issue with a null pointer and a non-zero length.
As for the performance, I suppose the compiler could optimize the bEnd-4 into a
runtime constant, since you marked bEnd as a const pointer value.
An alternate solution with a minimal performance impact would be to turn it
into an if-statement on the outside (using my check) with a do-while statement
inside it (using your check). You could even introduce another "limit" variable
inside the if statement too.
But realistically, the performance shouldn't be too much of an issue, since
that loop can only ever run 0 <= n < 4 times.
Original comment by thomaska...@gmail.com
on 26 Apr 2013 at 9:34
Shouldn't we also protect the main loop ? :
} while (p<=limit);
Original comment by yann.col...@gmail.com
on 1 May 2013 at 6:53
I don't think it matters in XXH32, as long as the length is correctly set to
zero for null inputs (because you check against the length in the precondition
for the 16-byte block loop).
However, XXH32_update will cause problems if the error-on-null check is removed
and would probably need to be refactored into:
if (bEnd-p>=16) { limit = bEnd-16; ... do { ... } while (p<=limit); ... }
If you wanted to be really pedantic, you could set length to zero if the input
is null or the length is negative (in both XXH32 and XXH32_update). Though it
would need to be done before bEnd is assigned, which breaks the rules for C89
(unless bEnd is put into another scope or do the check while bEnd is getting
assigned (ugly)).
P.S. If you use the simple implementation for XXH32 (the one that is #if'ed
out) with a null input, you should get the same result as an empty input.
Original comment by thomaska...@gmail.com
on 1 May 2013 at 10:56
OK,
my understanding is that XXH32_update() already takes care correctly of the
situation : it doesn't update anything is the input pointer is NULL, and
therefore acts the same as a null-length input.
Now for the main (fast) function XXH32().
I'm surprised that you suggest to only consider the specific corner case :
inputPointer = NULL
inputLength = 0
Is the scenarion where inputLength <> 0 so rare ? why ?
Original comment by yann.col...@gmail.com
on 9 May 2013 at 6:54
The following is a proposed solution to this issue.
It requires to enable a #define into xxhash.c, named :
XXH_CHECK_NULL_INPUT_POINTER
When it is enabled, a NULL input pointer triggers the same result as a
null-length input.
When it is disabled (default), a NULL input pointer will make the function
crash.
Would it fit the bill ?
Original comment by yann.col...@gmail.com
on 9 May 2013 at 7:14
Attachments:
Suggestion added into r28.
Original comment by yann.col...@gmail.com
on 11 May 2013 at 11:25
Original issue reported on code.google.com by
msc...@gmail.com
on 5 Apr 2013 at 11:41