JIABI / xxhash

Automatically exported from code.google.com/p/xxhash
Other
0 stars 0 forks source link

Segfault on null input #6

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I've fixed this temporarily in the node.js xxhash binding: 
https://github.com/mscdex/node-xxhash/commit/357ce0c

Original issue reported on code.google.com by msc...@gmail.com on 5 Apr 2013 at 11:41

GoogleCodeExporter commented 8 years ago
Indeed Brian, the hash formula will fail on any wrong input (unallocated memory 
space), null input being one of them.

Null input is detectable, but most other wrong memory values are not. Should 
the input pointer points towards a non-authorised memory value, the function 
will crash the same way. Same story is also applicable if "length" is wrong and 
larger than allocated space. In the end, it is the responsability of the user 
to provide a correct input.

As a consequence, does it seem a good idea to add a detection mechanism 
dedicated for a single "invalid input" situation, which is null input ?

Original comment by yann.col...@gmail.com on 7 Apr 2013 at 10:24

GoogleCodeExporter commented 8 years ago
Also : note that the value "0" is a valid Hash value. It should not be used as 
an error signal, since valid Hash value "0" would be confused with an error.

Original comment by yann.col...@gmail.com on 9 Apr 2013 at 9:48

GoogleCodeExporter commented 8 years ago
I realize there are other errors that can occur, but in my opinion checking for 
null input is a basic safeguard that is easy to do.

Regarding a hash value of "0", I don't dispute that. The changes I made should 
not affect this.

Original comment by msc...@gmail.com on 9 Apr 2013 at 4:54

GoogleCodeExporter commented 8 years ago
OK, maybe there is a middle ground.

There are 2 branches of the same xxHash algorithm,
the one-step XXH32()
and the more flexible XXH32_*, with *= init, feed or result.

The first one (XXH32()) is a speed oriented version, especially targeted at 
small inputs, where every single cycle matter. Additionnally, the function 
leaves no room to express an error message, since all possible output values 
are valid.

On the other hand, the function XXH32_feed() is already giving up a bit of 
speed in favor of flexibility. A single additional comparison should not be 
perceptible.
Moreover, the output of this function is an error status, so it's totally 
possible to express a null-input error.

Therefore, i would suggest to add the null-input check to XXH32_feed().
The necessity to ensure that input memory must be valid will also be added in 
the comments.

Original comment by yann.col...@gmail.com on 10 Apr 2013 at 8:09

GoogleCodeExporter commented 8 years ago
Also :
it's completely unrelated, but i've looked at your npm package, and i kinda 
like the naming you selected for the methods.
create - update - digest
looks more suitable than the current
init - feed - result

Is there any other hash (or equivalent) package which use a similar naming 
convention ?

Original comment by yann.col...@gmail.com on 10 Apr 2013 at 12:25

GoogleCodeExporter commented 8 years ago
Regarding the node module API, I based it off node's built-in crypto API for 
hashing.

Original comment by msc...@gmail.com on 10 Apr 2013 at 12:56

GoogleCodeExporter commented 8 years ago
Looks like a fairly good reference.

I'm currently pondering the idea of modifying the C interface naming scheme.
Since your naming looks clearer, the C interface could benefit from it.

Original comment by yann.col...@gmail.com on 10 Apr 2013 at 12:59

GoogleCodeExporter commented 8 years ago
Here is a proposed release candidate
integrating the changes proposed here.

The null-pointer check is integrated into XXH32_update(),
and generates an XXH_ERROR when detected.

Original comment by yann.col...@gmail.com on 10 Apr 2013 at 2:16

Attachments:

GoogleCodeExporter commented 8 years ago
Partially solved within r7.
Null-pointer check added in update() function.
Comment requiring input validity added in header file.

Original comment by yann.col...@gmail.com on 12 Apr 2013 at 8:27

GoogleCodeExporter commented 8 years ago
I know this issue is "closed and fixed", but there is a trivial way to fix the 
segfault in the XXH32 function without adding an additional null check. I don't 
think it affects performance, since it's simply rearranging an existing 
conditional.

Change:
    while (p<=bEnd-4)

To:
    while (bEnd-p>=4)

The problem is that bEnd is 0 and bEnd-4 is -4 (0xFFFFFFFC, after conversion 
back into a 32-bit pointer), which is greater than p (which is 0).

Original comment by thomaska...@gmail.com on 26 Apr 2013 at 3:03

GoogleCodeExporter commented 8 years ago
It seems like a good idea.

However, that would mean that 
the result of XXH32() over a null-pointer 
would be the same as a zero size input.

Is such a result acceptable ?
Should it be the same with XXH32_update() ? (since currently, it outputs an 
error code?)

Another potential catch is that (bEnd-p) may actually slow down performance. 
This should be checked.

Original comment by yann.col...@gmail.com on 26 Apr 2013 at 12:04

GoogleCodeExporter commented 8 years ago
I changed the implementation because of problems I was having using C# interop 
(for comparing with my pure managed implementation). What was happening was 
that getting a native pointer for an empty managed byte array was returning a 
null pointer, so it always passed a null value into it, even though my C# code 
explicitly checks for a null array. So my only alternatives were to either fix 
it in xxhash or use a dummy array with a single byte in C#.

Obviously it doesn't fix the issue with a null pointer and a non-zero length.

As for the performance, I suppose the compiler could optimize the bEnd-4 into a 
runtime constant, since you marked bEnd as a const pointer value.

An alternate solution with a minimal performance impact would be to turn it 
into an if-statement on the outside (using my check) with a do-while statement 
inside it (using your check). You could even introduce another "limit" variable 
inside the if statement too.

But realistically, the performance shouldn't be too much of an issue, since 
that loop can only ever run 0 <= n < 4 times.

Original comment by thomaska...@gmail.com on 26 Apr 2013 at 9:34

GoogleCodeExporter commented 8 years ago
Shouldn't we also protect the main loop ? :
} while (p<=limit);

Original comment by yann.col...@gmail.com on 1 May 2013 at 6:53

GoogleCodeExporter commented 8 years ago
I don't think it matters in XXH32, as long as the length is correctly set to 
zero for null inputs (because you check against the length in the precondition 
for the 16-byte block loop).

However, XXH32_update will cause problems if the error-on-null check is removed 
and would probably need to be refactored into:

if (bEnd-p>=16) { limit = bEnd-16; ... do { ... } while (p<=limit); ... }

If you wanted to be really pedantic, you could set length to zero if the input 
is null or the length is negative (in both XXH32 and XXH32_update). Though it 
would need to be done before bEnd is assigned, which breaks the rules for C89 
(unless bEnd is put into another scope or do the check while bEnd is getting 
assigned (ugly)).

P.S. If you use the simple implementation for XXH32 (the one that is #if'ed 
out) with a null input, you should get the same result as an empty input.

Original comment by thomaska...@gmail.com on 1 May 2013 at 10:56

GoogleCodeExporter commented 8 years ago
OK,
my understanding is that XXH32_update() already takes care correctly of the 
situation : it doesn't update anything is the input pointer is NULL, and 
therefore acts the same as a null-length input.

Now for the main (fast) function XXH32().

I'm surprised that you suggest to only consider the specific corner case :
inputPointer = NULL
inputLength = 0

Is the scenarion where inputLength <> 0 so rare ? why ?

Original comment by yann.col...@gmail.com on 9 May 2013 at 6:54

GoogleCodeExporter commented 8 years ago
The following is a proposed solution to this issue.

It requires to enable a #define into xxhash.c, named : 
XXH_CHECK_NULL_INPUT_POINTER

When it is enabled, a NULL input pointer triggers the same result as a 
null-length input.

When it is disabled (default), a NULL input pointer will make the function 
crash.

Would it fit the bill ?

Original comment by yann.col...@gmail.com on 9 May 2013 at 7:14

Attachments:

GoogleCodeExporter commented 8 years ago
Suggestion added into r28.

Original comment by yann.col...@gmail.com on 11 May 2013 at 11:25