Open guillermo-carrasco opened 10 years ago
We should add `assert total_reads != 0`. No dataset contains 0 reads ATM.
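A minimal sketch of the suggested guard, assuming the reported ratio is computed as contaminated reads divided by total reads. The names `contamination_rate`, `unguarded_rate` and `contaminated_reads` are hypothetical and not taken from `query.c`. The second function only illustrates what happens when the guard is missing.

```c
#include <assert.h>

/* Hypothetical helper (illustrative names, not from query.c):
 * fraction of reads flagged as contaminated. */
static double contamination_rate(unsigned long long contaminated_reads,
                                 unsigned long long total_reads)
{
    /* Fail loudly instead of silently reporting nan. */
    assert(total_reads != 0);
    return (double)contaminated_reads / (double)total_reads;
}

/* Same computation without the guard. With both counters at zero the
 * floating-point division 0.0 / 0.0 evaluates to nan. */
static double unguarded_rate(unsigned long long contaminated_reads,
                             unsigned long long total_reads)
{
    return (double)contaminated_reads / (double)total_reads;
}
```

If the read counter is never incremented, the unguarded version would produce exactly the kind of `nan` output described in this thread, whereas the assertion stops the program at the point where the counter is first used.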
I went through `query.c` but could not figure out what's wrong until I compared the actual bloom files on each system...
Sizes of the bloom filters on Mac and x86, respectively:
-rw-r--r-- 1 roman staff 217529788 28 Okt 2013 /Users/roman/dev/facs/tests/data/bloom/dm3.bloom
-rw-r----- 1 roman roman 217529788 May 18 2013 tests/data/bloom/dm3.bloom
Contents vary though:
MD5 (/Users/roman/dev/facs/tests/data/bloom/dm3.bloom) = 2a3d92277a675516c5d9efc470b84862
1d50e7f9e1170b6bff2d99f96a585b40 tests/data/bloom/dm3.bloom
So now I would bet on something going wrong while building the bloom filter on OSX...
A quick hexdump inspection reveals the problem:
On OSX:
0000000 45056 06099 00001 00000 40860 03447 00001 00000
0000010 65194 01875 00000 00000 60061 26553 00000 00000
0000020 00007 00000 00001 00000 13862 02626 00000 00000
On x86:
0000000 53264 17105 10952 00000 12384 14347 10952 00000
0000010 65194 01875 00000 00000 60061 26553 00000 00000
0000020 00007 00000 00000 00000 13862 02626 00000 00000
Therefore, header construction for `.bloom` files on OSX is wrong. Let's "hexamine" `build.c` then... :)
Sorry for the strange decimal output; it might make it hard to compare in hex. Here's the canonical output from hexdump (`hexdump -n100 -C dm3.bloom`):
On OSX:
00000000 00 b0 d3 17 01 00 00 00 9c 9f 77 0d 01 00 00 00 |..........w.....|
00000010 aa fe 53 07 00 00 00 00 9d ea b9 67 00 00 00 00 |..S........g....|
00000020 07 00 00 00 01 00 00 00 26 36 42 0a 00 00 00 00 |........&6B.....|
00000030 7b 14 ae 47 e1 7a 74 3f 13 00 00 00 ab 00 00 00 |{..G.zt?........|
00000040 00 00 00 00 00 00 00 00 94 b0 a0 28 6a 3a 0b 80 |...........(j:..|
00000050 c4 05 f4 00 c9 10 a8 0d 96 52 74 4d 52 60 e1 4c |.........RtMR`.L|
00000060 42 aa 61 02 |B.a.|
00000064
On x86:
00000000 10 d0 d1 42 c8 2a 00 00 60 30 0b 38 c8 2a 00 00 |...B.*..`0.8.*..|
00000010 aa fe 53 07 00 00 00 00 9d ea b9 67 00 00 00 00 |..S........g....|
00000020 07 00 00 00 00 00 00 00 26 36 42 0a 00 00 00 00 |........&6B.....|
00000030 7b 14 ae 47 e1 7a 74 3f 13 00 00 00 ab 00 00 00 |{..G.zt?........|
00000040 50 00 00 00 00 00 00 00 94 b0 a0 28 6a 3a 0b 80 |P..........(j:..|
00000050 c4 05 f4 00 c9 10 a8 0d 96 52 74 4d 52 60 e1 4c |.........RtMR`.L|
00000060 42 aa 61 02 |B.a.|
00000064
Since those bloom filters were generated with the same test suite, the parameters are the same, so there must be some bug in how OSX compiles/interprets numbers when building the filter... or a bad pointer when it reports the results...
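To compare the headers numerically rather than by eye, the first 16 bytes of each dump can be decoded as little-endian 64-bit words. This is only a sketch: the byte arrays are copied from the `hexdump -C` output in this thread, while the helper `read_le64` and the assumption that the header starts with two 8-byte fields are mine and not confirmed by `build.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 8 bytes as a little-endian unsigned 64-bit integer. */
static uint64_t read_le64(const unsigned char *p)
{
    assert(p != NULL);
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | p[i];
    return value;
}

/* First 16 header bytes of dm3.bloom as built on OSX. */
static const unsigned char osx_header[16] = {
    0x00, 0xb0, 0xd3, 0x17, 0x01, 0x00, 0x00, 0x00,
    0x9c, 0x9f, 0x77, 0x0d, 0x01, 0x00, 0x00, 0x00
};

/* First 16 header bytes of dm3.bloom as built on x86. */
static const unsigned char x86_header[16] = {
    0x10, 0xd0, 0xd1, 0x42, 0xc8, 0x2a, 0x00, 0x00,
    0x60, 0x30, 0x0b, 0x38, 0xc8, 0x2a, 0x00, 0x00
};
```

Calling `read_le64(osx_header)` and `read_le64(osx_header + 8)` gives `0x0000000117d3b000` and `0x000000010d779f9c`; the x86 bytes decode to `0x00002ac842d1d010` and `0x00002ac8380b3060`. What these two fields represent is not established in this thread, so the decoded values only make the difference easier to read.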
After bindiffing both bloom files with `radiff2` from @radare, running the test against the "correct" bloom filter returns the same `nan`s. Therefore, there could be something wrong with the reporting function...
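For readers without `radiff2` at hand, a byte-level comparison can be approximated in a few lines of C. This is a rough stand-in under my own assumptions, not a reimplementation of `radiff2`; `diff_streams` is a hypothetical helper that lists differing offsets and treats files of different length as an error.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two binary streams byte by byte. Each differing offset is
 * printed to `out` (skipped when `out` is NULL). Returns the number of
 * differing bytes, or -1 if one stream ends before the other. */
static long diff_streams(FILE *fa, FILE *fb, FILE *out)
{
    assert(fa != NULL && fb != NULL);
    long differing = 0;
    unsigned long offset = 0;
    int ca, cb;

    for (;;) {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if (ca == EOF || cb == EOF)
            break;
        if (ca != cb) {
            if (out != NULL)
                fprintf(out, "0x%08lx: %02x != %02x\n", offset,
                        (unsigned)ca, (unsigned)cb);
            differing++;
        }
        offset++;
    }
    /* Both must hit EOF together, otherwise the lengths differ. */
    return (ca == cb) ? differing : -1;
}
```

To use it, open both `dm3.bloom` files with `fopen(path, "rb")` and pass `stdout` as the third argument. Since both files in this thread have the same size, a return value of `-1` would itself indicate a problem.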
Something must be wrong either with the bloom filter construction or with the query method on Mac OS X, as it is returning `0` and `nan` on every query: