Closed zhczhong closed 3 years ago
Oh, it might be caused by memory issues. Have you tried the newest version of our code?
Also, would you mind running the code in debug mode and posting the core dump information, if possible?
Hi, Jiacheng
Thanks for your quick reply. I confirm that I am using the latest code (BTW, I compiled it with gcc 8.3.0). The debug information from gdb is as follows.
Program received signal SIGSEGV, Segmentation fault.
0x00000000004048d1 in LIPP<unsigned long, unsigned long, true>::build_tree_two (value2=5170311421649302915, key2=5170311421649302915, value1=5170311421649302705,
key1=5170311421649302705, this=0x7fffffff55a0) at code/lipp/src/core/lipp.h:409
409 RT_ASSERT(BITMAP_GET(node->none_bitmap, pos) == 1);
(gdb) backtrace
#0 0x00000000004048d1 in LIPP<unsigned long, unsigned long, true>::build_tree_two (value2=5170311421649302915, key2=5170311421649302915,
value1=5170311421649302705, key1=5170311421649302705, this=0x7fffffff55a0) at code/lipp/src/core/lipp.h:409
#1 LIPP<unsigned long, unsigned long, true>::build_tree_bulk_fmcd (this=this@entry=0x7fffffff55a0, _keys=_keys@entry=0x7fff97ac1010,
_values=_values@entry=0x7fff67fd0010, _size=_size@entry=100000000) at code/lipp/src/core/lipp.h:565
#2 0x0000000000405ecd in LIPP<unsigned long, unsigned long, true>::build_tree_bulk (_size=100000000, _values=0x7fff67fd0010, _keys=0x7fff97ac1010,
this=0x7fffffff55a0) at code/lipp/src/core/lipp.h:425
#3 LIPP<unsigned long, unsigned long, true>::bulk_load (this=0x7fffffff55a0, vs=0x7ffec95ab010, num_keys=100000000)
at code/lipp/src/core/lipp.h:175
#4 0x0000000000402ca5 in main () at code/lipp/src/examples/example_bulk_load.cpp:38
Thanks, I have also reproduced this case on our machines.
Actually, on the provided dataset with large keys, the model parameters become NaN, which is likely caused by a division by zero. This issue probably results from the limited precision of the double type, especially on datasets where keys become too close, or even identical, after being converted to double.
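For example, the two keys in the backtrace above (5170311421649302705 and 5170311421649302915) differ by only 210, while the spacing between adjacent doubles at that magnitude is 2^10 = 1024, so both round to the same double. The sketch below only illustrates this effect and the kind of zero-division it can cause; it is not the exact expression used in lipp.h:

#include <cstdint>
#include <cstdio>

int main() {
    // Two keys taken from the backtrace above; they differ by 210, but the
    // gap between adjacent doubles near 5.17e18 is 2^10 = 1024, so both
    // round to the same double value.
    uint64_t key1 = 5170311421649302705ULL;
    uint64_t key2 = 5170311421649302915ULL;
    double d1 = static_cast<double>(key1);
    double d2 = static_cast<double>(key2);
    printf("identical after conversion: %d\n", d1 == d2);  // prints 1
    // Any linear model fitted through these two points divides by (d2 - d1),
    // which is now exactly zero, so the slope becomes inf/NaN.
    printf("slope-like term: %f\n", 1.0 / (d2 - d1));
    return 0;
}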
We have encountered similar situations with ALEX before, though this is the first time with LIPP, so I think we have the experience to fix this issue.
We plan to fix it within one week, so please be patient. You can star or watch this repo to keep up with the code.
Sincerely
Sorry for closing this issue accidentally.
No problem. Looking forward to the update!
May I ask for the estimated time to fix this issue? Thanks in advance!
Yes, we have already located the code causing this issue, but we still need about a week to fix and test it; we have a lot of other things to do as well.
Hi @zhczhong, we have temporarily fixed this issue in our "develop" branch. To solve the precision problem, we introduce a deterministic bias on the keys in each node, which may hurt performance slightly.
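For illustration only (this is not the actual code in the develop branch), a minimal sketch of what a per-node key bias can look like, assuming the bias is the node's smallest key; the real bias choice and model layout in LIPP may differ:

#include <cstdint>

// Hypothetical per-node model: keys are shifted by a per-node bias before
// the uint64 -> double conversion, so the converted values stay small and
// keep their precision even when the absolute keys are huge.
struct BiasedModel {
    uint64_t bias;      // assumed here to be the node's smallest key
    double slope;
    double intercept;

    int predict(uint64_t key, int num_slots) const {
        double x = static_cast<double>(key - bias);  // small relative key
        double pos = slope * x + intercept;
        if (pos < 0) return 0;
        if (pos >= num_slots) return num_slots - 1;
        return static_cast<int>(pos);
    }
};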
Hi Jiacheng,
Thanks for your quick response and effort. However, when I use the code in the develop branch to load 200M keys, it triggers an assertion. The code below can be used to reproduce it.
#include <lipp.h>
#include <iostream>
#include <vector>
#include <cassert>
#include <algorithm>
#include <fstream>
using namespace std;
template<class T>
bool load_binary_data(T data[], int length, const std::string &file_path) {
std::ifstream is(file_path.c_str(), std::ios::binary | std::ios::in);
if (!is.is_open()) {
return false;
}
is.read(reinterpret_cast<char *>(data), std::streamsize(length * sizeof(T)));
is.close();
return true;
}
int main()
{
LIPP<uint64_t, uint64_t> lipp;
size_t data_size = 200000000;
// prepare data
uint64_t *keys = new uint64_t[data_size];
load_binary_data(keys, data_size, "./osm_cellids_200M_uint64");
vector<pair<uint64_t, uint64_t>> data;
std::sort(keys, keys + data_size);
for (size_t i = 0; i < data_size; i ++) {
data.push_back({keys[i], keys[i]});
}
// bulk load
lipp.bulk_load(data.data(), data.size());
// verify LIPP data structure
lipp.verify();
cout << "bolk load success" << endl;
return 0;
}
The output is
RT_ASSERT Error at ./code/lipp/src/core/lipp.h:675, `next - offset <= (size+2) / 3`
Oh, we forgot to remove this assertion.
We have updated the develop branch; you can try again.
Thanks, the problem is solved! However, the current performance seems to be a little worse than ALEX.
Possibly. Different hardware configurations (CPU frequency, memory bandwidth, or instruction set) influence the overall performance. In addition, different workloads and datasets affect the index structure and can thus degrade performance.
Hi,
There is a segmentation fault when bulk loading the osm_cellids_200M_uint64 dataset from SOSD. The code below can be used to reproduce the error, and the dataset can be fetched by running the script from https://github.com/learnedsystems/SOSD/blob/master/scripts/download.sh.
Output