Open JakubReha opened 1 year ago
This is my TGN.yml. The segmentation fault happens at line 199 in train.py when calling "sampler.sample(root_nodes, ts)".
hello, do you work out this problem,I also get segment fault when using multi-hop recent sample, but multi-hop uniform sample works well
well,I found why this segment fault will happen
If you use 2 layer rencent sample and when you sample second layer, in the sample
function
void sample(std::vector<NodeIDType> &root_nodes, std::vector<TimeStampType> &root_ts)
{
// a weird bug, dgl library seems to modify the total number of threads
omp_set_num_threads(num_threads);
ret.resize(0);
bool first_layer = true;
bool use_ptr = false;
for (int i = 0; i < num_layers; i++)
{
ret.resize(ret.size() + num_history);
if ((first_layer) || ((prop_time) && num_history == 1) || (recent)) // here is why the bug occur, since we use recent sample, use_ptr will be true
{
first_layer = false;
use_ptr = true;
}
else
use_ptr = false;
if (i==0){
sample_layer(root_nodes, root_ts, num_neighbors[i], use_ptr, true);
}
else
{
sample_layer(root_nodes, root_ts, num_neighbors[i], use_ptr, false);
}
}
}
in sampler_layer
function, if use_ptr is true it will call update_ts_ptr, but since from_root is false, root_nodes is actually a nullptr, so it will crash in update_ts_ptr, I think the solution is initializing root_nodes and root_ts if from_root is false
void sample_layer(std::vector<NodeIDType> &_root_nodes, std::vector<TimeStampType> &_root_ts,
int neighs, bool use_ptr, bool from_root)
{
double t_s = omp_get_wtime();
std::vector<NodeIDType> *root_nodes;
std::vector<TimeStampType> *root_ts;
if (from_root)
{
root_nodes = &_root_nodes;
root_ts = &_root_ts;
}else { // add a else to initialize them
root_nodes = &(ret[ret.size() - 1 - num_history].nodes);
root_ts = &(ret[ret.size() - 1 - num_history].ts);
}
double t_ptr_s = omp_get_wtime();
if (use_ptr)
update_ts_ptr(num_history, *root_nodes, *root_ts, 0);
ret[0].ptr_time += omp_get_wtime() - t_ptr_s;
this solution works for me :)
I get the Segmentation fault (core dumped), when I try to extend TGN to multiple layers TGN.yml the same way TGAT is defined. It seems that having multiple layers together with the "recent" sampling strategy is not compatible. Are you not supporting this setting?