Open lyxxn0414 opened 2 years ago
After debugging, I updated the rdma-core library on all hosts, which solved this problem. But node2-host now fails when executing ibv_reg_mr, and the errno is 12 (ENOMEM, cannot allocate memory). I have checked that max memory size and max locked memory are both unlimited. Could you help me solve this problem? Thanks!
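One way to double-check the locked-memory limit is to read it from inside the failing process, since the limit a process sees can differ from what `ulimit -l` reports in an interactive shell (for example when the binary is started through sudo or a service manager). The following is a minimal sketch, not part of the LineFS code; the helper names `get_memlock_soft_limit`, `region_fits_memlock`, and `report_memlock` are hypothetical. It only covers RLIMIT_MEMLOCK. ENOMEM from ibv_reg_mr may also have other causes, such as a device-side limit on the region size.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: reads the RLIMIT_MEMLOCK soft limit of the
 * calling process. Returns 0 on success, -1 if getrlimit() fails. */
int get_memlock_soft_limit(rlim_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        return -1;
    *out = lim.rlim_cur;
    return 0;
}

/* Hypothetical helper: returns 1 if a region of `length` bytes fits
 * under `soft_limit`, 0 otherwise. It ignores memory the process has
 * already locked, so it is only a rough upper-bound check. */
int region_fits_memlock(unsigned long long length, rlim_t soft_limit)
{
    if (soft_limit == RLIM_INFINITY)
        return 1;
    return length <= (unsigned long long)soft_limit;
}

/* Prints the limit next to the length about to be registered. */
void report_memlock(unsigned long long length)
{
    rlim_t soft;
    if (get_memlock_soft_limit(&soft) != 0) {
        perror("getrlimit");
        return;
    }
    if (soft == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK: unlimited, requested %llu bytes\n", length);
    else
        printf("RLIMIT_MEMLOCK: %llu bytes, requested %llu bytes (%s)\n",
               (unsigned long long)soft, length,
               region_fits_memlock(length, soft) ? "fits" : "exceeds limit");
}
```

Calling `report_memlock(mrs[i].length)` right before the registration call would show whether the limit seen by the process matches the shell setting.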
If I decrease the size of the memory registered by ibv_reg_mr, it runs correctly.
I edited the size in fs.c, line 1692:
for(int i=0; i<n_regions; i++) {
    switch(i) {
        case MR_NVM_LOG: {
            //FIXME: share all log entries; current impl shares only a single log mr
            mrs[i].type = MR_NVM_LOG;
            mrs[i].addr = (uint64_t)g_bdev[g_log_dev]->map_base_addr;
            mrs[i].length = ((disk_sb[g_log_dev].nlog) << g_block_size_shift);
            break;
        }
        case MR_NVM_SHARED: {
            mrs[i].type = MR_NVM_SHARED;
            mrs[i].addr = (uint64_t)g_bdev[g_root_dev]->map_base_addr;
            //mrs[i].length = dev_size[g_root_dev];
            //mrs[i].length = (1UL << 30);
            mrs[i].length = (sb[g_root_dev]->ondisk->size << g_block_size_shift); // All possible address of device. Ref) mkfs.c
            //mrs[i].length = (disk_sb[g_root_dev].datablock_start - disk_sb[g_root_dev].inode_start << g_block_size_shift); // data blocks only.
            break;
        }
        /*
        case MR_DRAM_CACHE: {
            mrs[i].type = MR_DRAM_CACHE;
            mrs[i].addr = (uint64_t) g_fcache_base;
            mrs[i].length = (g_max_read_cache_blocks << g_block_size_shift);
            break;
        }
        */
        case MR_DRAM_BUFFER: {
            // Check nic_slab_pool does not exceed 12GB. Note that the size of
            // DRAM in SmartNIC is 16GB.
            mlfs_assert((g_max_nicrpc_buf_blocks << g_block_size_shift)
                        <= (12ULL*1024*1024*1024));
            // TODO need to adjust buf size.
            nic_slab_init((g_max_nicrpc_buf_blocks << g_block_size_shift)/4);
            mrs[i].length = (g_max_nicrpc_buf_blocks << g_block_size_shift)/4;
            mrs[i].type = MR_DRAM_BUFFER;
            mrs[i].addr = (uint64_t) nic_slab_pool->addr;
            pr_dram_alloc("[DRAM_ALLOC] MR_DRAM_BUFFER size=%lu(MB)", mrs[i].length/1024/1024);
            break;
        }
        default:
            break;
    }
    pr_setup("mrs[%d] type %d, addr 0x%lx - 0x%lx, length %lu(%lu MB)",
             i, mrs[i].type, mrs[i].addr, mrs[i].addr + mrs[i].length,
             mrs[i].length, mrs[i].length / 1024 / 1024);
}
I made the same change at the same position in nic_fs.c. With that, it seems to run correctly. But when I run the iotest, I get a "remote access error" when polling the completion queue (poll_cq). It seems that I cannot edit the size this way. Could you help me solve this problem? Thanks a lot!
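A remote access error on a completion (IBV_WC_REM_ACCESS_ERR in libibverbs) is commonly reported when a remote read or write targets memory outside the registered region or without the required permission. One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that a peer still computes offsets against the original region size after the registered length was reduced. The sketch below shows the bounds check that a remote access has to satisfy. The `mr_range` struct and the `access_within_mr` function are hypothetical and only mirror the `addr` and `length` fields of `mrs[i]`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the addr/length fields of mrs[i]. */
struct mr_range {
    uint64_t addr;   /* start address of the registered region */
    uint64_t length; /* registered length in bytes */
};

/* Returns 1 if [remote_addr, remote_addr + len) lies entirely inside
 * the registered region, 0 otherwise. The arithmetic is written to
 * avoid overflow on remote_addr + len. */
int access_within_mr(const struct mr_range *mr, uint64_t remote_addr,
                     uint64_t len)
{
    if (remote_addr < mr->addr)
        return 0;
    uint64_t offset = remote_addr - mr->addr;
    if (offset > mr->length)
        return 0;
    return len <= mr->length - offset;
}
```

Logging the remote address and length of the failing request next to the `pr_setup` output for each region would show whether the request falls outside the reduced range.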
I'm starting LineFS with 3 nodes; their IP addresses are as below:
The SmartNIC of node2 has successfully connected to its host. The terminal of node2-nic looks like this:
But node1-host gets the wrong IP address for node1-nic. The terminal of node1-host looks like this:
Could you help me solve this problem? Thanks a lot!