Open bobye opened 8 years ago
https://github.com/bobye/d2_kmeans/blob/master/src/d2/clustering_io.c#L52
Starting from line 52, we read the header information of the data. This part is read-only, so it can be shared among the processes on the same node.
Note that, depending on the nature of the data, different header data are read (either p_data->ph[n].dist_mat or p_data->ph[n].vocab_vec).
@robbwu
My understanding is that all processes need to read the same file into memory, and you want to share the memory among the processes. The usual way to do inter-process memory sharing is:
In this way, only one copy of the memory object is in the physical memory; all processes will access the same physical memory.
Basically, for the memory you want to share, use mmap instead of malloc.
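As a rough illustration of the idea (not code from d2_kmeans), here is a minimal POSIX-only sketch with no MPI involved: one process creates a named shared memory object and fills it, and a forked child maps the same object read-only by name. The helper names `shm_publish`, `shm_attach` and `child_sees_same_bytes` are made up for this example, and on older glibc versions you may need to link with `-lrt`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a named shared memory object of `size` bytes,
 * map it read-write and copy `src` into it. Returns NULL on failure. */
void *shm_publish(const char *name, const void *src, size_t size) {
    assert(size > 0); /* mmap rejects zero-length mappings */
    int fd = shm_open(name, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1) return NULL;
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        shm_unlink(name);
        return NULL;
    }
    memcpy(p, src, size);
    return p;
}

/* Hypothetical helper: map an existing object read-only. NULL on failure. */
const void *shm_attach(const char *name, size_t size) {
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1) return NULL;
    void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

/* Fork a child that attaches by name and compares what it sees with
 * `expected`. Returns 1 if the bytes match, 0 if not, -1 on error. */
int child_sees_same_bytes(const char *name, const void *expected, size_t size) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        const void *p = shm_attach(name, size);
        _exit(p != NULL && memcmp(p, expected, size) == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The writer calls `shm_publish("/mydata", buf, n)` once; any other process on the same machine can then call `shm_attach("/mydata", n)` and read the same physical pages instead of holding a private malloc'ed copy.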
What happens if a process on another node wants to read this memory object? Is there any cross-node communication? I assume no communication is needed to read data from this memory object.
@robbwu
First, on each node we need to select one MPI rank to do the shm_open, read the file, and write it to the memory object. We can do this by creating a communicator for each node and letting the local rank 0 process do the above work (http://www.open-mpi.org/doc/v1.8/man3/MPI_Comm_split_type.3.php).
// newcomm is the communicator for each node.
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank, info, &newcomm);
MPI_Comm_rank(newcomm, &local_rank);
if (local_rank == 0) { // only one process in each node does the following
    // create a named memory object
    fd = shm_open("/mydata", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    ftruncate(fd, <the size of shared memory space>);
    rptr = mmap(NULL, <the size of shared memory space>, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    // now read the file into rptr
    MPI_Barrier(newcomm); // release the other ranks once the data is in place
} else {
    MPI_Barrier(newcomm); // wait until local rank 0 has filled the object
    fd = shm_open("/mydata", O_RDONLY, 0); // mode is only used with O_CREAT
    rptr = mmap(NULL, <the size of shared memory space>, PROT_READ, MAP_SHARED, fd, 0);
}
After that, the pointer rptr points to the shared memory space. Also remember to munmap() and shm_unlink() after use.
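For the cleanup step, a small sketch of how it might look: every process unmaps its own mapping, and only one process per node removes the name. The function names `shm_release` and `shm_name_exists` and the `is_owner` flag are assumptions for illustration. In the MPI code above, `is_owner` would be `local_rank == 0`, and the unlink should only happen once all local ranks have opened the object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: unmap this process's view of the shared region and,
 * if `is_owner` is nonzero, also remove the object's name.
 * Returns 0 on success, -1 if either call failed. */
int shm_release(void *ptr, size_t size, const char *name, int is_owner) {
    assert(ptr != NULL && name != NULL);
    int rc = 0;
    if (munmap(ptr, size) == -1) rc = -1;
    if (is_owner && shm_unlink(name) == -1) rc = -1;
    return rc;
}

/* Hypothetical helper: returns 1 if an object with this name can still be
 * opened, 0 otherwise. */
int shm_name_exists(const char *name) {
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1) return 0;
    close(fd);
    return 1;
}
```

Note that shm_unlink() only removes the name. The memory itself is released once the last process has unmapped it, so processes that already hold a mapping can keep reading after the unlink.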
Looks good. If a data race is acceptable, we can still use MPI-2, I guess. Any negative effects?
One more question: should /mydata be different for different nodes?
@robbwu
/mydata could be the same for each node. The namespace is confined to a single node so no interference is possible and it makes programming easier. Think of it like a local file.
MPI-3 adds the MPI_Comm_split_type function. You can make the shared memory feature conditional on MPI-3: if users do not have MPI-3, they do not get shared memory.
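One way this conditional could be expressed is with the MPI_VERSION macro that mpi.h defines. The sketch below is only an assumption about how the build might be arranged: `D2_HAVE_MPI` is a hypothetical build flag, and `D2_SHARED_HEADER` and `node_local_rank` are made-up names. Without MPI-3, the function reports rank 0, so every process behaves like the node's loader and keeps a private copy.

```c
#include <assert.h>

#ifdef D2_HAVE_MPI /* hypothetical flag, set when building with an MPI compiler */
#include <mpi.h>
#endif

/* mpi.h defines MPI_VERSION; MPI_Comm_split_type needs version 3 or later. */
#if defined(MPI_VERSION) && MPI_VERSION >= 3
#define D2_SHARED_HEADER 1
#else
#define D2_SHARED_HEADER 0
#endif

/* Returns this process's rank within its node. When the shared memory path
 * is compiled out, it always returns 0. Must be called after MPI_Init when
 * D2_SHARED_HEADER is 1. */
int node_local_rank(void) {
#if D2_SHARED_HEADER
    MPI_Comm node_comm;
    int world_rank = 0;
    int local_rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);
    assert(local_rank >= 0);
    return local_rank;
#else
    return 0;
#endif
}
```

The rest of the loader can then branch on `D2_SHARED_HEADER`: use shm_open and mmap when it is 1, and fall back to the existing malloc-based path when it is 0.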
Yep, that's my plan.
@robbwu
Can MPI do intra-node memory optimization? I have a fairly large read-only region; can the processes that share memory on a node all read it from the same place?