DMTCP-CRAC / CRAC-early-development

NVIDIA address space differs between checkpointing and restoring #8

Open GoodKairos opened 2 years ago

GoodKairos commented 2 years ago

Hi, I ran into an issue: the address returned from cudaMalloc is different when I use CRAC to checkpoint and restore a CUDA application. I read the memory map of the kernel-loader process and found that the NVIDIA-related address layout differs between checkpointing and restoring. Please refer to the following logs:

(1) Checkpointing:

    200000000-200400000 ---p 00000000 00:00 0
    200400000-200600000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    200600000-200800000 rw-s 00000000 00:06 90172 /dev/nvidia0
    200800000-205000000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    205000000-206c00000 ---p 00000000 00:00 0
    206c00000-206e00000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    206e00000-207000000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    207000000-207200000 rw-s 207000000 00:06 22681 /dev/nvidia-uvm
    207200000-207400000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    207400000-207600000 ---p 00000000 00:00 0
    207600000-207800000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    207800000-600200000 ---p 00000000 00:00 0
    10000000000-10004000000 ---p 00000000 00:00 0
    7ffece000000-7ffee5e00000 ---p 00000000 00:00 0
    7ffee5e00000-7ffee6000000 rw-s 00000000 00:05 89331980 /dev/zero (deleted)
    7ffee6000000-7fff00000000 ---p 00000000 00:00 0
    7fff00000000-7fff00021000 rw-p 00000000 00:00 0
    7fff00021000-7fff04000000 ---p 00000000 00:00 0
    7fff04000000-7fff04021000 rw-p 00000000 00:00 0
    7fff04021000-7fff08000000 ---p 00000000 00:00 0
    7fff08000000-7fff10000000 rw-p 00000000 00:00 0
    7fff10000000-7fff20000000 ---p 00000000 00:00 0

(2) Restoring:

    200000000-200400000 ---p 00000000 00:00 0
    200400000-200600000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    200600000-200800000 rw-s 00000000 00:06 90172 /dev/nvidia0
    200800000-205000000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    205000000-206c00000 ---p 00000000 00:00 0
    206c00000-206e00000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    206e00000-207000000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    207000000-207200000 rw-s 207000000 00:06 22681 /dev/nvidia-uvm
    207200000-207400000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    207400000-207600000 ---p 00000000 00:00 0
    207600000-207800000 rw-s 00000000 00:06 115718 /dev/nvidiactl
    207800000-600200000 ---p 00000000 00:00 0
    10000000000-10004000000 ---p 00000000 00:00 0
    7ffeda000000-7fff00000000 ---p 00000000 00:00 0
    7fff004b1000-7fff02000000 rw-p 00000000 00:00 0
    7fff02000000-7fff04000000 ---p 00000000 00:00 0
    7fff04000000-7fff04021000 rw-p 00000000 00:00 0
    7fff04021000-7fff08000000 ---p 00000000 00:00 0
    7fff08000000-7fff10000000 rw-p 00000000 00:00 0
    7fff10033000-7fff12000000 rw-p 00000000 00:00 0
    7fff12000000-7fff1c000000 ---p 00000000 00:00 0
    7fff1c000000-7fff1c021000 rw-p 00000000 00:00 0
    7fff1c021000-7fff20000000 ---p 00000000 00:00 0
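To make the comparison easier to read, the two listings can be diffed mechanically. The sketch below is only an illustration, not part of CRAC: the names `Region`, `parse_maps`, and `diff_maps` are hypothetical, and it assumes each entry follows the usual `start-end perms offset dev inode [path]` layout seen in the logs. In the two listings as pasted, the entries from 200000000 through 10004000000 (including the /dev/nvidia* mappings) appear identical, and the differences seem confined to the 7ffe.../7fff... range.

```python
import re
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """One mapped region (hypothetical helper type for this sketch)."""
    start: int
    end: int
    perms: str
    path: str


# Assumed entry layout: "start-end perms offset dev inode [path]"
_ENTRY = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)


def parse_maps(text: str) -> List[Region]:
    """Parse a maps-style listing, one entry per line; skip other lines."""
    regions = []
    for line in text.splitlines():
        m = _ENTRY.match(line.strip())
        if m:
            start, end, perms, path = m.groups()
            regions.append(
                Region(int(start, 16), int(end, 16), perms, path.strip())
            )
    return regions


def diff_maps(before: str, after: str) -> Tuple[List[Region], List[Region]]:
    """Return (regions only in `before`, regions only in `after`), sorted."""
    a, b = set(parse_maps(before)), set(parse_maps(after))
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Two entries from each listing above, used as sample input.
    checkpoint = (
        "200400000-200600000 rw-s 00000000 00:06 115718 /dev/nvidiactl\n"
        "7fff00000000-7fff00021000 rw-p 00000000 00:00 0\n"
    )
    restore = (
        "200400000-200600000 rw-s 00000000 00:06 115718 /dev/nvidiactl\n"
        "7fff004b1000-7fff02000000 rw-p 00000000 00:00 0\n"
    )
    only_ckpt, only_rst = diff_maps(checkpoint, restore)
    for label, regions in (("checkpoint only", only_ckpt),
                           ("restore only", only_rst)):
        for r in regions:
            print(f"{label}: {r.start:x}-{r.end:x} {r.perms} {r.path}")
```

Feeding the full checkpoint and restore listings to `diff_maps` would list exactly the regions that moved, which may help narrow down which allocation the differing cudaMalloc address falls in.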

I don't know why the base address is different.

Could you help me? Thank you very much.