Open oterocoronel opened 1 year ago
I don't know if it is related to issue #16
Thanks for sharing.
Re: concatenating taking a long time: In my case this step takes very little time, so there's definitely something strange going on here (probably because your files have 9x more frames than files I used for testing). Let me try to figure it out - I have some similar data from Coconut that I will test on.
I'm really confused about why saving a single file takes ~120 seconds, that's also very strange. It is a single numpy.save call, so it should complete in seconds. Are you saving to a local disk? Is it possible that there is other unrelated I/O going on at this time?
save_t = time.time()
log_cb("Saving fused, registered file of shape %s to %s" % (str(mov_save.shape), reg_data_path), 2)
n.save(reg_data_path, mov_save)
log_cb("Saved in %.2f sec" % (time.time() - save_t), 3)
This RAM growth issue is not something I encounter, but since it keeps popping back up I'll try to figure out the root cause of it. Can you give the specs of the system you are using, including the linux distro?
OK, I made some changes where I think memory might be leaking, and sped up the concatenation. I also added improved memory logging, and added a notebook with your data. FYI, with your Coconut data, I'm able to register 4930 frames in ~35 minutes (you can see in the notebook Demo-Coconut). I don't see the same RAM leak in my case running this notebook (you can see the RAM clear after each of the 5 iterations), so please try to run this notebook and share the resulting plot from job.plot_memory_usage() as well as the log.
Thanks! I pulled the new version of s3d. The concatenation step now takes 0 seconds, so that seems to be fixed. The saving step still takes ~120 secs per file, and the overall registration of the same files that you did in 35 mins took me ~2.5 hours. There are no other significant I/O processes running. Ubuntu version: 22.04.2 LTS
Concatenating movie
Concat in 0.00 sec
After all GPU Batches:Total Used: 157.269 GB, Virtual Available: 096.512 GB, Virtual Used: 154.882 GB, Swap Used: 002.387 GB
Saving fused, registered file of shape (30, 100, 861, 855) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo-Coconut/registered_fused_data/fused_reg_data0000.npy
Saved in 126.47 sec
Saving fused, registered file of shape (30, 100, 861, 855) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo-Coconut/registered_fused_data/fused_reg_data0001.npy
Saved in 126.45 sec
Here is the full log:
System specs from sudo lshw -short:
H/W path Device Class Description
=============================================================
system MS-7B94 (Default string)
/0 bus X299 PRO (MS-7B94)
/0/0 memory 64KiB BIOS
/0/39 memory 256GiB System Memory
/0/39/0 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/1 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/2 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/3 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/4 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/5 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/6 memory 32GiB DIMM DDR4 Synchronous 26
/0/39/7 memory 32GiB DIMM DDR4 Synchronous 26
/0/4b memory 640KiB L1 cache
/0/4c memory 10MiB L2 cache
/0/4d memory 19MiB L3 cache
/0/4e processor Intel(R) Core(TM) i9-10900X CP
/0/100 bridge Sky Lake-E DMI3 Registers
/0/100/4 generic Sky Lake-E CBDMA Registers
/0/100/4.1 generic Sky Lake-E CBDMA Registers
/0/100/4.2 generic Sky Lake-E CBDMA Registers
/0/100/4.3 generic Sky Lake-E CBDMA Registers
/0/100/4.4 generic Sky Lake-E CBDMA Registers
/0/100/4.5 generic Sky Lake-E CBDMA Registers
/0/100/4.6 generic Sky Lake-E CBDMA Registers
/0/100/4.7 generic Sky Lake-E CBDMA Registers
/0/100/5 generic Sky Lake-E MM/Vt-d Configurati
/0/100/5.2 generic Sky Lake-E RAS
/0/100/5.4 generic Sky Lake-E IOAPIC
/0/100/8 generic Sky Lake-E Ubox Registers
/0/100/8.1 generic Sky Lake-E Ubox Registers
/0/100/8.2 generic Sky Lake-E Ubox Registers
/0/100/14 bus 200 Series/Z370 Chipset Family
/0/100/14/0 usb1 bus xHCI Host Controller
/0/100/14/0/2 generic FT232R USB UART
/0/100/14/0/3 generic ADI Evaluation System
/0/100/14/0/9 generic CP2102 USB to UART Bridge Cont
/0/100/14/0/a generic PM100D
/0/100/14/0/c generic CP2102 USB to UART Bridge Cont
/0/100/14/0/d bus USB2.0 Hub
/0/100/14/0/e input MYSTIC LIGHT
/0/100/14/1 usb2 bus xHCI Host Controller
/0/100/14/1/1 generic Blackfly BFLY-U3-23S6C
/0/100/14.2 generic 200 Series PCH Thermal Subsyst
/0/100/16 communication 200 Series PCH CSME HECI #1
/0/100/17 storage 200 Series PCH SATA controller
/0/100/1b bridge 200 Series PCH PCI Express Roo
/0/100/1b.2 bridge 200 Series PCH PCI Express Roo
/0/100/1b.2/0 bus ASM2142 USB 3.1 Host Controlle
/0/100/1b.2/0/0 usb3 bus xHCI Host Controller
/0/100/1b.2/0/1 usb4 bus xHCI Host Controller
/0/100/1b.4 bridge 200 Series PCH PCI Express Roo
/0/100/1b.4/0 /dev/nvme0 storage Samsung SSD 970 EVO 1TB
/0/100/1b.4/0/0 hwmon0 disk NVMe disk
/0/100/1b.4/0/2 /dev/ng0n1 disk NVMe disk
/0/100/1b.4/0/1 /dev/nvme0n1 disk 1TB NVMe disk
/0/100/1b.4/0/1/1 /dev/nvme0n1p1 volume 238GiB Linux swap volume
/0/100/1b.4/0/1/2 /dev/nvme0n1p2 volume 15GiB EXT4 volume
/0/100/1b.4/0/1/3 /dev/nvme0n1p3 volume 677GiB EXT4 volume
/0/100/1c bridge 200 Series PCH PCI Express Roo
/0/100/1c.2 bridge 200 Series PCH PCI Express Roo
/0/100/1c.2/0 enp5s0 network RTL8125 2.5GbE Controller
/0/100/1c.4 bridge 200 Series PCH PCI Express Roo
/0/100/1c.4/0 bus ASM3242 USB 3.2 Host Controlle
/0/100/1c.4/0/0 usb5 bus xHCI Host Controller
/0/100/1c.4/0/1 usb6 bus xHCI Host Controller
/0/100/1d bridge 200 Series PCH PCI Express Roo
/0/100/1d/0 /dev/nvme1 storage Samsung SSD 970 EVO 500GB
/0/100/1d/0/0 hwmon1 disk NVMe disk
/0/100/1d/0/2 /dev/ng1n1 disk NVMe disk
/0/100/1d/0/1 /dev/nvme1n1 disk 500GB NVMe disk
/0/100/1d/0/1/1 /dev/nvme1n1p1 volume 449MiB Windows NTFS volume
/0/100/1d/0/1/2 /dev/nvme1n1p2 volume 98MiB Windows FAT volume
/0/100/1d/0/1/3 /dev/nvme1n1p3 volume 15MiB reserved partition
/0/100/1d/0/1/4 /dev/nvme1n1p4 volume 464GiB Windows NTFS volume
/0/100/1d/0/1/5 /dev/nvme1n1p5 volume 520MiB Windows NTFS volume
/0/100/1f bridge X299 Chipset LPC/eSPI Controll
/0/100/1f/0 system PnP device PNP0c02
/0/100/1f/1 system PnP device PNP0c02
/0/100/1f/2 system PnP device PNP0c02
/0/100/1f/3 system PnP device PNP0c02
/0/100/1f/4 system PnP device PNP0c02
/0/100/1f.2 memory Memory controller
/0/100/1f.3 card0 multimedia 200 Series PCH HD Audio
/0/100/1f.3/0 input10 input HDA Intel PCH Front Mic
/0/100/1f.3/1 input11 input HDA Intel PCH Rear Mic
/0/100/1f.3/2 input12 input HDA Intel PCH Line
/0/100/1f.3/3 input13 input HDA Intel PCH Line Out Front
/0/100/1f.3/4 input14 input HDA Intel PCH Line Out Surroun
/0/100/1f.3/5 input15 input HDA Intel PCH Line Out CLFE
/0/100/1f.3/6 input16 input HDA Intel PCH Front Headphone
/0/100/1f.4 bus 200 Series/Z370 Chipset Family
/0/100/1f.6 eno1 network Ethernet Connection (2) I219-V
/0/101 bridge Sky Lake-E PCI Express Root Po
/0/101/0 generic Chenming Mold Ind. Corp.
/0/1 generic Sky Lake-E VT-d
/0/3 generic Sky Lake-E RAS Configuration R
/0/4 generic Sky Lake-E IOxAPIC Configurati
/0/6 generic Sky Lake-E CHA Registers
/0/8.1 generic Sky Lake-E CHA Registers
/0/8.2 generic Sky Lake-E CHA Registers
/0/8.3 generic Sky Lake-E CHA Registers
/0/8.4 generic Sky Lake-E CHA Registers
/0/8.5 generic Sky Lake-E CHA Registers
/0/8.6 generic Sky Lake-E CHA Registers
/0/8.7 generic Sky Lake-E CHA Registers
/0/7 generic Sky Lake-E CHA Registers
/0/9.1 generic Sky Lake-E CHA Registers
/0/9.2 generic Sky Lake-E CHA Registers
/0/9.3 generic Sky Lake-E CHA Registers
/0/9.4 generic Sky Lake-E CHA Registers
/0/9.5 generic Sky Lake-E CHA Registers
/0/9.6 generic Sky Lake-E CHA Registers
/0/9.7 generic Sky Lake-E CHA Registers
/0/8 generic Sky Lake-E CHA Registers
/0/9 generic Sky Lake-E CHA Registers
/0/e generic Sky Lake-E CHA Registers
/0/e.1 generic Sky Lake-E CHA Registers
/0/e.2 generic Sky Lake-E CHA Registers
/0/e.3 generic Sky Lake-E CHA Registers
/0/e.4 generic Sky Lake-E CHA Registers
/0/e.5 generic Sky Lake-E CHA Registers
/0/e.6 generic Sky Lake-E CHA Registers
/0/e.7 generic Sky Lake-E CHA Registers
/0/f generic Sky Lake-E CHA Registers
/0/f.1 generic Sky Lake-E CHA Registers
/0/f.2 generic Sky Lake-E CHA Registers
/0/f.3 generic Sky Lake-E CHA Registers
/0/f.4 generic Sky Lake-E CHA Registers
/0/f.5 generic Sky Lake-E CHA Registers
/0/f.6 generic Sky Lake-E CHA Registers
/0/f.7 generic Sky Lake-E CHA Registers
/0/10 generic Sky Lake-E CHA Registers
/0/10.1 generic Sky Lake-E CHA Registers
/0/1d generic Sky Lake-E CHA Registers
/0/1d.1 generic Sky Lake-E CHA Registers
/0/1d.2 generic Sky Lake-E CHA Registers
/0/1d.3 generic Sky Lake-E CHA Registers
/0/1e generic Sky Lake-E PCU Registers
/0/1e.1 generic Sky Lake-E PCU Registers
/0/1e.2 generic Sky Lake-E PCU Registers
/0/1e.3 generic Sky Lake-E PCU Registers
/0/1e.4 generic Sky Lake-E PCU Registers
/0/1e.5 generic Sky Lake-E PCU Registers
/0/1e.6 generic Sky Lake-E PCU Registers
/0/102 bridge Sky Lake-E PCI Express Root Po
/0/102/0 display TU102 [GeForce RTX 2080 Ti Rev
/0/102/0.1 card1 multimedia TU102 High Definition Audio Co
/0/102/0.1/0 input3 input HDA NVidia HDMI/DP,pcm=3
/0/102/0.1/1 input4 input HDA NVidia HDMI/DP,pcm=7
/0/102/0.1/2 input5 input HDA NVidia HDMI/DP,pcm=8
/0/102/0.1/3 input6 input HDA NVidia HDMI/DP,pcm=9
/0/102/0.1/4 input7 input HDA NVidia HDMI/DP,pcm=10
/0/102/0.1/5 input8 input HDA NVidia HDMI/DP,pcm=11
/0/102/0.1/6 input9 input HDA NVidia HDMI/DP,pcm=12
/0/102/0.2 bus TU102 USB 3.1 Host Controller
/0/102/0.2/0 usb7 bus xHCI Host Controller
/0/102/0.2/1 usb8 bus xHCI Host Controller
/0/102/0.3 bus TU102 USB Type-C UCSI Controll
/0/a generic Sky Lake-E VT-d
/0/11 generic Sky Lake-E RAS Configuration R
/0/13 generic Sky Lake-E IOxAPIC Configurati
/0/14 generic Sky Lake-E Integrated Memory C
/0/18 generic Sky Lake-E Integrated Memory C
/0/19 generic Sky Lake-E Integrated Memory C
/0/1a generic Sky Lake-E Integrated Memory C
/0/a.2 generic Sky Lake-E Integrated Memory C
/0/a.3 generic Sky Lake-E Integrated Memory C
/0/a.4 generic Sky Lake-E Integrated Memory C
/0/a.5 generic Sky Lake-E LM Channel 1
/0/a.6 generic Sky Lake-E LMS Channel 1
/0/a.7 generic Sky Lake-E LMDP Channel 1
/0/b generic Sky Lake-E DECS Channel 2
/0/b.1 generic Sky Lake-E LM Channel 2
/0/b.2 generic Sky Lake-E LMS Channel 2
/0/b.3 generic Sky Lake-E LMDP Channel 2
/0/c generic Sky Lake-E Integrated Memory C
/0/c.1 generic Sky Lake-E Integrated Memory C
/0/c.2 generic Sky Lake-E Integrated Memory C
/0/c.3 generic Sky Lake-E Integrated Memory C
/0/c.4 generic Sky Lake-E Integrated Memory C
/0/c.5 generic Sky Lake-E LM Channel 1
/0/c.6 generic Sky Lake-E LMS Channel 1
/0/c.7 generic Sky Lake-E LMDP Channel 1
/0/d generic Sky Lake-E DECS Channel 2
/0/d.1 generic Sky Lake-E LM Channel 2
/0/d.2 generic Sky Lake-E LMS Channel 2
/0/d.3 generic Sky Lake-E LMDP Channel 2
/0/103 bridge Sky Lake-E PCI Express Root Po
/0/103/0 scsi0 storage MegaRAID SAS-3 3108 [Invader]
/0/103/0/2.0.0 /dev/sda disk 7999GB MR9361-8i
/0/103/0/2.0.0/1 /dev/sda1 volume 15MiB reserved partition
/0/103/0/2.0.0/2 /dev/sda2 volume 7449GiB Windows NTFS volume
/0/2 bridge Sky Lake-E PCI Express Root Po
/0/2/0 generic PXIe/PCIe Device
/0/5 generic Sky Lake-E VT-d
/0/5.2 generic Sky Lake-E RAS Configuration R
/0/5.4 generic Sky Lake-E IOxAPIC Configurati
/0/12 generic Sky Lake-E M3KTI Registers
/0/12.1 generic Sky Lake-E M3KTI Registers
/0/12.2 generic Sky Lake-E M3KTI Registers
/0/15 generic Sky Lake-E M2PCI Registers
/0/15.1 generic Sky Lake-E DDRIO Registers
/0/16 generic Sky Lake-E M2PCI Registers
/0/16.1 generic Sky Lake-E DDRIO Registers
/0/16.4 generic Sky Lake-E M2PCI Registers
/0/16.5 generic Sky Lake-E DDRIO Registers
/0/17 generic Sky Lake-E M2PCI Registers
/0/17.1 generic Sky Lake-E DDRIO Registers
/1 power To Be Filled By O.E.M.
/2 /dev/fb0 display EFI VGA
/3 input0 input Sleep Button
/4 input1 input Power Button
/5 input17 input MSI MYSTIC LIGHT
This is saving to an 8-TB RAID disk. I could try saving to one of the other disks if you think some setting of this disk might be the issue. The RAID disk might not be the fastest, but it does not seem to be particularly slow:
sudo dd if=/dev/sda of=test.file bs=10G count=10 oflag=dsync
dd: warning: partial read (2147479552 bytes); suggest iflag=fullblock
0+10 records in
0+10 records out
21474795520 bytes (21 GB, 20 GiB) copied, 30.5937 s, 702 MB/s
Here are the RAM plots. It seems to be better now (not massively using Swap), but I would expect it to use less than 200GB if the files are ~20GB, right?:
OK, this seems to be working roughly as intended for now. Yes, 200 GB is too much, I will take a look at it to try to make it smaller. It's so big because the registered movie is larger than the original movie (because of the padding on both sides), and it is saved as a float32 (not int16) after registration, so the movie is probably around 50-60GB, and there are a few copies of it I guess. I should be able to reduce it. I'll see if converting back to int16 is OK.
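As a rough sketch of the per-file size and of what converting back to int16 would save: the shape below is the one from the log above, while the helper name `to_int16` and the round-then-clip strategy are assumptions for illustration, not what s3d currently does.

```python
import numpy as np

def to_int16(mov_f32):
    """Round a float32 movie and clip it to the int16 range before casting."""
    info = np.iinfo(np.int16)
    return np.clip(np.rint(mov_f32), info.min, info.max).astype(np.int16)

# Per-file shape from the log above: (planes, frames, y, x)
shape = (30, 100, 861, 855)
n_elements = int(np.prod(shape, dtype=np.int64))
gb_float32 = n_elements * np.dtype(np.float32).itemsize / 1024**3
gb_int16 = n_elements * np.dtype(np.int16).itemsize / 1024**3
print("float32: %.2f GB, int16: %.2f GB" % (gb_float32, gb_int16))

# Small stand-in array so the conversion itself can be tried quickly
small = np.array([[-40000.0, -1.5, 0.4, 1.6, 40000.0]], dtype=np.float32)
small_i16 = to_int16(small)
print(small_i16)
```

Clipping matters here because interpolated float values outside the int16 range would otherwise wrap around on a plain cast.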
About the I/O speed: it seems like that's your limiting factor now. I have a pair of M.2 SSDs arranged in RAID0, so it takes about 20 seconds to save each file (I think this explains the timing difference between our runs). If you want to upgrade your workstation, it's not a bad idea to get these as temporary storage, since there is a lot of I/O during processing.
I still think there's something weird going on when I am saving them. The disk speed was ~700 MB/s in my test, and each 5 GB file (100 frames) still takes >70 seconds to save, so I believe it is saving almost 10x slower than it should.
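To put numbers on this, here is a back-of-the-envelope calculation that uses only figures quoted in this thread (the file shape and save time from the log, the ~5 GB / ~70 s figures, and the dd result). It assumes the saved files are float32; the helper name `array_megabytes` is just for illustration.

```python
import numpy as np

DD_MB_PER_S = 702.0  # sequential rate reported by the dd test above

def array_megabytes(shape, dtype):
    """Size of an array in MB (10**6 bytes, the unit dd reports)."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / 1e6

# Earlier run: shape and save time taken from the log (float32 assumed)
mb_big = array_megabytes((30, 100, 861, 855), np.float32)
rate_big = mb_big / 126.47

# Current run: ~5 GB per file, ~70 s per save
rate_small = 5000.0 / 70.0

for label, rate in [("861x855 file", rate_big), ("~5 GB file", rate_small)]:
    print("%s: %.0f MB/s, %.1fx below the dd rate" % (label, rate, DD_MB_PER_S / rate))
```

Both runs come out at roughly the same effective rate, which suggests the save time scales with file size rather than with some fixed per-file overhead.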
Also, it seems that some calculations are using float64. For example, this appears in the log when sending a batch to the GPU: Mov of shape 30, 10, 663, 628; 0.93 GB. However, if you run np.zeros((30,10,663,628), dtype = np.float32).nbytes / 1024 / 1024 / 1024, you get that this volume should be 0.465 GB, which is half of the value reported. So I would think that the volume is in float64. This might even be happening during initialization, since this is printed then:
Aligning planes
float64
20
Is the float64 necessary, or maybe just a consequence of how the volume was initialized?
Btw, the final frame size in my previous run was 861x855 pixels (due to a wrong lateral-offset estimation from the shallower planes) and it was taking ~120 secs to save 100 frames. Now that I used more frames to initialize and I got better lateral-offset values, frames are 663x628 and it takes ~70 secs to save 100 frames
The print statements are confusing: that is the size of the movie on the GPU while it is still in complex64 format. I am pretty sure it is reduced back to float32 when sent back to the CPU, though I should double check. But yes, I agree with your other issue about the lateral offset estimation: when it fails for shallow planes, it leads to way too much padding, which makes everything slower. I'll add an option for the user to override those values.
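For reference, the reported size alone cannot distinguish float64 from complex64, since both use 8 bytes per element. A quick check with plain NumPy (nothing here is s3d-specific; the shape is the one from the quoted log line):

```python
import numpy as np

shape = (30, 10, 663, 628)  # batch shape from the log line quoted above
n_elements = int(np.prod(shape, dtype=np.int64))

sizes_gb = {}
for dtype in (np.float32, np.float64, np.complex64):
    name = np.dtype(dtype).name
    sizes_gb[name] = n_elements * np.dtype(dtype).itemsize / 1024**3
    print("%-10s %.3f GB" % (name, sizes_gb[name]))
```

Checking `arr.dtype` directly on the array that comes back from the GPU would settle which of the two it is.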
Your issue with write speed might be something internal to numpy.save: the time reported there is for one line of code that calls np.save on an array in memory. I'm not sure why it doesn't max out the disk write speed… maybe I should look into a different format for the storage.
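One way to separate numpy.save from the disk itself would be to time np.save against a raw dump of the same array, both followed by an fsync so the page cache does not hide the real write time. This is only a sketch: the function names are hypothetical, the array is a tiny stand-in, and the temporary directory should be replaced with a directory on the disk being tested.

```python
import os
import tempfile
import time

import numpy as np

def time_np_save(arr, path):
    """Time np.save to an open file, including the flush to disk."""
    t0 = time.time()
    with open(path, "wb") as f:
        np.save(f, arr)
        f.flush()
        os.fsync(f.fileno())
    return time.time() - t0

def time_raw_write(arr, path):
    """Time a header-less dump of the same bytes, including the flush to disk."""
    t0 = time.time()
    with open(path, "wb") as f:
        arr.tofile(f)
        f.flush()
        os.fsync(f.fileno())
    return time.time() - t0

# Tiny stand-in; use e.g. (30, 100, 663, 628) for a realistic benchmark
arr = np.zeros((4, 10, 64, 64), dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:  # swap in a directory on the RAID disk
    npy_path = os.path.join(tmp, "test.npy")
    raw_path = os.path.join(tmp, "test.raw")
    t_npy = time_np_save(arr, npy_path)
    t_raw = time_raw_write(arr, raw_path)
    npy_size = os.path.getsize(npy_path)
    raw_size = os.path.getsize(raw_path)
    roundtrip_ok = np.array_equal(np.load(npy_path), arr)

mb = arr.nbytes / 1e6
print("np.save: %.1f MB/s" % (mb / max(t_npy, 1e-9)))
print("tofile:  %.1f MB/s" % (mb / max(t_raw, 1e-9)))
```

If both timings come out similar and well below the dd rate, the bottleneck is probably outside np.save; if only np.save is slow, a different storage format is more likely to help.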
After ~2 hours of registration, RAM usage keeps going up. GPU RAM seems to be working okay. The concatenation step took ~4700 seconds (each file is 100 secs at 9 Hz, so 900 frames, ~900x700 pixels); I don't know if this is expected.
After some more time, it started using swap:
Here is a fragment of the log, but I am also attaching the whole log file
log.txt