alihaydaroglu / suite3d

Fast, accurate, volumetric cell detection. Developed for Light Beads Microscopy, usable for other volumetric 2P. In development

RAM management in GPU registration #39

Open oterocoronel opened 1 year ago

oterocoronel commented 1 year ago

After ~2 hours of registration, RAM usage keeps going up. GPU RAM seems to be working okay. The concatenating step took ~4700 seconds (each file is 100 s at 9 Hz, so 900 frames of ~900x700 pixels); I don't know if this is expected.

image

After some more time, it started using swap:

image

Here is a fragment of the log, but I am also attaching the whole log file

log.txt


[2023-09-27 21:29:19][04]             GPU RAM: 9 blocks allocated, 1.56 / 2.21 GB used
[2023-09-27 21:29:19][01]    Clipped movie in 0.16 sec
[2023-09-27 21:29:19][01]    Split movie into blocks in 0.00 sec
[2023-09-27 21:29:19][01]    Completed FFT of blocks and computed phase correlations in 0.01 sec
[2023-09-27 21:29:19][04]             Iter 0: 2199/2970 blocks below SNR thresh
[2023-09-27 21:29:19][04]             Iter 1: 1483/2970 blocks below SNR thresh
[2023-09-27 21:29:19][04]             Iter 1: 1264/2970 blocks below SNR thresh
[2023-09-27 21:29:19][01]    Computed SNR and smoothed phase corrs in 0.03 sec
[2023-09-27 21:29:19][03]          Computed subpixel shifts in 0.00 sec
[2023-09-27 21:29:19][04]             GPU RAM: 8 blocks allocated, 1.57 / 1.72 GB used
[2023-09-27 21:29:19][02]       Computed non-rigid shifts in 1.32 sec
[2023-09-27 21:29:19][02]       Transferred to CPU in 0.02 sec
[2023-09-27 21:29:19][01]    Non rigid transformed (on CPU) in 0.66 sec
[2023-09-27 21:29:20][02]       Concatenating movie
[2023-09-27 22:48:27][03]          Concat in 4746.98 sec
[2023-09-27 22:48:27][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0000.npy
[2023-09-27 22:50:25][03]          Saved in 118.27 sec
[2023-09-27 22:50:25][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0001.npy
[2023-09-27 22:52:25][03]          Saved in 120.48 sec
[2023-09-27 22:52:25][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0002.npy
[2023-09-27 22:54:22][03]          Saved in 116.86 sec
[2023-09-27 22:54:22][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0003.npy
[2023-09-27 22:56:21][03]          Saved in 119.29 sec
[2023-09-27 22:56:21][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0004.npy
[2023-09-27 22:58:24][03]          Saved in 122.56 sec
[2023-09-27 22:58:24][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0005.npy
[2023-09-27 23:00:32][03]          Saved in 128.32 sec
[2023-09-27 23:00:32][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0006.npy
[2023-09-27 23:02:41][03]          Saved in 128.86 sec
[2023-09-27 23:02:41][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0007.npy
[2023-09-27 23:04:50][03]          Saved in 128.50 sec
[2023-09-27 23:04:50][02]       Saving fused, registered file of shape (30, 100, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0008.npy
[2023-09-27 23:06:58][03]          Saved in 128.75 sec
[2023-09-27 23:06:58][02]       Saving fused, registered file of shape (30, 86, 938, 766) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo/registered_fused_data/fused_reg_data0009.npy
[2023-09-27 23:08:48][03]          Saved in 109.88 sec
[2023-09-27 23:08:48][03]          Memory at batch 1.  Total Used: 135.854 GB, Virtual Available: 118.634 GB, Virtual Used: 132.759 GB, Swap Used: 003.095 GB
[2023-09-27 23:08:48][00] Loading Batch 1 of 36
[2023-09-27 23:08:48][01]    Batch 1 IO thread joined
[2023-09-27 23:08:48][03]          Memory after IO thread joinTotal Used: 135.854 GB, Virtual Available: 118.635 GB, Virtual Used: 132.759 GB, Swap Used: 003.095 GB
[2023-09-27 23:08:58][01]    Launching IO thread for next batch
[2023-09-27 23:08:58][05]                [Thread] Loading batch 2 

oterocoronel commented 1 year ago

I don't know if it is related to issue #16

alihaydaroglu commented 1 year ago

Thanks for sharing.

Re: concatenating taking a long time: In my case this step takes very little time, so there's definitely something strange going on here (probably because your files have 9x more frames than files I used for testing). Let me try to figure it out - I have some similar data from Coconut that I will test on.

I'm really confused about why saving a single file takes ~120 seconds, that's also very strange. It is a single numpy.save call, so it should complete in seconds. Are you saving to a local disk? Is it possible that there is other unrelated I/O going on at this time?

save_t = time.time()
log_cb("Saving fused, registered file of shape %s to %s" % (str(mov_save.shape), reg_data_path), 2)
n.save(reg_data_path, mov_save)
log_cb("Saved in %.2f sec" % (time.time() - save_t), 3)

This RAM growth issue is not something I encounter, but since it keeps popping back up I'll try to figure out the root cause of it. Can you give the specs of the system you are using, including the linux distro?

alihaydaroglu commented 1 year ago

OK, I made some changes in places where I think memory might be leaking, and sped up the concatenation. I also added improved memory logging and a notebook with your data. FYI, with your Coconut data I'm able to register 4930 frames in ~35 minutes (see the notebook Demo-Coconut). I don't see the same RAM leak when running this notebook (the RAM clears after each of the 5 iterations), so please try running it and share the resulting plot from job.plot_memory_usage() as well as the log.

image
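
For comparing runs from the text logs alone, the memory figures can be pulled out with a small parser. This is only a sketch, not part of suite3d: it assumes the line format seen in the log fragments in this thread, and `parse_memory_lines` is a hypothetical helper name.

```python
import re

# Assumed format, copied from the memory lines in the log fragments here.
MEM_RE = re.compile(
    r"Total Used:\s*(?P<total>[\d.]+) GB, "
    r"Virtual Available:\s*(?P<avail>[\d.]+) GB, "
    r"Virtual Used:\s*(?P<virt>[\d.]+) GB, "
    r"Swap Used:\s*(?P<swap>[\d.]+) GB"
)


def parse_memory_lines(lines):
    """Return one dict of floats (in GB) per log line that reports memory."""
    records = []
    for line in lines:
        match = MEM_RE.search(line)
        if match:
            records.append({k: float(v) for k, v in match.groupdict().items()})
    return records


# Two memory lines quoted in this thread (they come from different runs).
sample = [
    "[2023-09-27 23:08:48][03]          Memory at batch 1.  Total Used: 135.854 GB, "
    "Virtual Available: 118.634 GB, Virtual Used: 132.759 GB, Swap Used: 003.095 GB",
    "         After all GPU Batches:Total Used: 157.269 GB, "
    "Virtual Available: 096.512 GB, Virtual Used: 154.882 GB, Swap Used: 002.387 GB",
]
for record in parse_memory_lines(sample):
    print(record)
```

Feeding it the lines of a full log.txt gives one record per memory report, which makes it easy to see whether "Total Used" returns to a baseline between batches.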

oterocoronel commented 1 year ago

Thanks! I pulled the new version of s3d. The concatenation step now takes 0 seconds, so that seems to be fixed. The saving step still takes ~120 s per file, and the overall registration of the same files that you did in 35 min took me ~2.5 hours. There are no other significant I/O processes. Ubuntu version: 22.04.2 LTS

Concatenating movie
         Concat in 0.00 sec
         After all GPU Batches:Total Used: 157.269 GB, Virtual Available: 096.512 GB, Virtual Used: 154.882 GB, Swap Used: 002.387 GB
      Saving fused, registered file of shape (30, 100, 861, 855) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo-Coconut/registered_fused_data/fused_reg_data0000.npy
         Saved in 126.47 sec
      Saving fused, registered file of shape (30, 100, 861, 855) to /home/freiwald/Data/analysis_2pRAM/Coconut/20230107d/Max30_500umdeep_1p75by1p75mm_3umppix_9p86Hz_250mW/s3d-Full-Demo-Coconut/registered_fused_data/fused_reg_data0001.npy
         Saved in 126.45 sec

Here is the full log:

log.txt

System specs from sudo lshw -short :

H/W path           Device          Class          Description
=============================================================
                                   system         MS-7B94 (Default string)
/0                                 bus            X299 PRO (MS-7B94)
/0/0                               memory         64KiB BIOS
/0/39                              memory         256GiB System Memory
/0/39/0                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/1                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/2                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/3                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/4                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/5                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/6                            memory         32GiB DIMM DDR4 Synchronous 26
/0/39/7                            memory         32GiB DIMM DDR4 Synchronous 26
/0/4b                              memory         640KiB L1 cache
/0/4c                              memory         10MiB L2 cache
/0/4d                              memory         19MiB L3 cache
/0/4e                              processor      Intel(R) Core(TM) i9-10900X CP
/0/100                             bridge         Sky Lake-E DMI3 Registers
/0/100/4                           generic        Sky Lake-E CBDMA Registers
/0/100/4.1                         generic        Sky Lake-E CBDMA Registers
/0/100/4.2                         generic        Sky Lake-E CBDMA Registers
/0/100/4.3                         generic        Sky Lake-E CBDMA Registers
/0/100/4.4                         generic        Sky Lake-E CBDMA Registers
/0/100/4.5                         generic        Sky Lake-E CBDMA Registers
/0/100/4.6                         generic        Sky Lake-E CBDMA Registers
/0/100/4.7                         generic        Sky Lake-E CBDMA Registers
/0/100/5                           generic        Sky Lake-E MM/Vt-d Configurati
/0/100/5.2                         generic        Sky Lake-E RAS
/0/100/5.4                         generic        Sky Lake-E IOAPIC
/0/100/8                           generic        Sky Lake-E Ubox Registers
/0/100/8.1                         generic        Sky Lake-E Ubox Registers
/0/100/8.2                         generic        Sky Lake-E Ubox Registers
/0/100/14                          bus            200 Series/Z370 Chipset Family
/0/100/14/0        usb1            bus            xHCI Host Controller
/0/100/14/0/2                      generic        FT232R USB UART
/0/100/14/0/3                      generic        ADI Evaluation System
/0/100/14/0/9                      generic        CP2102 USB to UART Bridge Cont
/0/100/14/0/a                      generic        PM100D
/0/100/14/0/c                      generic        CP2102 USB to UART Bridge Cont
/0/100/14/0/d                      bus            USB2.0 Hub
/0/100/14/0/e                      input          MYSTIC LIGHT
/0/100/14/1        usb2            bus            xHCI Host Controller
/0/100/14/1/1                      generic        Blackfly BFLY-U3-23S6C
/0/100/14.2                        generic        200 Series PCH Thermal Subsyst
/0/100/16                          communication  200 Series PCH CSME HECI #1
/0/100/17                          storage        200 Series PCH SATA controller
/0/100/1b                          bridge         200 Series PCH PCI Express Roo
/0/100/1b.2                        bridge         200 Series PCH PCI Express Roo
/0/100/1b.2/0                      bus            ASM2142 USB 3.1 Host Controlle
/0/100/1b.2/0/0    usb3            bus            xHCI Host Controller
/0/100/1b.2/0/1    usb4            bus            xHCI Host Controller
/0/100/1b.4                        bridge         200 Series PCH PCI Express Roo
/0/100/1b.4/0      /dev/nvme0      storage        Samsung SSD 970 EVO 1TB
/0/100/1b.4/0/0    hwmon0          disk           NVMe disk
/0/100/1b.4/0/2    /dev/ng0n1      disk           NVMe disk
/0/100/1b.4/0/1    /dev/nvme0n1    disk           1TB NVMe disk
/0/100/1b.4/0/1/1  /dev/nvme0n1p1  volume         238GiB Linux swap volume
/0/100/1b.4/0/1/2  /dev/nvme0n1p2  volume         15GiB EXT4 volume
/0/100/1b.4/0/1/3  /dev/nvme0n1p3  volume         677GiB EXT4 volume
/0/100/1c                          bridge         200 Series PCH PCI Express Roo
/0/100/1c.2                        bridge         200 Series PCH PCI Express Roo
/0/100/1c.2/0      enp5s0          network        RTL8125 2.5GbE Controller
/0/100/1c.4                        bridge         200 Series PCH PCI Express Roo
/0/100/1c.4/0                      bus            ASM3242 USB 3.2 Host Controlle
/0/100/1c.4/0/0    usb5            bus            xHCI Host Controller
/0/100/1c.4/0/1    usb6            bus            xHCI Host Controller
/0/100/1d                          bridge         200 Series PCH PCI Express Roo
/0/100/1d/0        /dev/nvme1      storage        Samsung SSD 970 EVO 500GB
/0/100/1d/0/0      hwmon1          disk           NVMe disk
/0/100/1d/0/2      /dev/ng1n1      disk           NVMe disk
/0/100/1d/0/1      /dev/nvme1n1    disk           500GB NVMe disk
/0/100/1d/0/1/1    /dev/nvme1n1p1  volume         449MiB Windows NTFS volume
/0/100/1d/0/1/2    /dev/nvme1n1p2  volume         98MiB Windows FAT volume
/0/100/1d/0/1/3    /dev/nvme1n1p3  volume         15MiB reserved partition
/0/100/1d/0/1/4    /dev/nvme1n1p4  volume         464GiB Windows NTFS volume
/0/100/1d/0/1/5    /dev/nvme1n1p5  volume         520MiB Windows NTFS volume
/0/100/1f                          bridge         X299 Chipset LPC/eSPI Controll
/0/100/1f/0                        system         PnP device PNP0c02
/0/100/1f/1                        system         PnP device PNP0c02
/0/100/1f/2                        system         PnP device PNP0c02
/0/100/1f/3                        system         PnP device PNP0c02
/0/100/1f/4                        system         PnP device PNP0c02
/0/100/1f.2                        memory         Memory controller
/0/100/1f.3        card0           multimedia     200 Series PCH HD Audio
/0/100/1f.3/0      input10         input          HDA Intel PCH Front Mic
/0/100/1f.3/1      input11         input          HDA Intel PCH Rear Mic
/0/100/1f.3/2      input12         input          HDA Intel PCH Line
/0/100/1f.3/3      input13         input          HDA Intel PCH Line Out Front
/0/100/1f.3/4      input14         input          HDA Intel PCH Line Out Surroun
/0/100/1f.3/5      input15         input          HDA Intel PCH Line Out CLFE
/0/100/1f.3/6      input16         input          HDA Intel PCH Front Headphone
/0/100/1f.4                        bus            200 Series/Z370 Chipset Family
/0/100/1f.6        eno1            network        Ethernet Connection (2) I219-V
/0/101                             bridge         Sky Lake-E PCI Express Root Po
/0/101/0                           generic        Chenming Mold Ind. Corp.
/0/1                               generic        Sky Lake-E VT-d
/0/3                               generic        Sky Lake-E RAS Configuration R
/0/4                               generic        Sky Lake-E IOxAPIC Configurati
/0/6                               generic        Sky Lake-E CHA Registers
/0/8.1                             generic        Sky Lake-E CHA Registers
/0/8.2                             generic        Sky Lake-E CHA Registers
/0/8.3                             generic        Sky Lake-E CHA Registers
/0/8.4                             generic        Sky Lake-E CHA Registers
/0/8.5                             generic        Sky Lake-E CHA Registers
/0/8.6                             generic        Sky Lake-E CHA Registers
/0/8.7                             generic        Sky Lake-E CHA Registers
/0/7                               generic        Sky Lake-E CHA Registers
/0/9.1                             generic        Sky Lake-E CHA Registers
/0/9.2                             generic        Sky Lake-E CHA Registers
/0/9.3                             generic        Sky Lake-E CHA Registers
/0/9.4                             generic        Sky Lake-E CHA Registers
/0/9.5                             generic        Sky Lake-E CHA Registers
/0/9.6                             generic        Sky Lake-E CHA Registers
/0/9.7                             generic        Sky Lake-E CHA Registers
/0/8                               generic        Sky Lake-E CHA Registers
/0/9                               generic        Sky Lake-E CHA Registers
/0/e                               generic        Sky Lake-E CHA Registers
/0/e.1                             generic        Sky Lake-E CHA Registers
/0/e.2                             generic        Sky Lake-E CHA Registers
/0/e.3                             generic        Sky Lake-E CHA Registers
/0/e.4                             generic        Sky Lake-E CHA Registers
/0/e.5                             generic        Sky Lake-E CHA Registers
/0/e.6                             generic        Sky Lake-E CHA Registers
/0/e.7                             generic        Sky Lake-E CHA Registers
/0/f                               generic        Sky Lake-E CHA Registers
/0/f.1                             generic        Sky Lake-E CHA Registers
/0/f.2                             generic        Sky Lake-E CHA Registers
/0/f.3                             generic        Sky Lake-E CHA Registers
/0/f.4                             generic        Sky Lake-E CHA Registers
/0/f.5                             generic        Sky Lake-E CHA Registers
/0/f.6                             generic        Sky Lake-E CHA Registers
/0/f.7                             generic        Sky Lake-E CHA Registers
/0/10                              generic        Sky Lake-E CHA Registers
/0/10.1                            generic        Sky Lake-E CHA Registers
/0/1d                              generic        Sky Lake-E CHA Registers
/0/1d.1                            generic        Sky Lake-E CHA Registers
/0/1d.2                            generic        Sky Lake-E CHA Registers
/0/1d.3                            generic        Sky Lake-E CHA Registers
/0/1e                              generic        Sky Lake-E PCU Registers
/0/1e.1                            generic        Sky Lake-E PCU Registers
/0/1e.2                            generic        Sky Lake-E PCU Registers
/0/1e.3                            generic        Sky Lake-E PCU Registers
/0/1e.4                            generic        Sky Lake-E PCU Registers
/0/1e.5                            generic        Sky Lake-E PCU Registers
/0/1e.6                            generic        Sky Lake-E PCU Registers
/0/102                             bridge         Sky Lake-E PCI Express Root Po
/0/102/0                           display        TU102 [GeForce RTX 2080 Ti Rev
/0/102/0.1         card1           multimedia     TU102 High Definition Audio Co
/0/102/0.1/0       input3          input          HDA NVidia HDMI/DP,pcm=3
/0/102/0.1/1       input4          input          HDA NVidia HDMI/DP,pcm=7
/0/102/0.1/2       input5          input          HDA NVidia HDMI/DP,pcm=8
/0/102/0.1/3       input6          input          HDA NVidia HDMI/DP,pcm=9
/0/102/0.1/4       input7          input          HDA NVidia HDMI/DP,pcm=10
/0/102/0.1/5       input8          input          HDA NVidia HDMI/DP,pcm=11
/0/102/0.1/6       input9          input          HDA NVidia HDMI/DP,pcm=12
/0/102/0.2                         bus            TU102 USB 3.1 Host Controller
/0/102/0.2/0       usb7            bus            xHCI Host Controller
/0/102/0.2/1       usb8            bus            xHCI Host Controller
/0/102/0.3                         bus            TU102 USB Type-C UCSI Controll
/0/a                               generic        Sky Lake-E VT-d
/0/11                              generic        Sky Lake-E RAS Configuration R
/0/13                              generic        Sky Lake-E IOxAPIC Configurati
/0/14                              generic        Sky Lake-E Integrated Memory C
/0/18                              generic        Sky Lake-E Integrated Memory C
/0/19                              generic        Sky Lake-E Integrated Memory C
/0/1a                              generic        Sky Lake-E Integrated Memory C
/0/a.2                             generic        Sky Lake-E Integrated Memory C
/0/a.3                             generic        Sky Lake-E Integrated Memory C
/0/a.4                             generic        Sky Lake-E Integrated Memory C
/0/a.5                             generic        Sky Lake-E LM Channel 1
/0/a.6                             generic        Sky Lake-E LMS Channel 1
/0/a.7                             generic        Sky Lake-E LMDP Channel 1
/0/b                               generic        Sky Lake-E DECS Channel 2
/0/b.1                             generic        Sky Lake-E LM Channel 2
/0/b.2                             generic        Sky Lake-E LMS Channel 2
/0/b.3                             generic        Sky Lake-E LMDP Channel 2
/0/c                               generic        Sky Lake-E Integrated Memory C
/0/c.1                             generic        Sky Lake-E Integrated Memory C
/0/c.2                             generic        Sky Lake-E Integrated Memory C
/0/c.3                             generic        Sky Lake-E Integrated Memory C
/0/c.4                             generic        Sky Lake-E Integrated Memory C
/0/c.5                             generic        Sky Lake-E LM Channel 1
/0/c.6                             generic        Sky Lake-E LMS Channel 1
/0/c.7                             generic        Sky Lake-E LMDP Channel 1
/0/d                               generic        Sky Lake-E DECS Channel 2
/0/d.1                             generic        Sky Lake-E LM Channel 2
/0/d.2                             generic        Sky Lake-E LMS Channel 2
/0/d.3                             generic        Sky Lake-E LMDP Channel 2
/0/103                             bridge         Sky Lake-E PCI Express Root Po
/0/103/0           scsi0           storage        MegaRAID SAS-3 3108 [Invader]
/0/103/0/2.0.0     /dev/sda        disk           7999GB MR9361-8i
/0/103/0/2.0.0/1   /dev/sda1       volume         15MiB reserved partition
/0/103/0/2.0.0/2   /dev/sda2       volume         7449GiB Windows NTFS volume
/0/2                               bridge         Sky Lake-E PCI Express Root Po
/0/2/0                             generic        PXIe/PCIe Device
/0/5                               generic        Sky Lake-E VT-d
/0/5.2                             generic        Sky Lake-E RAS Configuration R
/0/5.4                             generic        Sky Lake-E IOxAPIC Configurati
/0/12                              generic        Sky Lake-E M3KTI Registers
/0/12.1                            generic        Sky Lake-E M3KTI Registers
/0/12.2                            generic        Sky Lake-E M3KTI Registers
/0/15                              generic        Sky Lake-E M2PCI Registers
/0/15.1                            generic        Sky Lake-E DDRIO Registers
/0/16                              generic        Sky Lake-E M2PCI Registers
/0/16.1                            generic        Sky Lake-E DDRIO Registers
/0/16.4                            generic        Sky Lake-E M2PCI Registers
/0/16.5                            generic        Sky Lake-E DDRIO Registers
/0/17                              generic        Sky Lake-E M2PCI Registers
/0/17.1                            generic        Sky Lake-E DDRIO Registers
/1                                 power          To Be Filled By O.E.M.
/2                 /dev/fb0        display        EFI VGA
/3                 input0          input          Sleep Button
/4                 input1          input          Power Button
/5                 input17         input          MSI MYSTIC LIGHT

This is saving to an 8 TB RAID disk. I could try saving to one of the other disks if you think some setting of this disk might be the issue. The RAID disk might not be the fastest, but it does not seem to be particularly slow:

sudo dd if=/dev/sda of=test.file bs=10G count=10 oflag=dsync
dd: warning: partial read (2147479552 bytes); suggest iflag=fullblock
0+10 records in
0+10 records out
21474795520 bytes (21 GB, 20 GiB) copied, 30.5937 s, 702 MB/s

Here are the RAM plots. It seems to be better now (not massively using swap), but I would expect it to use less than 200 GB if the files are ~20 GB, right?

Untitled

alihaydaroglu commented 1 year ago

OK, this seems to be working roughly as intended for now. Yes, 200 GB is too much, I will take a look at it to try to make it smaller. It's so big because the registered movie is larger than the original movie (because of the padding on both sides), and it is saved as a float32 (not int16) after registration, so the movie is probably around 50-60GB, and there are a few copies of it I guess. I should be able to reduce it. I'll see if converting back to int16 is OK.
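
The dtype part of that estimate is easy to put numbers on. Below is a rough sketch using the per-file shape (30, 100, 861, 855) from the log above; `movie_gb` is a hypothetical helper, and reading the axes as (planes, frames, y, x) is an assumption.

```python
import math

ITEMSIZE = {"int16": 2, "float32": 4}  # bytes per element


def movie_gb(shape, dtype):
    """Size in GB (1e9 bytes) of a dense array with this shape and dtype."""
    return math.prod(shape) * ITEMSIZE[dtype] / 1e9


# Per-file shape reported in the log, after padding (axis order assumed).
file_shape = (30, 100, 861, 855)
for dtype in ("int16", "float32"):
    print(f"{dtype}: {movie_gb(file_shape, dtype):.2f} GB per 100 frames")
```

The float32 copy is exactly twice the int16 size, and every extra in-memory copy of a batch adds the same amount again, scaled by the number of frames per batch.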

About the I/O speed: that seems to be your limiting factor now. I have a pair of M.2 SSDs arranged in RAID 0, so it takes about 20 seconds to save each file (I think this explains the timing difference between our runs). If you want to upgrade your workstation, it's not a bad idea to get these as temporary storage, since there is a lot of I/O during processing.

oterocoronel commented 1 year ago

I still think there's something weird going on when I am saving them. The disk speed was ~700 MB/s in my test, and each 5 GB file (100 frames) still takes >70 seconds to save, so I believe it is saving almost 10x slower than it should.

Also, it seems that some calculations are using float64. For example, this appears in the log when sending a batch to the GPU: Mov of shape 30, 10, 663, 628; 0.93 GB . However if you run: np.zeros((30,10,663,628), dtype = np.float32).nbytes / 1024 / 1024 / 1024, you get that this volume should be 0.465 GB, which is half of the value reported. So I would think that the volume is in float64. This might even be happening during initialization, since this is printed then:

  Aligning planes
float64
20

Is the float64 necessary, or maybe just a consequence of how the volume was initialized?
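
A quick way to see which element sizes are consistent with the reported 0.93 GB, without allocating anything, is to compare a few dtypes side by side. This is a minimal sketch; `size_gib` is a hypothetical helper. Note that any 8-byte dtype, not only float64, gives the same figure.

```python
import numpy as np


def size_gib(shape, dtype):
    """Size in GiB of a dense array, computed without allocating it."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / 1024**3


shape = (30, 10, 663, 628)  # shape printed in the log for one GPU batch
for dtype in ("int16", "float32", "float64", "complex64"):
    print(f"{dtype:>9}: {size_gib(shape, dtype):.3f} GiB")
```

float32 comes out at about 0.465 GiB, while float64 and complex64 both come out at about 0.93 GiB, so the size alone cannot distinguish between those two.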

oterocoronel commented 1 year ago

Btw, the final frame size in my previous run was 861x855 pixels (due to a wrong lateral-offset estimation from the shallower planes) and it was taking ~120 secs to save 100 frames. Now that I used more frames to initialize and I got better lateral-offset values, frames are 663x628 and it takes ~70 secs to save 100 frames
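
Putting the two runs side by side as effective write throughput makes the comparison with the dd figure explicit. This is a back-of-the-envelope sketch that assumes the files are float32 (4 bytes per element) and uses the approximate save times quoted in this thread; `effective_mb_per_s` is a hypothetical helper.

```python
import math


def effective_mb_per_s(shape, bytes_per_element, seconds):
    """Array payload in MB (1e6 bytes) divided by the time taken to save it."""
    return math.prod(shape) * bytes_per_element / seconds / 1e6


DD_MB_PER_S = 702.0  # figure from the dd test earlier in the thread

# (file shape, approximate seconds per save) for the two runs described above.
runs = {
    "861x855": ((30, 100, 861, 855), 120.0),
    "663x628": ((30, 100, 663, 628), 70.0),
}
for name, (shape, seconds) in runs.items():
    rate = effective_mb_per_s(shape, 4, seconds)
    print(f"{name}: {rate:.0f} MB/s, {DD_MB_PER_S / rate:.1f}x below the dd figure")
```

Both runs land at roughly 70 MB/s, about a tenth of the dd result, which suggests the save time scales with file size at a fixed, low rate rather than being a per-file overhead.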

alihaydaroglu commented 1 year ago

The print statements are confusing: that is the size of the movie on the GPU while it is still in complex64 format. I am pretty sure it is reduced back to float32 when sent back to the CPU, though I should double check. But yes, I agree with your other issue about the lateral offset estimation: when it fails for shallow planes it leads to way too much padding, which makes everything slower. I'll add an option that lets the user update those values.

Your issue with write speed might be something internal to numpy.save; the time reported there is for one line of code that calls np.save on an array in memory. I'm not sure why it doesn't max out the disk write speed… maybe I should look into a different storage format.
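
One thing that may be worth ruling out is the memory layout of the array being saved: an array that is not C-contiguous (for example a transposed or sliced view) may be written less efficiently than a contiguous one. Whether that applies to mov_save is an assumption, since the thread does not show how it is built. The sketch below times np.save on an array as-is and on a contiguous copy; `timed_save` and `compare_layouts` are hypothetical helpers, and the small test array is only a stand-in.

```python
import os
import tempfile
import time

import numpy as np


def timed_save(path, arr):
    """Save arr with np.save, force it to disk, and return (seconds, MB/s)."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        np.save(f, arr)
        f.flush()
        os.fsync(f.fileno())  # count the time to actually reach the disk
    elapsed = time.perf_counter() - start
    return elapsed, arr.nbytes / 1e6 / max(elapsed, 1e-9)


def compare_layouts(arr, out_dir):
    """Time saving arr as-is and as a C-contiguous copy of the same data."""
    candidates = {"as_is": arr, "c_contiguous": np.ascontiguousarray(arr)}
    results = {}
    for name, candidate in candidates.items():
        path = os.path.join(out_dir, name + ".npy")
        results[name] = timed_save(path, candidate)
    return results


# Small stand-in for mov_save: swapping axes yields a non-contiguous view.
mov = np.zeros((16, 30, 64, 64), dtype=np.float32).swapaxes(0, 1)
print("C-contiguous:", mov.flags["C_CONTIGUOUS"])
with tempfile.TemporaryDirectory() as out_dir:
    for name, (sec, mb_s) in compare_layouts(mov, out_dir).items():
        print(f"{name}: {sec:.3f} s, {mb_s:.0f} MB/s")
```

To test the real case, point `out_dir` at the RAID volume and pass the actual array. If both variants are equally slow, the layout is not the cause and the disk or filesystem path is the more likely suspect.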