dcjones / proseg

Probabilistic cell segmentation for in situ spatial transcriptomics

Possible performance issue on CosMx data #20

Open yihming opened 3 months ago

yihming commented 3 months ago

Hello,

First, thank you for developing such a great tool!

When trying your package on our CosMx data, I found that its runtime and memory usage seem to underperform relative to your benchmark (Figure 11 in your paper).

Specifically, our CosMx slide contains ~ 6 * 10^7 transcripts (I simply counted the rows of the transcripts.csv.gz returned by the stitch_fov.jl script; see the one-liner below). At this scale, I ran proseg with the following command (using 30 threads):

proseg --cosmx-micron --nthreads 30 transcripts.csv.gz

and it took 11 hours 38 minutes to finish, with a peak memory usage of around 20 GB.
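For reference, the transcript count above came from a quick shell one-liner (it assumes a single header row and one transcript per row; adjust if your file differs):

# count transcript rows, skipping the CSV header
zcat transcripts.csv.gz | tail -n +2 | wc -l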

However, based on your benchmark shown in Figure 11 of your paper, proseg takes only ~17 minutes to finish for ~10^7 transcripts, and the memory usage is below 9.7 GB.

Given this inconsistency, I wonder whether I missed any step that would help the run achieve better performance.

The machine I used has 32 vCPUs and 249 GB memory, and the OS/software info is below:

Thanks!

Sincerely, Yiming

roanvanscheppingen commented 3 months ago

To add to this.

For my 22 million transcript run (2.2 * 10^7), I used 14-18 threads; it ran for approximately 4 hours and peaked at roughly 60 GB of memory. The command was:

proseg full_transcripts.csv.gz --cosmx --coordinate-scale .12 --output-maxpost-counts counts.csv.gz --nthreads 14 --voxel-layers 15

I'm running freshly installed Rust, Proseg, and Julia in a conda environment, so the versions are up to date.

yihming commented 3 months ago

@roanvanscheppingen Yes, you are right. I reran my job and noticed that its peak memory usage could reach 80 GB; in my previous run I didn't rigorously check memory usage.

dcjones commented 2 months ago

Hi, thanks for using proseg! Performance is an issue that I'm actively working on, so hopefully it will improve, but it is currently memory-hungry for large datasets.

The CosMx performance plot I show in the paper goes out to about 10^6.8 ≈ 6.3 * 10^6 transcripts. Your data is about an order of magnitude larger, so you should expect roughly 10x the runtime and memory. That said, over 11 hours does seem a bit high: projecting from my performance plot, I would expect your dataset to take about 3.5 hours.
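(Spelling out the back-of-envelope arithmetic: 6 * 10^7 / 6.3 * 10^6 ≈ 9.5, so call it 10x the runtime and memory at the right edge of the plot, assuming roughly linear scaling in transcript count.)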

I'm not quite sure why it's longer than that, but every dataset is different. One thing I would point out: memory usage is proportional to the number of transcripts, but also to the number of genes and the number of voxel layers. So reducing the number of voxel layers or filtering negative controls out of the input file will improve runtime somewhat, and memory usage by quite a bit. I don't think this data has the z-axis resolution to justify a large number of voxel layers anyway, though that's debatable.
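As a rough sketch of that filtering (assumptions on my part: the gene column in transcripts.csv.gz is named target, fields are unquoted, and negative controls have names starting with Negative, NegPrb, or SystemControl — check your own file, since panels and exporter versions differ):

# drop negative-control probes; keep the header, locate the "target" column by name
zcat transcripts.csv.gz | awk -F',' '
  NR==1 { for (i=1;i<=NF;i++) if ($i=="target") col=i; print; next }
  $col !~ /^(Negative|NegPrb|SystemControl)/
' | gzip > transcripts.filtered.csv.gz

Then you can run proseg on the filtered file with fewer voxel layers, e.g. (the --voxel-layers value here is just for illustration; pick what makes sense for your data):

proseg --cosmx-micron --nthreads 30 --voxel-layers 5 transcripts.filtered.csv.gz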

roanvanscheppingen commented 2 months ago

Hi,

Thanks for the continued updates and the new version of the preprint. I just wanted to put this here, as it seems the right place. We reran Nanostring's image-based segmentation on the same dataset; this increased our number of cells but kept the number of transcripts the same.

Using that transcript table with proseg, I noticed a steady increase in memory usage (now up to 80 GB) and an increased runtime (16 threads, over 5 hours). Given this, I would say the initial cell count also has an influence, since everything else stayed the same.