SystemsGenetics / KINC

Knowledge Independent Network Construction

Small differences between CPU/GPU in similarity analytic #137

Closed · spficklin closed this 4 years ago

spficklin commented 4 years ago

Using the similarity analytic on a standalone machine with one GPU and several CPUs, KINC will load-balance the chunks across the CPUs and the GPU when using MPI. Without MPI, all of the work goes to the GPU. Using chunkrun, all of the work also goes to the GPU. When running the test data in the example folder, the resulting network has about 36K edges. However, when MPI is used on the machine described above, about 100 edges differ between the MPI run and the serial GPU run.

After discussion with the team, it seems this may be caused by slight differences between the serial and CUDA implementations of the GMM method.

This is perhaps not a major problem, because most of the edges are identical and those that differ are a very small fraction (perhaps already within the expected error rate), but it should be looked at.

spficklin commented 4 years ago

I should also note that there are about 1K edges that differ only by slight floating-point differences... not a big deal.
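
For anyone who wants to quantify this, here is a minimal sketch of the comparison, assuming both runs export the network as a plain-text edge list with the same edges in the same order and the similarity score in the last column (the column layout and the 1e-5 tolerance are illustrative assumptions, not KINC's actual output format):

```cpp
// diff_edges.cpp -- count edges whose scores differ between two runs.
// Illustrative sketch, not part of KINC. Assumes both files list the
// same edges in the same order, one per line, with the similarity
// score in the last whitespace-separated column.
#include <cmath>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Pull the last whitespace-separated field of a line as a double.
static double last_field(const std::string& line) {
    std::istringstream ss(line);
    std::string tok, last;
    while (ss >> tok) last = tok;
    return std::strtod(last.c_str(), nullptr);
}

int main(int argc, char** argv) {
    if (argc != 3) {
        std::cerr << "usage: diff_edges <net1> <net2>\n";
        return 1;
    }
    std::ifstream a(argv[1]), b(argv[2]);
    std::string la, lb;
    long exact = 0, noise = 0, real = 0;
    const double tol = 1e-5;  // threshold for "floating-point noise"

    while (std::getline(a, la) && std::getline(b, lb)) {
        double d = std::fabs(last_field(la) - last_field(lb));
        if (d == 0.0)      ++exact;
        else if (d <= tol) ++noise;  // the ~1K slightly-off edges
        else               ++real;   // the ~100 genuinely different ones
    }
    std::cout << "identical: " << exact
              << "\nwithin " << tol << ": " << noise
              << "\nlarger differences: " << real << "\n";
    return 0;
}
```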

bentsherman commented 4 years ago

After a quick look into this, the error is coming from a difference between the serial and CUDA implementations: if you do a single-CPU run and a single-GPU run, you will find that the results are different. Since the chunk run was consistent, I think it's safe to say that MPI is not a factor here.

It should also be noted that I used GMM and Spearman together. In cases where the sample strings were identical but the similarity scores were slightly different, the cause would be the Spearman kernel, so I will repeat with just Spearman to confirm, and again with just Pearson to see if there are any issues there as well. For cases where the sample strings are different, the cause could lie with GMM or with outlier removal, so we should also test with and without outlier removal.
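
To make that triage concrete, here is a minimal sketch of the logic (the `Edge` struct and its field names are hypothetical, not KINC's actual types):

```cpp
#include <cmath>
#include <string>

// Hypothetical record for one gene-pair cluster from a run; KINC's
// actual output objects differ, this just mirrors the reasoning above.
struct Edge {
    std::string samples;  // sample string, e.g. "111009..."
    float score;          // similarity score for the cluster
};

// Attribute a CPU/GPU mismatch to the kernel most likely responsible.
const char* classify(const Edge& cpu, const Edge& gpu) {
    if (cpu.samples != gpu.samples)
        return "GMM or outlier removal (different clustering)";
    if (std::fabs(cpu.score - gpu.score) > 1e-5f)
        return "correlation kernel (same clusters, different score)";
    return "floating-point noise (negligible)";
}
```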

@spficklin This is what I'm going to do, you're welcome to try yourself if you're curious:

bentsherman commented 4 years ago

I ran Pearson and Spearman by themselves and the CPU/GPU results are still slightly different; that is, the correlations sometimes differ by <= 1e-5. That tells me that there are some low-level differences in the floating-point arithmetic, and I'm not sure that I can do anything about it. If there are differences even with Pearson, I imagine that the accumulated floating-point differences in the GMM kernel (lots of multiplies and adds) could lead to different cluster sizes and sample strings between CPU and GPU. The same goes for the outlier kernel and which samples are marked as outliers.
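
As a standalone illustration of why identical inputs can produce correlations that differ in the last digits (this is not KINC code): the same sum accumulated left-to-right, as a CPU loop does, versus in a pairwise tree reduction, as a GPU work-group typically does, will usually agree to only about six or seven significant digits in single precision.

```cpp
#include <cstdio>
#include <random>
#include <vector>

// Serial left-to-right accumulation, as a CPU loop would do.
float serial_sum(const std::vector<float>& x) {
    float s = 0.0f;
    for (float v : x) s += v;
    return s;
}

// Pairwise (tree) reduction, the order a GPU reduction typically uses.
float tree_sum(std::vector<float> x) {
    for (size_t stride = 1; stride < x.size(); stride *= 2)
        for (size_t i = 0; i + stride < x.size(); i += 2 * stride)
            x[i] += x[i + stride];
    return x[0];
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> x(4096);
    for (float& v : x) v = dist(rng);

    float a = serial_sum(x);
    float b = tree_sum(x);
    // Typically differs in the last bits; same data, different order.
    std::printf("serial: %.9g\ntree:   %.9g\ndiff:   %.3g\n", a, b, a - b);
    return 0;
}
```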

That being said, my guess is that the gene pairs that differ are probably borderline cases: not in the sense of having low correlation, but in the sense that they are probably at the boundary between having one cluster and two. @spficklin Perhaps you could make some scatter plots of the differing pairs that you found to see what they look like?
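
To illustrate the "boundary" point: if the model-selection criterion values for one and two clusters land within the accumulated floating-point error, CPU and GPU can each legitimately pick a different cluster count. A toy sketch with made-up numbers (KINC's actual criterion and values will differ):

```cpp
#include <cstdio>

int main() {
    // Made-up criterion values (lower is better), e.g. BIC-like scores
    // for fitting one cluster vs. two to the same gene pair.
    double crit_k1_cpu = 1052.34031;  // one-cluster fit, CPU arithmetic
    double crit_k1_gpu = 1052.34047;  // same fit, GPU arithmetic
    double crit_k2     = 1052.34039;  // two-cluster fit

    // Each device's choice is "correct" for its own arithmetic, but the
    // pair is so borderline that they pick different cluster counts,
    // which then yields different sample strings downstream.
    int k_cpu = (crit_k1_cpu < crit_k2) ? 1 : 2;  // picks 1
    int k_gpu = (crit_k1_gpu < crit_k2) ? 1 : 2;  // picks 2
    std::printf("CPU chooses k=%d, GPU chooses k=%d\n", k_cpu, k_gpu);
    return 0;
}
```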

4ctrl-alt-del commented 4 years ago

That seems right to me, Ben. Given how GPU architecture works in general and how sensitive GMM is to small differences in floating-point arithmetic, this might boil down to a hardware limitation: floating-point arithmetic is not guaranteed to be bitwise reproducible across different processors.
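
One concrete source of this, as a minimal sketch: GPUs (and some CPU compilers) fuse `a*b + c` into a single fused multiply-add, which rounds once instead of twice, so the same expression can give different bits on different hardware.

```cpp
// fma_demo.cpp
// compile with: g++ -std=c++17 -ffp-contract=off fma_demo.cpp
// (contraction is disabled so the compiler doesn't fuse the "separate"
// version on its own; GPUs fuse such expressions by default)
#include <cmath>
#include <cstdio>

int main() {
    float a = 1.0f + 0x1.0p-12f;     // 1 + 2^-12
    float b = a;
    float c = -(1.0f + 0x1.0p-11f);  // cancels the *rounded* product a*b

    float prod     = a * b;               // rounded once here...
    float separate = prod + c;            // ...and again here -> 0
    float fused    = std::fmaf(a, b, c);  // single rounding -> 2^-24

    std::printf("separate: %g\nfused:    %g\n", separate, fused);
    return 0;
}
```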

spficklin commented 4 years ago

This all sounds reasonable. Closing this out.