bowang-lab / BLEEP

Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning
Apache License 2.0

The reproducibility of this paper #12

Closed: Hanminghao closed this issue 1 month ago

Hanminghao commented 1 month ago

Hello, and first of all thank you for your outstanding work. I am getting poor test-set results while experimenting entirely with your code and data. Specifically, I am training on a single A6000 48GB GPU with a batch size of 512 and a learning rate of 1e-3; apart from not using distributed training, my setup is the same as yours. The best model selected on the validation set appears at epoch 38, with a validation loss of 3.38. However, the model only reaches around 0.02 to 0.03 on the mean correlation of highly expressed genes (HEG) and mean correlation of highly variable genes (HVG) metrics. I look forward to your response. Thank you.
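For context, the HEG/HVG metrics referenced above are typically computed as the mean gene-wise Pearson correlation between predicted and measured expression over the test spots. A minimal sketch under that assumption (array and function names are illustrative, not BLEEP's actual evaluation code):

```python
import numpy as np

def mean_gene_correlation(pred: np.ndarray, truth: np.ndarray, gene_idx) -> float:
    """Mean Pearson correlation across a gene subset (e.g. HEG or HVG).

    pred, truth: (n_spots, n_genes) predicted and measured expression matrices.
    gene_idx: indices of the highly expressed / highly variable genes.
    """
    corrs = []
    for g in gene_idx:
        x, y = pred[:, g], truth[:, g]
        if x.std() == 0 or y.std() == 0:  # skip genes with zero variance
            continue
        corrs.append(np.corrcoef(x, y)[0, 1])
    return float(np.mean(corrs))
```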

Hanminghao commented 1 month ago

After switching to distributed training, I obtained results similar to those in the manuscript.

NBitBuilder commented 3 weeks ago

Hi, @Hanminghao ,

I encountered the same issue as you: low correlation values when training on an A6000. Have you been able to identify the cause? Any insights would be appreciated.

NBitBuilder commented 3 weeks ago

Why does distributed training have such a significant impact? I also noticed that the contrastive loss differs from methods like MoCo-v3, as it doesn't gather samples from other GPUs for loss computation. Could you also share the batch size you used for each GPU?
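For context, MoCo-v3-style implementations all-gather embeddings from every GPU before computing the InfoNCE loss, so the pool of negatives scales with the number of GPUs. A minimal PyTorch sketch of that pattern (hypothetical helper names; not the BLEEP implementation):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_features(feats: torch.Tensor) -> torch.Tensor:
    """All-gather embeddings from every rank so the contrastive loss sees
    negatives from the whole effective batch, not just the local one."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(feats) for _ in range(world_size)]
    dist.all_gather(gathered, feats)
    # all_gather does not propagate gradients through remote tensors;
    # re-insert the local tensor so its gradient path is preserved.
    gathered[dist.get_rank()] = feats
    return torch.cat(gathered, dim=0)

def clip_style_loss(img_emb: torch.Tensor, expr_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over the gathered cross-GPU batch (CLIP-style)."""
    img_all = gather_features(F.normalize(img_emb, dim=-1))
    expr_all = gather_features(F.normalize(expr_emb, dim=-1))
    logits = img_all @ expr_all.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

Without such a gather, each GPU computes its loss over only its local micro-batch, so the effective number of negatives (and hence the difficulty of the contrastive task) depends on the per-GPU batch size rather than the global one.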

Hanminghao commented 3 weeks ago

I'm sorry to hear that, but in my many tests I was able to get good results without distributed training. Specifically, I set the batch size to 128 and obtained the following results when training on a single A6000: HVG 10.27, HEG 18.97. Note that I did not test only on a single slide as in the original paper; I tested on each of the four slides and averaged the results.

NBitBuilder commented 3 weeks ago

Thank you for your response. I'm now getting reasonable results using a batch size of 128 with gradient accumulation over 4 steps. This setup approximates bsz=512 under distributed training by averaging the loss across accumulated micro-batches instead of concatenating the logits before computing the loss; see the sketch after this comment.

I’m quite surprised by how sensitive this type of model is to hyperparameters like batch size.
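A minimal sketch of the accumulation scheme described above, assuming a standard PyTorch training loop (`model`, `loader`, and `optimizer` are illustrative placeholders, not BLEEP's actual training code):

```python
def train_with_accumulation(model, loader, optimizer, accum_steps: int = 4):
    """One epoch with gradient accumulation: 4 micro-batches of 128
    approximate an effective batch size of 512 for each optimizer update."""
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = model(batch)              # contrastive loss over one 128-sample micro-batch
        (loss / accum_steps).backward()  # average, rather than sum, across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    # Caveat: the negatives for a contrastive loss still come only from the
    # 128-sample micro-batch. Accumulation matches true bsz=512 gradients only
    # for losses that decompose per sample, which is why results can still
    # differ from concatenating logits across the full batch.
```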

NBitBuilder commented 3 weeks ago
| Model | Mean Correlation (Cells) | Max Correlation | Mean HEG | Mean HVG | Mean Markers |
|---|---|---|---|---|---|
| BLEEP (bsz=128, accum=4) | 0.8025 | 0.6810 | 0.1630 | 0.1657 | 0.2280 |
| BLEEP (bsz=128, accum=1) | 0.7149 | 0.6282 | 0.1096 | 0.0988 | 0.1158 |