Navidfoumani / ConvTran

This is a PyTorch implementation of ConvTran
MIT License

About reproducibility #6

Closed firrice closed 4 months ago

firrice commented 5 months ago

Hi, the tAPE and eRPE position embedding methods proposed in ConvTran are great and novel, and I have run several experiments with them, but I ran into a problem during reproduction. The details are as follows:
Hardware: i7-12700K, RTX 3060 12 GB
Software: Python 3.8, PyTorch 2.0.1 + CUDA 11.7, other libraries per requirements.txt

I ran the code without modification to reproduce the results on the UEA datasets and compared them with the numbers reported in the paper:
[screenshot: comparison table of reproduced vs. reported results]
(1) As the table above shows, there are gaps on several datasets, marked in red ("OOM" means out of CUDA memory and can be ignored here).
(2) I also noticed that the embedding size is set to 64 in the paper but 16 in the code.
[screenshot: embedding size setting in the code]
(3) As raised in an earlier issue (https://github.com/Navidfoumani/ConvTran/issues/5), using the default settings in the code would require about 190 GB of CUDA memory, far beyond the single A5000 (24 GB) used in the paper.
[screenshot: memory estimate]

So I wonder whether the reproduction gap is because each UEA dataset was in practice trained with its own hyperparameter setting, rather than sharing the default setting across all datasets, or whether there is some other reason. Looking forward to your reply!

Navidfoumani commented 4 months ago

Hi, thanks for letting me know. For reproducibility, there are a few points to note:
1. It's essential to run each dataset separately. I recently noticed that running them consecutively (all at once) in PyTorch affects the results due to initialization issues; I achieved better results by running each dataset individually.
2. The datasets mentioned are generally small, which leads to considerable variance in the results. To address this, you can average the results over 5 runs (see the sketch below).
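A minimal sketch of that workflow, assuming a hypothetical `main.py` entry point with `--problem` and `--seed` flags and a per-run result file; the actual CLI of this repository may differ:

```python
# Minimal sketch: run each UEA dataset in its own process and average the test
# accuracy over several seeds. The "main.py" entry point, the "--problem" and
# "--seed" flags, and the per-run result files are assumptions for illustration,
# not the repository's actual CLI.
import statistics
import subprocess

datasets = ["Heartbeat", "FaceDetection", "SelfRegulationSCP1"]  # example subset
num_runs = 5

for name in datasets:
    accuracies = []
    for seed in range(num_runs):
        # A fresh process per run avoids state leaking between datasets.
        subprocess.run(
            ["python", "main.py", "--problem", name, "--seed", str(seed)],
            check=True,
        )
        # Hypothetical convention: each run writes its test accuracy to a text file.
        with open(f"results/{name}_seed{seed}.txt") as f:
            accuracies.append(float(f.read().strip()))
    print(f"{name}: mean accuracy {statistics.mean(accuracies):.4f} over {num_runs} runs")
```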

Regarding the memory concerns: these arose with only a few datasets. To manage them, I used memory-saving methods such as numpy memmap or reducing the batch size.
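A minimal sketch of the memmap idea, with an illustrative on-disk file name, dataset shape, and batch size (not the repository's actual loading code):

```python
# Minimal sketch: keep the raw series on disk with numpy.memmap and feed the
# model in small batches so only the current batch lives in RAM/GPU memory.
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class MemmapSeries(Dataset):
    def __init__(self, path, shape, labels):
        # mode="r" maps the file read-only; samples are read lazily on indexing.
        self.data = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # np.array(...) copies just this one sample into RAM before handing it to torch.
        x = torch.from_numpy(np.array(self.data[idx]))
        return x, self.labels[idx]

# Example: 1000 samples, 64 channels, 512 time steps, with a small batch size.
labels = torch.zeros(1000, dtype=torch.long)
dataset = MemmapSeries("train_x.dat", shape=(1000, 64, 512), labels=labels)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
```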