Ffffffffire / HINormer


The results are unstable, and it's hard to make them stable #3

Closed zzyzeyuan closed 11 months ago

zzyzeyuan commented 11 months ago

Hi, I found that there is no seed set in your original code, so I added one like this:

import random

import numpy as np
import torch

def set_seed(seed):
    # Seed every RNG the pipeline touches: Python, NumPy, and PyTorch (CPU/GPU)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        # Make cuDNN deterministic (trades some speed for reproducibility)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.enabled = False

# added to the existing argparse setup, then called once after parse_args():
ap.add_argument('--seed', type=int, default=10)
set_seed(args.seed)

Unfortunately, I still got different results on every repeat when I repeated 5 times with this seed. In my opinion, I should get 5 identical results if the seed is set correctly.

So I have no idea how to reproduce the results stably, could you give me some suggestions? Or how did you manage to get stable results without setting a seed?
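
(Side note: a stricter setup I have not tried here, assuming a reasonably recent PyTorch, which makes PyTorch raise an error whenever an op with no deterministic implementation is used:)

import os
import torch

# Must be set before the first cuBLAS call on CUDA >= 10.2
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
# Fail loudly on any nondeterministic op instead of silently using it
torch.use_deterministic_algorithms(True)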

zzyzeyuan commented 11 months ago

For example, when I run the code on the IMDB dataset, I get:

Micro-f1: tensor([0.6688, 0.6763, 0.6701, 0.6758, 0.6706])
Macro-f1: tensor([0.6295, 0.6477, 0.6413, 0.6485, 0.6335])

but since I only set one seed and repeat 5 times, I expected the results to be:

Micro-f1: tensor([0.6688, 0.6688, 0.6688, 0.6688, 0.6688])
Macro-f1: tensor([0.6295, 0.6295, 0.6295, 0.6295, 0.6295])
Ffffffffire commented 11 months ago

Thanks for your attention. I think this is caused by the 'train_valid' split performed inside the 'load_data' function: you should fix the seed before load_data and before context sampling to get stable results. I'm sorry that we did not fix the seed in our experiments to make the results easily reproducible. You can run your own experiments by fixing the seed and tuning the hyperparameters. We will consider the stability problem of HINormer in the future.
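
Roughly something like this (a sketch only; load_data, sample_context and train_one_run are illustrative names here, not the exact functions in this repo):

for repeat in range(num_repeats):
    set_seed(args.seed)                  # reseed before data loading...
    data = load_data(args.dataset)       # ...so the train/valid split is identical in every repeat
    contexts = sample_context(data)      # ...and so is the context sampling
    micro_f1, macro_f1 = train_one_run(data, contexts, args)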

zzyzeyuan commented 11 months ago

Thanks for your comment. Actually, I had already noticed the 'train_valid' split in the 'load_data' function, so I fixed the seed with np.random.seed(myseed) and random.seed(myseed) and got the same 'train_valid' splits. But the results are still unstable. Anyway, thanks a lot.

Ffffffffire commented 11 months ago

I tried running the fixed-seed experiments, calling set_seed before load_data and again before each training repeat, and got identical results each time: (screenshot omitted)
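
The reason reseeding before each repeat matters is simply that the RNG state advances during training, so repeat 2 starts from a different state than repeat 1 unless you reseed. A tiny standalone demo (plain PyTorch, nothing HINormer-specific):

import torch

torch.manual_seed(0)
a = torch.rand(1)  # first draw after seeding
b = torch.rand(1)  # different from a: the RNG state has advanced

torch.manual_seed(0)
c = torch.rand(1)  # reseeding rewinds the state, so c equals a
print(a, b, c)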

zzyzeyuan commented 11 months ago

Thanks!!! After fixing the seed, I can reproduce the same results.

But if you print micro_f1 and macro_f1, you will find that the result of each repeat still differs from the others, which still confuses me. hahahahahaha

Anyway, it does work.