SanghunYun / UDA_pytorch

UDA (Unsupervised Data Augmentation) implemented in PyTorch

The training stops without triggering errors #15

[Open] ipietri opened this issue 2 years ago

ipietri commented 2 years ago

Hi,

I'm trying to run the code with the same dataset you published, but the training stops without triggering any errors. I'm wondering if you have faced this problem before.

See the image below. Thanks!

(screenshot omitted)

kbwzy commented 2 years ago

Same issue here, with tensorflow-gpu 2.6.0 and torch 1.9.0.

hahah181292 commented 2 years ago

I wonder if it's because the labeled dataset is too small, so the program finishes very quickly without raising any errors.

Liu-Jingyao commented 2 years ago

Same issue on Colab. It's probably not that the run finished too quickly; the fine-tuning was interrupted for some unknown reason.

Liu-Jingyao commented 2 years ago

This happens because the author set the training data not to be repeated in non-UDA mode, and I am asking about the reason for doing so: https://github.com/SanghunYun/UDA_pytorch/issues/18#issue-1067255803
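
For context, here is a minimal sketch of the mechanism, assuming the repeat_dataloader helper in train.py is the usual infinite-generator wrapper (check your checkout for the exact code):

```python
# Assumed shape of train.py's repeat_dataloader (an infinite generator).
def repeat_dataloader(iterable):
    """Cycle over a DataLoader forever so next() never raises StopIteration."""
    while True:
        for batch in iterable:
            yield batch

# In UDA mode the supervised loader is wrapped with this generator, so the
# loop can run for any number of steps. In non-UDA mode the raw, single-pass
# DataLoader is used instead, so a small labeled set is exhausted after a few
# steps and training simply ends without raising an error.
```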

hahah181292 commented 2 years ago

> This happens because the author set the training data not to be repeated in non-UDA mode, and I am asking about the reason for doing so. #18 (comment)

Hello, have you figured out the reason for this?

Liu-Jingyao commented 2 years ago

> Hello, have you figured out the reason for this?

Hello. I've done a lot of research, but I still don't know what this convoluted setting is for. However, I observed that the model works well and delivers the expected performance when I simply make the training records repeatable in non-UDA mode. It's hard to justify the original setting in theory or in practice. Maybe we should just ignore it and make our own version.
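
Concretely, the change I mean is to wrap the supervised iterator in non-UDA mode too. A sketch against train.py; the exact data_iter index is an assumption, so adapt it to your checkout:

```python
# train.py, non-UDA branch: wrap the supervised iterator so it cycles
# instead of being consumed in a single pass.
# Before (assumed original): self.sup_iter = data_iter[0]
self.sup_iter = self.repeat_dataloader(data_iter[0])
```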

hahah181292 commented 2 years ago

> Hello. I've done a lot of research, but I still don't know what this convoluted setting is for. However, I observed that the model works well and delivers the expected performance when I simply make the training records repeatable in non-UDA mode.

Thank you for your reply! Do you mean setting sup_iter to repeat_dataloader? But I found that train_eval mode cannot be run in non-UDA mode, because non-UDA mode has no eval_iter. How did you determine which step achieved the best result? I'm looking forward to your comments.

Liu-Jingyao commented 2 years ago

> Do you mean setting sup_iter to repeat_dataloader? But I found that train_eval mode cannot be run in non-UDA mode, because non-UDA mode has no eval_iter. How did you determine which step achieved the best result?

A late reply. You can add the eval_iter by refactoring the data structure used to store the iterators, just as you would in supervised learning. For example, I changed the confusing data_iter array in main.py and train.py into a readable dict, like this:

```python
# main.py
# ... ...
data = load_data(cfg)

# Store iterators in a dict keyed by role instead of a positional array.
data_iter = dict()
if cfg.uda_mode:
    # ... ...
    if cfg.mode == 'train':
        data_iter['sup_iter'] = data.sup_data_iter()
        data_iter['unsup_iter'] = data.unsup_data_iter()
    elif cfg.mode == 'train_eval':
        data_iter['sup_iter'] = data.sup_data_iter()
        data_iter['unsup_iter'] = data.unsup_data_iter()
        data_iter['eval_iter'] = data.eval_data_iter()
    else:  # eval only
        data_iter['eval_iter'] = data.eval_data_iter()
else:
    if cfg.mode == 'train':
        data_iter['sup_iter'] = data.sup_data_iter()
    elif cfg.mode == 'train_eval':
        data_iter['sup_iter'] = data.sup_data_iter()
        data_iter['eval_iter'] = data.eval_data_iter()
    else:  # eval only
        data_iter['eval_iter'] = data.eval_data_iter()
```

```python
# train.py
# ... ...
# Wrap the training iterators so they cycle forever; the eval iterator
# stays single-pass so each evaluation sees the dataset exactly once.
if 'sup_iter' in data_iter:
    self.sup_iter = self.repeat_dataloader(data_iter['sup_iter'])
if 'unsup_iter' in data_iter:
    self.unsup_iter = self.repeat_dataloader(data_iter['unsup_iter'])
if 'eval_iter' in data_iter:
    self.eval_iter = data_iter['eval_iter']
```
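
With the dict in place, the training loop can draw batches the same way in both modes. A hypothetical excerpt; the step count and loop body are placeholders:

```python
# Hypothetical training-loop excerpt using the dict-based iterators above.
for step in range(cfg.total_steps):
    sup_batch = next(self.sup_iter)          # cycles forever, never exhausts
    if cfg.uda_mode:
        unsup_batch = next(self.unsup_iter)  # unlabeled batch for the UDA loss
    # ... forward pass, loss, backward, optimizer step ...
```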

You may have found your own solution already, but I hope my approach inspires you a bit. Feel free to ask me more.

LISCARqaq commented 2 years ago

> I'm trying to run the code with the same dataset you published, but the training stops without triggering any errors. I'm wondering if you have faced this problem before.

Hello, I have the same issue. Have you fixed the problem, and if so, how?

Liu-Jingyao commented 2 years ago

> Hello, I have the same issue. Have you fixed the problem, and if so, how?

Hello. Yes, you can find my solution in my previous comments.