Runinho / pytorch-cutpaste

Unofficial and work-in-progress PyTorch implementation of CutPaste
https://runinho.github.io/pytorch-cutpaste/

RuntimeError: Pin memory thread exited unexpectedly #4

Closed. SixGodGG closed this issue 3 years ago.

SixGodGG commented 3 years ago

Has anyone seen this problem?

    RuntimeError: Pin memory thread exited unexpectedly

Windows 10, GTX 1650:

    parser.add_argument('--workers', default=1, type=int, help="number of workers to use for data loading (default:8)")

I have set --workers=1.

Runinho commented 3 years ago

Maybe try disabling memory pinning in the data loader by changing line 62 (and the following lines) in run_training.py from this:

    dataloader = DataLoader(train_data, batch_size=batch_size, drop_last=True,
                            shuffle=True, num_workers=workers, collate_fn=cut_paste_collate_fn,
                            persistent_workers=True, pin_memory=True, prefetch_factor=5)

to this:

    dataloader = DataLoader(train_data, batch_size=batch_size, drop_last=True,
                            shuffle=True, num_workers=workers, collate_fn=cut_paste_collate_fn,
                            persistent_workers=True, pin_memory=False, prefetch_factor=5)

For more information about memory pinning, see the PyTorch docs. Turning it off just results in a performance hit, but the code should still work.
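(Not part of the original comment, just an illustrative sketch of why pinning is only an optimization: pinned host memory lets the host-to-GPU copy run asynchronously via non_blocking=True, so with pin_memory=False the copy simply becomes synchronous. The loop below assumes the collate function yields a list of tensors.)

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for batch in dataloader:
        # with pin_memory=True this copy can overlap with GPU compute (non_blocking=True);
        # with pin_memory=False it degrades to a regular blocking copy, nothing breaks
        batch = [t.to(device, non_blocking=True) for t in batch]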

Runinho commented 3 years ago

I think this issue is related: https://discuss.pytorch.org/t/persistent-workers-true-for-the-dataloader-causes-errors/106669

I'm using PyTorch version 1.8.1; maybe try upgrading your PyTorch version.
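(Not from the original reply, but a minimal sketch of a more defensive setup under the assumption that the crash comes from the persistent_workers/pin_memory combination: persistent_workers and prefetch_factor are only valid when num_workers > 0, and pinning is made opt-in. make_dataloader is a hypothetical helper; train_data, batch_size, workers and cut_paste_collate_fn are the names used in the snippet above.)

    from torch.utils.data import DataLoader

    def make_dataloader(train_data, batch_size, workers, collate_fn, use_pinning=False):
        # persistent_workers and prefetch_factor require num_workers > 0
        extra = dict(persistent_workers=True, prefetch_factor=5) if workers > 0 else {}
        return DataLoader(train_data, batch_size=batch_size, drop_last=True,
                          shuffle=True, num_workers=workers,
                          collate_fn=collate_fn,
                          pin_memory=use_pinning,  # False avoids the pin-memory thread crash
                          **extra)

    # usage: dataloader = make_dataloader(train_data, batch_size, workers, cut_paste_collate_fn)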

SixGodGG commented 3 years ago

> Maybe try disabling memory pinning in the data loader by changing line 62 in run_training.py [...] Turning it off just results in a performance hit, but the code should still work.

I set pin_memory=False and that solved the issue.

SixGodGG commented 3 years ago

> I'm using PyTorch version 1.8.1; maybe try upgrading your PyTorch version.

Did you try to run this code on Windows 10? And if so, did you use a GPU?

Runinho commented 3 years ago

Yes, it's running on my machine with Windows 10 and a GPU.

SixGodGG commented 3 years ago

> Yes, it's running on my machine with Windows 10 and a GPU.

Nice. And how many workers did you set? I set --workers=1. How many do you use? And where are you from?

Runinho commented 3 years ago

I currently do not have my Windows 10 machine at hand, but I think I'm using the default of 8. In my experience, Windows multiprocessing is sometimes not that well supported by deep learning frameworks. You can also disable loading the data in separate worker processes by setting workers=0, as described in the PyTorch documentation, but this results in longer training times.
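(Not from the original comment, just a sketch of that workers=0 fallback, using the variable names from the snippet earlier in the thread; note that persistent_workers and prefetch_factor have to be dropped when there are no worker processes.)

    # single-process data loading: batches are produced in the main process,
    # which sidesteps the Windows worker / pin-memory issues at the cost of speed
    dataloader = DataLoader(train_data, batch_size=batch_size, drop_last=True,
                            shuffle=True, num_workers=0,
                            collate_fn=cut_paste_collate_fn,
                            pin_memory=False)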

I'm from Germany. What about you?

SixGodGG commented 3 years ago

> I think I'm using the default of 8. [...] You can also disable loading the data in separate worker processes by setting workers=0 [...] but this results in longer training times.

Asia. Thanks for your response; I will try to read the code and paper again. Training has now succeeded.

Runinho commented 3 years ago

> Training has now succeeded.

Nice.

I also added my results to the README.

SixGodGG commented 3 years ago

> Nice. I also added my results to the README.

I have seen your results, and I got some of the same results too. When the GradCAM and patch heatmap work is finished, it will be very useful.

Runinho commented 3 years ago

> When the GradCAM and patch heatmap work is finished, it will be very useful.

I'm currently not planning to implement GradCAM or the patch heatmap.

SixGodGG commented 3 years ago

> I'm currently not planning to implement GradCAM or the patch heatmap.

Hi, how should I understand your hyperparameter --head_layer? I tried setting it to 1 or 2 and got different results; some categories got better and some got worse. I read the whole paper and can't find any details about --head_layer. Does it come from the level-7 feature head? Could you help explain it? How should I tune this hyperparameter, and does it affect the training time?

SixGodGG commented 3 years ago

> I'm currently not planning to implement GradCAM or the patch heatmap.

Hi, do you have any other communication tool? I would like to talk with you, sir. My Instagram is triple_night_sss; I have some questions to discuss. Thank you, and I can pay via PP if necessary.