MIC-DKFZ / batchgenerators

A framework for data augmentation for 2D and 3D image classification and segmentation
Apache License 2.0

BatchGenerators on Windows error #6

Closed peterneher closed 5 years ago

peterneher commented 6 years ago

    Can't pickle local object 'MultiThreadedAugmenter._start.<locals>.producer'
    Exception ignored in: <bound method MultiThreadedAugmenter.__del__ of <batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter object at 0x000001F3980FDA90>>
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\site-packages\batchgenerators\dataloading\multi_threaded_augmenter.py", line 144, in __del__
        self._finish()
      File "C:\ProgramData\Anaconda3\lib\site-packages\batchgenerators\dataloading\multi_threaded_augmenter.py", line 130, in _finish
        thread.terminate()
      File "C:\ProgramData\Anaconda3\Lib\multiprocessing\process.py", line 116, in terminate
        self._popen.terminate()
    AttributeError: 'NoneType' object has no attribute 'terminate'
    485.19 core.mod.core.ioUtil ERROR: File '-c' does not exist
    485.21 WARNING: Failed to load command line argument: -c
    485.21 core.mod.core.ioUtil ERROR: File 'from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=8892, pipe_handle=7904)' does not exist
    485.22 WARNING: Failed to load command line argument: from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=8892, pipe_handle=7904)
    485.23 core.mod.core.ioUtil ERROR: File '--multiprocessing-fork' does not exist
    485.24 WARNING: Failed to load command line argument: --multiprocessing-fork

FabianIsensee commented 6 years ago

This is a known problem with Python multiprocessing on Windows and is not related to batchgenerators. batchgenerators does not support Windows for now, for exactly that reason. I should add that to the docs. Best, Fabian
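
For reference, the underlying failure can be reproduced in a few lines outside of batchgenerators (a sketch): with the spawn start method that Windows uses, the target of a multiprocessing.Process must be picklable, and a function defined locally inside another function is not.

    import multiprocessing as mp

    def start():
        # local function, analogous to MultiThreadedAugmenter._start.<locals>.producer
        def producer(q):
            q.put(42)

        q = mp.Queue()
        p = mp.Process(target=producer, args=(q,))
        p.start()   # fails with "Can't pickle local object ..." under spawn
        print(q.get())
        p.join()

    if __name__ == '__main__':
        mp.set_start_method("spawn")  # spawn is the only start method available on Windows
        start()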

peterneher commented 6 years ago

Could multiprocessing simply be optional if this is the only windows issue?

FabianIsensee commented 6 years ago

Please provide a minimalistic example that crashes on Windows and I will see what I can do

FabianIsensee commented 5 years ago

Looks like we need to change the process start method for Windows

FabianIsensee commented 5 years ago

Windows is trickier than anticipated. Why do people even use this for deep learning? (JK). I don't have a windows computer to really work on this, and I don't have any windows experience either. Is there anyone willing to help?

justusschock commented 5 years ago

@FabianIsensee If this is still an issue, I'd like to help here. I have a windows machine, but I am not that familiar with multiprocessing in windows. However, this should be something we can figure out.

EDIT: I just executed the tests and they were all successful, except the ones regarding the order of the returned values (which can be found here and here). I think this is caused by the starting method of new processes. According to the multiprocessing docs, the start method is different on windows and unix, and the unix one is not available on windows.

If this really causes the error, the tests should fail when setting the start method to "spawn" on unix-based systems too. I'll try this later, once I'm on my linux machine.

justusschock commented 5 years ago

I guess we won't be able to reproduce the behavior of unix's fork easily. But I'm currently asking myself if we have to.

The order may not be preserved correctly, but at least it remains reproducible. I think this might be enough for experimental windows support? I'll have a look at how this behaves in pytorch, because I haven't noticed anything special in this direction in their data loading code base.

FabianIsensee commented 5 years ago

Hi @justusschock, first and foremost I apologize for not responding quicker. As you may know, the MICCAI deadline was this morning and I had to postpone everything until after the deadline. I am very grateful that you are willing to work on windows support. This has been a long standing issue that should have been fixed a long time ago.

Ideally we would like to guarantee that running the same pipeline twice yields the same order of batches with the exact same augmentations being applied to them. So far this was realized in the MultithreadedAugmenter by using num_processes queues, each with only one multiprocessing.Process being assigned to it. Batches are then returned by looping over these queues. That way workers that were faster than others could not "skip ahead" - something that absolutely has to be avoided (if you apply SpatialTransform to just some percentage of your samples then the processing times for batches are very inhomogeneous)! How did you implement that on windows? It could just be that the seeding of the workers is off?
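
A simplified sketch of that scheme (illustrative only, not the actual MultiThreadedAugmenter code): each worker writes exclusively to its own queue, and the consumer cycles over the queues in a fixed order, so a fast worker can never overtake a slow one.

    import multiprocessing as mp

    def consume_round_robin(queues):
        # always take the next batch from the next queue in line; because every
        # worker owns exactly one queue, the returned order is independent of
        # how long individual batches took to augment
        current = 0
        while True:
            yield queues[current].get()
            current = (current + 1) % len(queues)

    # setup sketch: one queue and one process per worker
    # queues = [mp.Queue(maxsize=2) for _ in range(num_processes)]
    # workers = [mp.Process(target=producer, args=(loader, q, i)) for i, q in enumerate(queues)]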

Despite windows support being highly desired, we cannot make any compromises in unix: whatever needs to be done to get batchgenerators running on windows should not compromise unix operation. Specifically, the order of returned batches (and augmentations being applied to them) needs to be reproducible, and the performance needs to be the same as it is now. I'd be happy to test your solution with respect to these aspects :-)

I recommend you use the development_fabian branch for testing your implementation, because this one includes a new DataLoader class that I have been working on. It handles the returned indices more explicitly and may even solve the problem you have been reporting out of the box. Have a look here and here

Admittedly, the documentation of MultithreadedAugmenter is not very detailed. It took me a lot of time to get it working properly, and in this process clarity may have been lost. If you need any information regarding how things are handled internally, please let me know and I will do my best to give you whatever you need.

Once again, big thanks!

justusschock commented 5 years ago

Hi Fabian,

No worries, I know that about the MICCAI deadline and I haven't even expected to get an answer today -- everything is fine.

First of all: I totally agree with your restrictions regarding performance. In my previous answer, the change of the start_method was only for debugging purposes, since this should be the main cause for the differences between the different operating systems.

Does the result have to be the same for windows and unix? Or is it enough for the result to be reproducible (UNIX and Windows return a different order, but the order is always the same on each OS)?

To report the current status: the majority of the tests are passing. The only failures with a fresh setup on your branch are:

For debugging I also added the following lines just before unittest.main(), since they are strongly recommended when using multiprocessing on windows:

    from multiprocessing import freeze_support
    freeze_support()
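
For reference, a minimal sketch of the usual Windows-safe entry-point pattern (illustrative only, not the exact change to the test module):

    import unittest
    from multiprocessing import freeze_support

    if __name__ == '__main__':
        # spawned child processes re-import this module, so everything that
        # starts workers must stay behind this guard
        freeze_support()
        unittest.main()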

While this does not prevent any of these errors, it reveals an additional one, which may simply not have occurred before by chance:

While I know the intention behind the test_order and test_restart_and_order tests, I'd be really thankful for information regarding the intention of test_return_all_indices_multi_threaded_shuffle_False and test_return_all_indices_multi_threaded_shuffle_True. Have there ever been errors with that or do you have an idea why they could be failing here?

Best, Justus

FabianIsensee commented 5 years ago

Hi Justus,

test_return_all_indices_multi_threaded_shuffle_False and test_return_all_indices_multi_threaded_shuffle_True check whether DataLoader actually iterates exactly once over some dataset. This is required if you are, for example, working with CIFAR or ImageNet and want to guarantee that one epoch is exactly one iteration over all training samples. Since test_return_all_indices_single_threaded_shuffle_True and test_return_all_indices_single_threaded_shuffle_False do not fail, the error must lie in the multithreading.

When working with num_processes workers, the DataLoader needs to know its process id in [0, num_processes[ to be able to tell what indices it is supposed to return. This information is stored in DataLoader.thread_id (which is set in the producer loop here). You could try and print that to see if the numbers are set correctly.

DataLoader has a variable called DataLoader.indices, which is a list of all valid indices it can return (these are the indices of the dataset). For each batch, DataLoader will use its get_indices() member function to determine what indices it needs to return. Basically, if you have three workers and a batch size of 4, worker 0 will return indices 0-3 for its first batch, then jump ahead by num_processes * batch_size so that the next batch of this worker will have samples 12-15. Samples 4-11 are in the meantime being handled by workers 1 and 2.

So yeah, if these tests are failing, the first thing I would check is whether DataLoader.thread_id is set properly. If the shuffle_True test fails you should also check whether the seed_for_shuffle is set correctly. DataLoader.indices must always be the same for all workers, so if you print it and it is different, then that can cause errors as well.
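
The described partitioning can be sketched like this (an illustration of the scheme, not the actual DataLoader code); with 3 workers and a batch size of 4, worker 0 yields indices 0-3, then 12-15, and so on:

    def worker_indices(thread_id, num_processes, batch_size, num_samples):
        # indices this worker is responsible for: start at thread_id * batch_size,
        # then jump ahead by num_processes * batch_size after every batch
        start = thread_id * batch_size
        while start < num_samples:
            yield list(range(start, min(start + batch_size, num_samples)))
            start += num_processes * batch_size

    # worker 0 of 3, batch size 4, 24 samples -> [[0, 1, 2, 3], [12, 13, 14, 15]]
    print(list(worker_indices(0, 3, 4, 24)))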

Does the result have to be the same for windows and unix? Or is it enough for the result to be reproducible (UNIX and Windows return a different order, but the order is always the same on each OS)?

Ideally this should be the same. I don't know anything about python on windows, so I don't know if that is even possible (are the random number generators the same?). If the rngs are identical then this should not be an issue, actually, because everything is deterministic once the issue from above is sorted out.

test_restart_and_order really should work as well. I just had another look at the code and it looks solid to me. If it is failing then maybe, again, DummyDL is not getting the proper thread_id?

I hope this helps and once again, thanks! Best, Fabian

justusschock commented 5 years ago

Okay, I tested this (only for test_multithreaded_augmenter.py right now) and here are the results:

I ran the tests multiple times and for the tests giving the wrong orders, the orders are not reproducible.

I think this indicates that the workers are created in the wrong order?!

To be honest: I currently have no idea why this could happen, since the code for the initialization of the MultiThreadedAugmenter, and thus for the creation of the workers, is always the same.

Am I missing something here?

EDIT: I tried to add a small delay (sleep(0.2)) after the creation of each process and this seems to fix this issue (the order is now correct), but it makes the whole test run last approx. 24.5 instead of approx. 22.5 seconds (timings averaged over multiple runs).

Nevertheless, there are other errors occurring afterwards. But I guess we should tackle them one by one.

FabianIsensee commented 5 years ago

Processes are created here. In this loop, i is definitely in the right order, so if i is propagated into the processes correctly then there is no issue with thread_id. It shouldn't matter in what order the ids are printed. What is important is that each worker has a unique id. The next step would be to look at what indices are actually returned and why they are different from what we expect. This is something I cannot do without a windows computer, and I am afraid I cannot diagnose this from here. I think the easiest way to approach this would be to look at test_order, because it uses the super simple DummyDL class and you should quickly be able to see what is supposed to happen and why it is not happening. Best, Fabian

justusschock commented 5 years ago

I really tried this. The thing is: When normally executing this function, the error occurs. But if I try to debug it slowly step by step to find the error, everything is just fine.

justusschock commented 5 years ago

So I think I was able to narrow down the problem (trying the same thing multiple times because the error does not occur when debugging but only with print statements):

While I first thought that the data might simply be fed to the queues in the wrong order, this is definitely not the case.

The data is fed in the correct order and each worker is working correctly by itself. The problem lies somewhere within the merging logic, which seems to be implemented differently under the hood on unix and windows.

So the returned values per worker are in the correct order, but the order of the workers, i.e. whose turn it is for the current value, is mixed up.

Is this part only implemented by using queues as buffers and iterating over the queues with _next_queue? This seems fine to me.

FabianIsensee commented 5 years ago

I implemented the merging logic myself, so if all the workers are working properly and are writing to the correct queues then there is no way this can get messed up under the hood (unless I have a bug, but that is unlikely because it works on linux). The merging logic works as follows: Let's say there are 3 workers. That means MultithreadedAugmenter (MTA) has three queues, one for each worker. MTA remembers what queue it is currently at. When a new batch is requested, MTA will call _next_queue to check which queue it should use and then query the next item from that queue. The only way this can get messed up is (and maybe this could be the case?) if the queue is not a FIFO but something else. Could you check for that? Maybe queues on windows are not FIFO by default.

justusschock commented 5 years ago

I checked the definition of a Queue and they are FIFO for all OS. Well from my observation, every worker by itself is working correctly, meaning it gets the correct indices and puts them to the correct queue (double-checked this).

Can you run the tests on unix with multiprocessing.set_start_method("spawn") directly before the unittest.main() call? This should replicate the behavior of windows regarding the start of processes. If they still pass, we can be sure that this is not the error's cause. If they fail, it very likely is.
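
In case it helps, the change would look roughly like this (a sketch; set_start_method must be called at most once, before any workers are started):

    import multiprocessing
    import unittest

    if __name__ == '__main__':
        # force the Windows-style start method on unix to replicate its behavior
        multiprocessing.set_start_method("spawn")
        unittest.main()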

justusschock commented 5 years ago

A little update: I switched from multiprocessing.Queue to multiprocessing.Manager.Queue, since this has a built-in proxy and is the recommended way of sharing data between processes. This does fix the problem regarding the orders.

The downside is that I get BrokenPipeErrors (I believe this happens because this proxy does not forward close and join_thread to the internal queue, but I am not 100% sure about that right now). Additionally, the errors in test_return_all_indices_multi_threaded_shuffle_False and test_return_all_indices_multi_threaded_shuffle_True persist. Will investigate further and also try to find a way to beautify the cleanup in MultiThreadedAugmenter._finish
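
For reference, the difference between the two (a minimal sketch, not thread code): multiprocessing.Manager().Queue() returns a proxy to a queue living in a separate manager process, whereas a plain multiprocessing.Queue() is backed directly by a pipe and a feeder thread in each writing process.

    from multiprocessing import Manager, Queue

    if __name__ == '__main__':
        plain_q = Queue()              # backed by a pipe and a feeder thread
        managed_q = Manager().Queue()  # proxy; the actual queue lives in the manager process
        plain_q.put("a")
        managed_q.put("b")
        print(plain_q.get(), managed_q.get())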

justusschock commented 5 years ago

Sorry for bothering you that much, but I have another question: Why don't you want put and get to be blocking? This shouldn't be a problem, since the workers each have their own queue. I'm asking just because all tests are passing if I remove the timeout from all calls to get

FabianIsensee commented 5 years ago

Can you run the tests on unix with multiprocessing.set_start_method("spawn") directly before the unittest.main() call? This should replicate the behavior of windows regarding the start of processes. If they still pass, we could be sure, that this is not the error's cause. If they fail it very likely is.

I will do this tomorrow, thanks for the hint!

Depending on when the broken pipe errors happen, they can be bad or not. They are not bad when they happen when MTA._finish() is called, because that will terminate the workers and delete the queues. If one of the workers is currently trying to write to a queue that is being deleted then it may raise such an error. Maybe this could be resolved by changing the way _finish() works. If these errors appear under any other circumstances then they are bad.

test_return_all_indices_multi_threaded_shuffle_False is a test that checks if all indices are returned when iterating over the dataset. This is essential to ensure that your model has seen all data once per epoch.

put and get need to be non-blocking because the workers need to regularly check if abort_event was set. They can't do that when they are stuck waiting for space in the queue. MTA needs to check this regularly as well so that it can notice if workers crashed, therefore get is also non-blocking (I assume that by non-blocking you mean the timeout?)
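
As an illustration of that pattern (a sketch, not the actual worker code): a put with a timeout inside a loop lets a worker periodically check an abort event instead of blocking forever on a full queue.

    import queue  # for queue.Full, which multiprocessing queues raise on timeout

    def put_with_abort_check(out_queue, item, abort_event, timeout=1.0):
        # keep retrying the put with a short timeout so the worker can notice
        # an abort instead of blocking forever on a full queue
        while not abort_event.is_set():
            try:
                out_queue.put(item, timeout=timeout)
                return True
            except queue.Full:
                continue  # queue still full, check abort_event again
        return False  # abort was requested before the item could be enqueued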

You are not bothering me at all, on the contrary. I am happy that you are willing to help out!

justusschock commented 5 years ago

Thanks!

I assume that by non-blocking you mean the timeout? Indeed!

So for simplicity and brevity, I'll continue to use the term non-blocking for the variant of blocking with a timeout...

Let's assume only the put is non-blocking and the get would be blocking (which is only done in the main process and the pin-memory thread, IIRC). In this case we should still be able to check for corrupted workers in the other processes (since put is still non-blocking), and the main thread would check the next time around, after we've processed a batch with our model (or whatever).

The only problem would be the pin-memory thread, but this should work if we introduce an additional queue when pin_memory==True: we always put the values obtained from the workers into this queue instead of returning them immediately, and if the memory should be pinned, the thread fetches them from this queue (this could be done non-blocking, since the data in this queue is already in the correct order).

I don't know if you're able to follow my idea (kinda hard to express it in words).

If you got the idea: Might this work (and would this have an impact on performance)? If not: I should manage to deliver some prototype/pseudo-code later today or tomorrow and hopefully you'll get the idea afterwards :)
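
For illustration, a rough sketch of the idea (hypothetical; pin_batch stands in for whatever pinning function the framework provides):

    import queue
    import threading

    def pin_memory_loop(in_queue, out_queue, abort_event, pin_batch):
        # fetch batches that the main loop already placed in the correct order,
        # pin them, and hand them on; get() uses a timeout so the thread can
        # still notice an abort
        while not abort_event.is_set():
            try:
                batch = in_queue.get(timeout=1.0)
            except queue.Empty:
                continue
            out_queue.put(pin_batch(batch))

    # usage (hypothetical): start the thread only when pin_memory is requested
    # threading.Thread(target=pin_memory_loop, args=(q_in, q_out, abort, pin_fn), daemon=True).start()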

FabianIsensee commented 5 years ago

I just pushed a bugfix to the development_fabian branch. This could actually have caused some of the problems you were having. Let me know how this worked

FabianIsensee commented 5 years ago

Sorry I did not see your post before I posted mine. I really don't think my non-blocking code should be the problem. However the bug I just fixed is related and actually has to do with the non-blocking stuff I do, so please let me know if that fixed the problems!

justusschock commented 5 years ago

I'll try this out tomorrow!

justusschock commented 5 years ago

This seems to be the bug. And it also really makes sense, but somehow I've always overlooked it. How did you figure this out? :)

All tests are completely passing now. Thanks!

You might eventually want to add freeze_support to the tests anyway, since they might get frozen.

FabianIsensee commented 5 years ago

I was tracking a different problem I had with one of my experiments. I had to look into MTA for something and that's when I noticed :-)

It seems like you tried out a few things to get Windows working properly. If you do a pull request, I suggest you start with a fresh copy of the development_fabian branch and add only those changes that are required to get windows support. I would like to avoid making any changes that are not absolutely necessary. (For example, I don't know what multiprocessing.Manager.Queue really does differently from what I am using now, and unless it is really needed I would like to keep things as they are. A lot depends on batchgenerators working as expected ;-) )

Once again big thanks!!!

justusschock commented 5 years ago

I'll do this (probably within the next hour), but these are only minor changes regarding the tests (adding freeze_support).

One question though: should I add a Windows pipeline to Travis?

FabianIsensee commented 5 years ago

Windows is now supported in the development_fabian branch and will soon be merged into master

zabboud commented 2 years ago

I know this is supposed to be a closed issue, as the windows branch was merged into master, but there still seems to be an issue with the multi-threading on Windows: just running the simple example with a single image (like the example provided in the repo with the camera image) results in a broken pipe error

    ---> 60     ForkingPickler(file, protocol).dump(obj)
         61
         62
    BrokenPipeError: [Errno 32] Broken pipe

when I run the same script on linux I get a different error which seems to be memory related perhaps?

    RuntimeError: MultiThreadedAugmenter.abort_event was set, something went wrong. Maybe one of your workers crashed. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full

But this might indicate that the Windows patch is not fully operational? Any help would be appreciated

Is there a way by any chance to avoid the multithreading option completely?