computational-seismology / pypaw

PYthon Process Asdf Workflow, or pypaw for short
https://computational-seismology.github.io/pypaw/
GNU Lesser General Public License v3.0

Parallel makedirs causes error #16

Closed rdno closed 7 years ago

rdno commented 7 years ago

Hi,

At sum_adjoint.py#L76, there is a check for the existence of a folder. However, sometimes more than one parallel process fails to find the folder, and each one tries to create it. This causes the slower ones to fail because another process has already created the folder. The same applies to the remove call below it.
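One way to sidestep this kind of race without any coordination between processes is to skip the existence check and treat "already exists" (or "already removed") as success. The following is a minimal sketch, assuming Python 3; the helper names `ensure_dir` and `remove_if_exists` are hypothetical and are not part of pypaw.

```python
import os


def ensure_dir(path):
    """Create `path` (and any parents); succeed if it already exists.

    Hypothetical helper, not part of pypaw.
    """
    # exist_ok=True removes the check-then-create race: no FileExistsError
    # is raised if another process created the directory in the meantime.
    os.makedirs(path, exist_ok=True)


def remove_if_exists(path):
    """Remove a file; succeed if another process already removed it.

    Hypothetical helper, not part of pypaw.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        # Another process won the race; the file is gone, which is the goal.
        pass
```

Both helpers can be called by every process at once, because whichever process loses the race simply finds the work already done.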

It seems that pyasdf solves this issue by checking for existence only on rank 0 and then broadcasting the result (asdf_data_set.py#L1717-L1725).
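The rank-0 pattern described above could look roughly like the sketch below. It is an illustration of the idea only, not pyasdf's actual code: the function name `makedirs_on_root` is hypothetical, and `comm` is assumed to be an mpi4py communicator such as `MPI.COMM_WORLD`.

```python
import os


def makedirs_on_root(comm, path):
    """Create `path` on rank 0 only, then share the outcome with all ranks.

    Hypothetical helper; `comm` is assumed to be an mpi4py communicator.
    """
    error = None
    if comm.Get_rank() == 0:
        try:
            os.makedirs(path, exist_ok=True)
        except OSError as exc:
            error = str(exc)
    # The other ranks wait in bcast until rank 0 has sent its result, so
    # none of them proceeds before the directory has been created.
    error = comm.bcast(error, root=0)
    if error is not None:
        raise RuntimeError(f"could not create {path}: {error}")


# Usage (assumes mpi4py is installed; launch with e.g. `mpirun -n 4 python script.py`):
#   from mpi4py import MPI
#   makedirs_on_root(MPI.COMM_WORLD, "output_dir")
```

Broadcasting the error message rather than just returning on rank 0 means every rank fails together if the directory cannot be created, instead of the other ranks continuing against a missing folder.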

wjlei1990 commented 7 years ago

Hi Ridvan,

By "sometimes more than one parallel process fails to find the folder and tries to create it", do you mean that you run the sum in parallel, for example, summing into one adjoint file with multiple processes? However, sum_adjoint.py is supposed to run on only one process, which means it should run in serial.

If you mean that you run a lot of sum jobs at the same time, you can create the directory first, before launching the jobs.

Does that answer your question?

rdno commented 7 years ago

"However, sum_adjoint.py is supposed to run on only one process, which means it should run in serial."

OK. I did not notice that it was supposed to be serial. I was running it with mpirun. Is there a reason it cannot run in parallel?

wjlei1990 commented 7 years ago

The adjoint source is stored in ds.auxiliary_data.AdjointSource. In pyasdf, there is no method that supports parallel processing of such data.

Also, summing the adjoint sources is not very time-consuming, so adding parallel capability is not urgent.

rdno commented 7 years ago

Ok. Thank you for answering.