Closed: nvogtvincent closed 4 years ago
Thanks for reporting @Plagioclase, this indeed is very strange behaviour!
A few questions:

1. Could you check whether the runs write to different `out-XXXXXXX` directories? And that indeed both directories exist during the processes, and that files get written to them?
2. Could you `print(output_file.tempwritedir_base)` and indeed check that these are different? (and correspond to the directories under 2)
3. Perhaps MPI somehow confuses the `particlefile.export()`. Could you, in `particlefile.py`, change
```diff
- try:
-     from mpi4py import MPI
- except:
-     MPI = None
+ MPI = None
```
That forces the function not to use MPI.
Very curious to hear about your findings!
Thanks Erik, I think I've worked out what the issue is! The processes were all writing to the same `out-XXXXXXX` directory because I was using the same random seed in all my runs to ensure that the set-up is consistent between them. But it looks like the process that generates the temporary directory name also uses that seed, so all of my processes were generating the same 'random' directory name. Defining `tempwritedir` when setting up the `ParticleFile` solves the issue.
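The collision is easy to reproduce with plain Python. Below is a minimal sketch (the `make_outdir_name` helper is hypothetical, not the actual parcels code) assuming the directory name is drawn from the seeded global `random` generator:

```python
import random

def make_outdir_name():
    # Hypothetical stand-in for the temporary-directory naming:
    # draws digits from the *global* random generator.
    return "out-" + "".join(str(random.randint(0, 9)) for _ in range(7))

# Two "independent" runs that both seed for reproducibility...
random.seed(42)
name_a = make_outdir_name()

random.seed(42)  # second process uses the same seed
name_b = make_outdir_name()

# ...end up with the identical "random" directory name.
print(name_a, name_b, name_a == name_b)  # prints True for the comparison
```

Any library that names its scratch directories from the global generator will collide in exactly this way once the user seeds it.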
Ah thanks! Still, I think that this would then classify as a bug. We can expect users to want to seed their random number generator, but the output-directory naming should not rely on that. I'm reopening so that we can come up with a fix.
Hi @Plagioclase; FYI I have now implemented a change in #931 where an Error is thrown if a `ParticleFile` wants to use a temporary output directory that already exists. This would have caught your error by throwing an error right at the start, when both processes wanted to write to the same directory. Do you agree this is a good solution?
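The fail-fast behaviour can be sketched with the standard library; this is an illustrative sketch of the approach, not the actual parcels implementation from #931:

```python
import os
import tempfile

def claim_tempwritedir(path):
    # Refuse to reuse a temporary output directory that already exists,
    # so two processes can never silently share one.
    try:
        os.makedirs(path, exist_ok=False)
    except FileExistsError:
        raise RuntimeError(
            f"Temporary output directory {path!r} already exists; "
            "is another run writing to it?"
        )

base = tempfile.mkdtemp()           # scratch area for the demo
target = os.path.join(base, "out-1234567")

claim_tempwritedir(target)          # first process: succeeds
try:
    claim_tempwritedir(target)      # second process: rejected up front
except RuntimeError as err:
    print(err)
```

Failing at setup time turns the silent data corruption described below into an immediate, explainable error.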
This sounds very sensible to me - I don't think there's any need for anything more since it's a niche issue in the first place and it's easily corrected once the user is aware of the issue. Thanks for fixing it!
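For anyone hitting this before the fix, the workaround above (an explicit per-run `tempwritedir`) needs a name that is immune to `random.seed`. One option is `uuid.uuid4()`, which draws from the OS entropy source rather than the seeded generator; this is a sketch, not parcels code:

```python
import random
import uuid

# Even with identical seeds, uuid4() is unaffected by random.seed,
# so each process still gets a unique directory name.
random.seed(42)
dir_a = f"out-{uuid.uuid4().hex}"

random.seed(42)
dir_b = f"out-{uuid.uuid4().hex}"

print(dir_a, dir_b)  # two different names despite the shared seed
```

Such a name can then be passed as the `tempwritedir` argument when setting up the `ParticleFile`.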
I've come across some strange behaviour when running multiple oceanparcels processes at the same time (not using parcels' parallel capability, I mean running multiple parcels scripts on one machine at the same time). One of two things happens. The first is that there's no error, but the netcdf file generated at the end contains extra information. For instance, when running two processes at once, instead of the time axis being
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
it might be
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 90, 100
The amount of 'extra data' seems to increase with the number of processes I'm running simultaneously (e.g. when running 15 at once I've had files that are 3x larger than they should be), so it looks like the process that generates the netcdf file is getting mixed up between the different processes.
The second option is that the process crashes on the `output_file.export()` line, with the error message `No such file or directory: 'out-XXXXXXX/0/0.npy'`. I've mainly seen this version when running the different instances as foreground processes.

I've tested this on two machines, and with the processes being run in the foreground or as background nohup processes, with the same issues in all cases. This issue occurs even when the different scripts are independent, i.e. using different files for the velocity field (and obviously writing to different netcdf files). All of these scripts work with no issues if only one process is running. It looks like the final step that converts the numpy files into netcdf is getting mixed up between output from different processes if more than one instance of parcels is running. Is there any way of getting around this (apart from running each file in its own directory)?