ORAC-CC / orac

Optimal Retrieval of Aerosol and Cloud
GNU General Public License v3.0

orac stuck after pre-processing #94

Closed pdebuyl closed 4 months ago

pdebuyl commented 4 months ago

Hello ORAC team,

I recompiled ORAC again, this time on Ubuntu 22.04.

Minor annoyances: for some libraries I needed to enable Fortran flags such as "please disregard the type of the arguments, everything should be fine" (fu-liou and orac itself, if I remember correctly), the libemos data was not in the package, and there is no package for NetCDF with Fortran.

Anyway, I can now run orac. The command is:

python orac.py /data/pdebuyl/ORAC_DATA/MSG4_202001011300/H-000-MSG4__-MSG4________-_________-EPI______-202001011300-__ --no_snow_corr --use_camel_emis

The pre-processing is done and orac seems stuck (2 hours already for a single satellite image) afterward. There is a "cld" driver file.

At what point does processing time become problematic? The machine has more than 200GB of RAM, the CPU is a Xeon at 2.8GHz. orac seems to run single-threaded though (there are many cores on the machine).

simonrp84 commented 4 months ago

Single thread processing is prohibitively slow for full disk images. I put a lot of effort into it but gave up in the end as not worthwhile!

Did you compile ORAC with the -fopenmp flag on GCC? That should enable multithreaded operation and speed things up a lot...although you may also need some flag to the python code, I'm not sure.

pdebuyl commented 4 months ago

I threw more cores at it. I had to alias (with symbolic links) the SAD files from MSG3 (or MSG2) for MSG4, as the code complained about them after an already long processing time :-/

But I finally got output files! Thanks for the help.
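For readers who hit the same missing-file error, the aliasing workaround described above could be sketched roughly as follows. This is only an illustration: the function name `alias_sad_files` and its parameters are hypothetical, and the `SEVIRI-<platform>_*.sad` naming pattern is inferred from the single file name in the error message later in this thread. Whether look-up tables from another platform are scientifically appropriate for MSG4 is not confirmed here.

```python
from pathlib import Path


def alias_sad_files(sad_dir, source="MSG3", target="MSG4", dry_run=True):
    """Create SEVIRI-<target>_* symlinks next to existing SEVIRI-<source>_* files.

    Hypothetical helper: the file naming pattern is an assumption based on
    one file name seen in this thread.
    """
    created = []
    pattern = f"SEVIRI-{source}_*.sad"
    for src in sorted(Path(sad_dir).rglob(pattern)):
        new_name = src.name.replace(f"SEVIRI-{source}_", f"SEVIRI-{target}_", 1)
        link = src.with_name(new_name)
        # Never overwrite a real file or an existing link.
        if link.exists() or link.is_symlink():
            continue
        if not dry_run:
            # Relative link: resolved against the directory holding the link.
            link.symlink_to(src.name)
        created.append(link)
    return created
```

Calling it with the default `dry_run=True` only lists the links that would be created, which makes it easy to check the result before touching the SAD directory.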

adamcpovey commented 4 months ago

> I threw more cores at it. I had to alias (with symbolic links) the SAD files from MSG3 (or MSG2) for MSG4, as the code complained about them after an already long processing time :-/
>
> But I finally got output files! Thanks for the help.

That's interesting, as a complaint about MSG4 should only come from the SAD file read routine, which is one of the first things the code does. Do you have the text output from the run? There may have been network trouble.

pdebuyl commented 4 months ago

(orac) pdebuyl@server:~/orac/tools$ python orac.py /data/pdebuyl/ORAC_DATA/MSG4_202001011300/H-000-MSG4__-MSG4________-_________-EPI______-202001011300-__ --no_snow_corr --use_camel_emis
Beginning orac_preproc
orac_preproc is complete. Output written to:
/data/pdebuyl/ORAC_DATA/MSG4_202001011300/pre
ERROR: Read_LUT(): Error opening file: /data/pdebuyl/ORAC_DATA/SAD/seviri_WAT/SEVIRI-MSG4_WAT_Bext_Ch4.sad
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP 1040

@adamcpovey the compute time I mention was "only" the preprocessing time apparently. Still a bit too much to wait in front of it :-)

adamcpovey commented 4 months ago

Ah yes, the preprocessor can be quite slow for a full disc of geostationary data because the Cox-Munk calculation is expensive. If you only care about thick cloud, that can be mitigated. (The default run provides everything that both aerosol and cloud retrievals need. If you know what you want, it is possible to skip steps. Simon does this a lot.)

Once the preprocessor finishes, you don't need to run it again. The argument --clobber 0 means that the Python script will not reprocess steps that have already finished and will instead skip forward in the cycle.

simonrp84 commented 4 months ago

@adamcpovey Do your python scripts use multithreading by default? If not, what's the option to enable this? Am sure that would speed up Pierre's processing a lot.

adamcpovey commented 4 months ago

Yes, --procs N sets the number of threads spawned. The default is 1. I usually run 7 as I have eight cores on my machine.
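As a rough sketch of how the two options mentioned in this thread might be combined, the helper below assembles an argument list for `orac.py`. The function name `build_orac_command` is hypothetical, and the "cores minus one" default simply mirrors the habit described above; only `--procs` and `--clobber 0` themselves come from the comments in this thread, and the exact way the script parses them should be checked against its command-line help.

```python
import os


def build_orac_command(l1b_path, procs=None, clobber=0, extra_flags=()):
    """Assemble an argument list for orac.py (hypothetical helper).

    procs defaults to one less than the number of detected cores,
    with a floor of 1.
    """
    if procs is None:
        procs = max(1, (os.cpu_count() or 1) - 1)
    return [
        "python", "orac.py", str(l1b_path),
        "--procs", str(procs),
        "--clobber", str(clobber),
        *extra_flags,
    ]
```

The returned list can be printed for inspection or passed to `subprocess.run`, so a rerun after a failed main-processor step would reuse the finished preprocessor output instead of recomputing it.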

simonrp84 commented 4 months ago

OK. @pdebuyl try with Adam's option above and you should get a nice speed increase (assuming you compiled everything with openmp support). In my experience, the slowest bits of the pre-processor are Cox-Munk (especially with BRDF enabled) and RTTOV. For the main processor, the speed depends on how many cloudy pixels there are. This is for the contrails project, right? If so, maybe you can speed things up by disabling liquid water (WAT) retrievals as those won't be much good for contrails. :)

pdebuyl commented 4 months ago

Hi all, thanks for the advice :-)

I first tried setting OMP_NUM_THREADS in my terminal. After hunting for thread occurrences in the code, I found what the command-line help (and you above) would have given me. My first result was obtained with --procs 8.

Also, as the discussion continues:

  1. Yes, it is (right now) for contrails. So indeed, I can start being selective (the area of interest is around Europe, and I don't need water clouds). I am happy to have an output file at all so far though :-p
  2. The cloud properties in my first output are incomplete. I have a nice cloud mask and cloud types, but the cloud optical thickness is 32767 everywhere. There is no error in the log, but I still guess that this is bad. I had to disable type-matching in the compilation of ORAC, so maybe there is something there 8-|
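A quick way to quantify the "32767 everywhere" symptom is to measure what fraction of a variable equals that value. The sketch below is plain Python with a hypothetical function name, `sentinel_fraction`. Note that 32767 is the largest signed 16-bit integer, so it may be a fill or saturated value in a packed variable; whether ORAC uses it that way is an assumption, not something established in this thread.

```python
def sentinel_fraction(values, sentinel=32767):
    """Return the fraction of entries equal to `sentinel` (0.0 if empty).

    Hypothetical diagnostic: `values` is any flat iterable of numbers,
    for example a flattened array read from an output file.
    """
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for v in values if v == sentinel)
    return hits / len(values)
```

A result close to 1.0 for the optical thickness variable, alongside a sensible cloud mask, would suggest that the retrieval itself produced no valid values rather than that the file was only partly written.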

Thanks again for all the advice. Is there a mailing list that would be preferred to GitHub issues? (I saw that devorac was closed or pending closure at JISCMAIL.)

adamcpovey commented 4 months ago

The ORAC Slack is probably the best place for debugging and practical questions. Contact me on adam DOT povey AT le DOT ac DOT uk and I'll send you an invite.

simonrp84 commented 4 months ago

I just sent you an invite link via email, Pierre.