cedadev / swallow

Swallow - a Birdhouse WPS for running the NAME Trajectory code.

trajectory run not always producing plots #44

Closed · alaniwi closed this 1 year ago

alaniwi commented 2 years ago

A test case with a trajectory run at lat=34, lon=45, heights=55,70 produced a plot,

but one with lat=40, lon=50, heights=200,300 did not (it produced only the first time of output rather than a full time series, although it exited with apparent success).

Both runs were forward trajectories initialised on 2022-01-01 00:00:00, and the symptoms were the same for 12-hour and 48-hour runs.

Investigate why there is a difference.
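
For reference, the two configurations can be summarised as follows (a minimal sketch; the parameter names are illustrative and not necessarily the exact inputs accepted by the WPS process):

# Illustrative summary of the two test cases; parameter names are hypothetical.
working_case = {
    "latitude": 34.0, "longitude": 45.0,
    "release_heights_m": [55.0, 70.0],
    "start": "2022-01-01T00:00:00",
    "direction": "forward",
}
failing_case = {
    "latitude": 40.0, "longitude": 50.0,
    "release_heights_m": [200.0, 300.0],
    "start": "2022-01-01T00:00:00",
    "direction": "forward",
}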

alaniwi commented 2 years ago

Matrix of combinations (tested via command line):

So the difference is in the horizontal position of the release rather than in its height coordinates (although this is subject to an issue regarding segfaults, on which I will comment below).
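
For illustration, the combinations tested were of this kind (a sketch of how the matrix could be generated; not the actual command-line script used):

from itertools import product

# Cross the two release positions with the two height pairs; each combination
# corresponds to one command-line NAME run. Illustrative only.
positions = [(34.0, 45.0), (40.0, 50.0)]        # (lat, lon)
height_pairs = [(55.0, 70.0), (200.0, 300.0)]   # metres
for (lat, lon), heights in product(positions, height_pairs):
    print(f"run: lat={lat} lon={lon} heights={heights}")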

alaniwi commented 2 years ago

I note here that the first time I ran test2 above, it gave this segmentation fault:

[cwps@ceda-wps-staging no-plots]$ ./run_name.sh inp_test2
[ceda-wps-staging.ceda.ac.uk:16838] OPAL ERROR: Error in file pmix2x.c at line 326
[ceda-wps-staging.ceda.ac.uk:16838] OPAL ERROR: Error in file pmix2x.c at line 326
[ceda-wps-staging.ceda.ac.uk:16838] OPAL ERROR: Error in file pmix2x.c at line 326
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
nameiii_64bit_par  000000000086C2A1  tbk_trace_stack_i     Unknown  Unknown
nameiii_64bit_par  000000000086A3DB  tbk_string_stack_     Unknown  Unknown
nameiii_64bit_par  0000000000812844  Unknown               Unknown  Unknown
nameiii_64bit_par  0000000000812656  tbk_stack_trace       Unknown  Unknown
nameiii_64bit_par  00000000007A5709  for__issue_diagno     Unknown  Unknown
nameiii_64bit_par  00000000007ABAB6  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B5D9B2A9630  Unknown               Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FC473A7  pmix2x_value_unlo     Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FC4700F  pmix2x_event_hdlr     Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FC61198  pmix_invoke_local     Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FC66177  Unknown               Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FC659BA  Unknown               Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FCD36EC  pmix_ptl_base_pro     Unknown  Unknown
libopen-pal.so.40  00002B5D9DC07782  opal_libevent2022     Unknown  Unknown
mca_pmix_pmix2x.s  00002B5D9FCA5A22  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B5D9B2A1EA5  Unknown               Unknown  Unknown
libc-2.17.so       00002B5D9B5B4B0D  clone                 Unknown  Unknown

This could not be reproduced when re-running it a further 10 times (actually slightly more, as there were also one or two interactive runs outside of this retry loop).
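
The retry loop was along these lines (a sketch in Python, assuming the run_name.sh script and inp_test2 input file shown above; the actual retries were done from the shell):

import subprocess

# Re-run the same case repeatedly, treating a non-zero exit status
# (e.g. from the intermittent SIGSEGV) as a failure. Sketch only.
failures = 0
for attempt in range(10):
    result = subprocess.run(["./run_name.sh", "inp_test2"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        failures += 1
        print(f"attempt {attempt}: exit code {result.returncode}")
print(f"{failures} failures out of 10 attempts")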

The following comment was found by googling for the error message: https://github.com/open-mpi/ompi/issues/5336#issuecomment-400490216 . It may be an issue with a similar cause, and it refers to failure rates in the region of 1-3%, so I have probably not retried enough times to reproduce this intermittent failure. I will assume that it is independent of the problem this issue is about; it is something we ought to look at, but I will create a separate issue for it and otherwise ignore it for the purposes of the current issue.

alaniwi commented 2 years ago

This seems to be related to the longitude value. I tried a couple of start dates and that didn't make a difference: 45E is fine, but at 50E or 60E there is no output (50W was okay). Will have to ask Andrew.

alaniwi commented 1 year ago

It turns out that the hard-coded model domain is the issue here. We need the same fix as at https://github.com/cedadev/swallow/issues/58#issuecomment-1307157651 and then need to change the hard-coded values to cover a global domain (see https://github.com/cedadev/swallow/blob/d62202860204e25b4c9ce9b36db63deb46ae5e5c/swallow/processes/create_name_inputs/make_traj_input.py#L14-L17 ), while still passing them through from the Python. This should fix the issue, while retaining the option of adding a user-specified computational domain in future (as is already implemented for the general forward / air history runs).
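
A sketch of the intended shape of the change (names and structure here are illustrative; the real values live in make_traj_input.py at the lines linked above):

# Illustrative only: widen the hard-coded computational domain to global
# coverage, but keep passing the values through from the Python code into the
# NAME input file, so that a user-specified domain can be supported later.
GLOBAL_DOMAIN = {
    "x_min": -180.0, "x_max": 180.0,   # longitude bounds
    "y_min": -90.0,  "y_max": 90.0,    # latitude bounds
}

def get_domain(user_domain=None):
    """Return the computational domain, defaulting to global coverage."""
    return dict(GLOBAL_DOMAIN, **(user_domain or {}))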

alaniwi commented 1 year ago

This was fixed. Here's a plot from the previously broken test case.
