GeoscienceAustralia / eqrm

Automatically exported from code.google.com/p/eqrm

Crash bug when running EQRM. Number of events requested does not equal number written. #23

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Execute the "run_rhe.sh" script in /nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/case_studies/national/regional
2. Wait for a few hours.

What is the expected output? What do you see instead?
I expected successful completion. Instead I saw this:

...
P27: Waiting for P0 to generate event set
P9: Waiting for P0 to generate event set

Traceback (most recent call last):
  File "runhaz.py", line 40, in <module>
    run_model()
  File "runhaz.py", line 36, in run_model
    analysis.main(run,True,compress_output)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/analysis.py", line 167, in main
    (event_set, event_activity, source_model) = create_event_set(eqrm_flags, parallel)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/event_set.py", line 1535, in create_event_set
    generate_event_set(parallel, eqrm_flags)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/event_set.py", line 1394, in generate_event_set
    eqrm_flags.prob_number_of_events_in_zones)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/event_set.py", line 633, in generate_synthetic_events
    width=width)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/event_set.py", line 388, in create
    rupture_centroid_lon)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/event_set.py", line 132, in __init__
    self.area = area
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/event_set.py", line 183, in <lambda>
    lambda self, value: self._set_file_array('area', value))
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/file_store.py", line 139, in _set_file_array
    self._set_numpy_binary_array(name, array)
  File "/nas/gemd/ehp/georisk_earthquake/EQRM/sandpits/dburbidg/python_eqrm/EQRM/trunk/eqrm_core/eqrm_code/file_store.py", line 129, in _set_numpy_binary_array
    save(filename, array)
  File "/usr/local/lib/python2.5/site-packages/numpy/lib/npyio.py", line 408, in save
    format.write_array(fid, arr)
  File "/usr/local/lib/python2.5/site-packages/numpy/lib/format.py", line 409, in write_array
    array.tofile(fp)
ValueError: 16903655 requested and 10672118 written

What version of the product are you using? On what operating system?
1886 on rhe-compute1

Please provide any additional information below.

I've attached the log file from node 0 and a dump of the output of the run.

Original issue reported on code.google.com by David.Bu...@ga.gov.au on 5 Mar 2012 at 4:55

GoogleCodeExporter commented 9 years ago
Again, the most likely cause is that /tmp is being used and is filling up.
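
One way to test the "full /tmp" theory is to compare the free space in the temporary directory with the size of the array being written. The sketch below is only an illustration: `free_bytes` and `has_room_for` are hypothetical helpers, not EQRM functions, `os.statvfs` is Unix-only, and the 8-byte item size assumes the counts in the ValueError are elements of a float64 array (in which case 16903655 elements would need about 135 MB).

```python
import os
import tempfile


def free_bytes(path):
    """Bytes available to an unprivileged user on the filesystem holding `path`.

    Hypothetical helper; relies on os.statvfs, so it is Unix-only.
    """
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def has_room_for(path, n_items, item_size=8):
    """True if `n_items` elements of `item_size` bytes fit in the free space.

    item_size=8 is an assumption (float64); pass the real dtype size if known.
    """
    return free_bytes(path) >= n_items * item_size


if __name__ == '__main__':
    tmp_dir = tempfile.gettempdir()
    # 16903655 is the element count reported in the traceback above.
    print(tmp_dir, free_bytes(tmp_dir), has_room_for(tmp_dir, 16903655))
```

Running this on the compute node just before event set generation would show whether the default temporary directory has enough room for one array of that size.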

This exception is being thrown during event set generation when the Event_Set 
object is created for the first time. To check the behaviour, I put a sleep at 
the point the exception is thrown and ran a small scenario...

$ ls -l /tmp/ | grep u78240 | grep npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.area.WEDZ1j.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.azimuth.GpJ9K2.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.depth.0fXRCd.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.depth_to_top.AtAtKK.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.dip.x1RZfu.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.event_id._BcFNV.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.fault_type.UXwPHx.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.fault_width.6fwU5a.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.length.B2o22o.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.ML.zwCipU.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.Mw.53EeUX.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.rupture_centroid_lat.DOC6Jj.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.rupture_centroid_lon.fz0myD.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.rupture_centroid_x.AA3ft3.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.rupture_centroid_y.0wI4bc.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.source_zone_id.TIOZmU.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.trace_end_lat.28htgh.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.trace_end_lon.hHZnJ4.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.trace_start_lat.5nQ3bx.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.trace_start_lon.D6diIW.npy
-rw-------  1 u78240   gemd     112 Mar  5 16:40 event_set.width.G6Yb5G.npy

The temporary Event_Set objects that are created during generation do not have 
the file store directory passed into them, so they default to writing their 
arrays to /tmp. I will ensure that the directory is passed through in these cases.
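
The fallback described above can be reproduced with the standard library alone. This is a minimal sketch, not the EQRM implementation: `make_array_file` and its `store_dir` parameter are hypothetical, and using `tempfile.mkstemp` is an assumption suggested by the random suffixes in the listing. The point it shows is that a directory argument of None makes `mkstemp` fall back to `tempfile.gettempdir()`, which is usually /tmp on Linux.

```python
import os
import tempfile


def make_array_file(name, store_dir=None):
    """Create an empty, uniquely named .npy file and return its path.

    Hypothetical helper. With store_dir=None, mkstemp places the file in
    tempfile.gettempdir() instead of the configured storage directory.
    """
    fd, path = tempfile.mkstemp(prefix='event_set.%s.' % name,
                                suffix='.npy', dir=store_dir)
    os.close(fd)  # only the location matters here, so nothing is written
    return path


if __name__ == '__main__':
    leaked = make_array_file('area')            # lands in the default temp dir
    print(os.path.dirname(leaked))
    os.remove(leaked)

    storage = tempfile.mkdtemp()                # stands in for the configured dir
    kept = make_array_file('area', store_dir=storage)
    print(os.path.dirname(kept))
    os.remove(kept)
    os.rmdir(storage)
```

Passing the directory explicitly at every construction site, as proposed in the comment, removes the silent fallback.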

Original comment by b...@girorosso.com on 5 Mar 2012 at 5:54

GoogleCodeExporter commented 9 years ago
Revision 985 ensures that all Event_Set objects have a storage dir passed into 
them.

Original comment by b...@girorosso.com on 5 Mar 2012 at 6:21

GoogleCodeExporter commented 9 years ago
Raising an exception in the init method of File_Store if the directory is None 
caught a few instances of the Event_Activity object not passing through the 
configured directory. I am resolving these as well.
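
A guard of the kind described here might look like the sketch below. It is an illustration under stated assumptions: the class `FileStore`, its constructor signature, and the choice of `ValueError` are hypothetical, since the thread does not show the actual File_Store code or which exception type was raised.

```python
class FileStore(object):
    """Hypothetical stand-in for File_Store; only the directory guard is shown."""

    def __init__(self, name, dir=None):
        # Fail loudly instead of silently falling back to the system temp dir.
        if dir is None:
            raise ValueError(
                '%s: a storage directory must be passed in explicitly' % name)
        self._name = name
        self._dir = dir


if __name__ == '__main__':
    try:
        FileStore('event_activity')          # caller forgot the directory
    except ValueError as err:
        print('caught: %s' % err)
```

Failing at construction time turns every missed call site into an immediate traceback, which is how the remaining Event_Activity cases were found.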

Original comment by b...@girorosso.com on 5 Mar 2012 at 10:41

GoogleCodeExporter commented 9 years ago
Revision 986 ensures that Event_Activity objects have a storage dir passed into 
them.

Original comment by b...@girorosso.com on 5 Mar 2012 at 11:16

GoogleCodeExporter commented 9 years ago
This particular issue should be resolved.

Original comment by b...@girorosso.com on 5 Mar 2012 at 11:22