GeoscienceAustralia / eqrm

Automatically exported from code.google.com/p/eqrm

Event generation lat/lon creation takes too long #40

Status: Closed (GoogleCodeExporter closed this issue 9 years ago)

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run a simulation that creates a very large number of events
2. Generating the random distribution of event set lat/lon values takes a long time
3. See attached log from David B

What is the expected output? What do you see instead?
Lat/lon generation should take a time of the same order of magnitude as the 
generation of the other event set attribute vectors. Instead it takes far longer.


Original issue reported on code.google.com by b...@girorosso.com on 19 Apr 2012 at 3:22


GoogleCodeExporter commented 9 years ago
The lat/lons for events are generated by polygon.populate_geo_coord_polygon, 
which uses the following algorithm:

1. Determine the boundary x and y values from the given polygon
2. Generate a random x and y value within the boundary x and y
3. If the point is inside the polygon and outside the excluded polygons, keep it
4. Keep generating points until the requested number of points is filled
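
For reference, the steps above can be sketched as a plain Python loop. This is a minimal illustration, not the EQRM code: `point_in_polygon` and `populate_points_loop` are hypothetical names, polygons are assumed to be lists of (x, y) vertices, and a ray-casting test stands in for whatever inside/outside test EQRM actually uses.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) is inside polygon (a list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y0 > y) != (y1 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def populate_points_loop(polygon, exclude, n_points, seed=None):
    """One candidate point per iteration (hypothetical sketch of steps 1 to 4)."""
    rng = random.Random(seed)
    # Step 1: bounding box of the polygon
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points = []
    # Step 4: repeat until enough points have been kept
    while len(points) < n_points:
        # Step 2: a single random candidate inside the bounding box
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        # Step 3: keep it only if it is inside the polygon and outside every exclusion
        if point_in_polygon(x, y, polygon) and not any(
            point_in_polygon(x, y, ex) for ex in exclude
        ):
            points.append((x, y))
    return points
```

Each iteration pays the full Python interpreter overhead for a single candidate point, which is the likely reason a loop of this shape becomes slow when millions of points are requested.
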

This can be improved by vectorising the algorithm, replacing steps 2 to 4 with:
2. Generate random x and y vectors, each as long as the requested number of 
points, within the boundary x and y
3. Find all indices of points that are inside the polygon and outside the 
excluded polygons
4. If fewer points have been kept than were requested, repeat steps 2 and 3
5. Return a slice of the kept points whose length equals the requested number 
of points
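
A minimal NumPy sketch of the vectorised steps follows. It is an illustration under stated assumptions rather than the code committed to EQRM: `points_in_polygon` and `populate_points_vectorised` are hypothetical names, polygons are assumed to be sequences of (x, y) vertices, and the inside/outside test is a vectorised ray-casting check chosen for this example.

```python
import numpy as np


def points_in_polygon(x, y, polygon):
    """Vectorised ray-casting test; x and y are 1-D arrays, returns a boolean mask."""
    poly = np.asarray(polygon, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    x0, y0 = poly[-1]
    for x1, y1 in poly:
        if y0 != y1:  # a horizontal edge is never crossed by a horizontal ray
            straddles = (y0 > y) != (y1 > y)
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            # Toggle the mask for every point whose ray crosses this edge
            inside ^= straddles & (x < x_cross)
        x0, y0 = x1, y1
    return inside


def populate_points_vectorised(polygon, exclude, n_points, seed=None):
    """Batch rejection sampling (hypothetical sketch of the vectorised steps)."""
    rng = np.random.default_rng(seed)
    poly = np.asarray(polygon, dtype=float)
    # Step 1: bounding box of the polygon
    min_x, min_y = poly.min(axis=0)
    max_x, max_y = poly.max(axis=0)
    kept = np.empty((0, 2))
    # Step 4: repeat steps 2 and 3 while too few points have been kept
    while len(kept) < n_points:
        # Step 2: a whole batch of candidates at once
        x = rng.uniform(min_x, max_x, size=n_points)
        y = rng.uniform(min_y, max_y, size=n_points)
        # Step 3: mask of candidates inside the polygon and outside the exclusions
        keep = points_in_polygon(x, y, poly)
        for ex in exclude:
            keep &= ~points_in_polygon(x, y, ex)
        kept = np.vstack([kept, np.column_stack([x[keep], y[keep]])])
    # Step 5: slice down to exactly the requested number of points
    return kept[:n_points]
```

The per-point Python work is replaced by a handful of array operations per polygon edge. Note that this sketch loops forever if the exclusions cover the whole polygon, so a production version would need a guard for that case.
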

Original comment by b...@girorosso.com on 19 Apr 2012 at 3:30

GoogleCodeExporter commented 9 years ago
Results from a timing test, with logging added to time the original algorithm 
and the vectorised one:

2012-04-19 12:27:03,733 DEBUG                    event_set:518 |Generating events for source 9 of 43
2012-04-19 12:27:03,734 DEBUG                    event_set:528 |Number of events = 251026, range = [367508 367509 367510 ..., 618531 618532 618533]
...
2012-04-19 12:27:04,353 DEBUG                      polygon:561 |populate_geo_coord_polygon start
2012-04-19 12:27:04,384 DEBUG                      polygon:570 |populate_geo_coord_polygon randoms populated
2012-04-19 12:27:04,384 DEBUG                      polygon:586 |populate_geo_coord_polygon outer extent of polygon found
...
2012-04-19 12:27:04,385 DEBUG                      polygon:594 |populate_geo_coord_polygon generate points started
2012-04-19 12:27:54,831 DEBUG                      polygon:621 |populate_geo_coord_polygon generate points completed
...
2012-04-19 12:27:54,832 DEBUG                      polygon:624 |populate_geo_coord_polygon vectorised generate points started
2012-04-19 12:27:59,225 DEBUG                      polygon:647 |populate_geo_coord_polygon vectorised generate points completed
...

For > 250000 events:
Original algorithm - ~50 secs
Vectorised algorithm - ~ 4.5 secs

This achieves the goal stated above.
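
The two durations can be checked directly from the "generate points started" and "generate points completed" timestamps in the log. The snippet below is only a small sketch of that arithmetic; `elapsed_seconds` is a hypothetical helper, not part of EQRM.

```python
from datetime import datetime

# Timestamp layout used by the log lines, e.g. "2012-04-19 12:27:04,385"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps given as strings."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# polygon:594 -> polygon:621 (original algorithm)
original = elapsed_seconds("2012-04-19 12:27:04,385", "2012-04-19 12:27:54,831")
# polygon:624 -> polygon:647 (vectorised algorithm)
vectorised = elapsed_seconds("2012-04-19 12:27:54,832", "2012-04-19 12:27:59,225")

print(f"original:   {original:.1f} s")               # original:   50.4 s
print(f"vectorised: {vectorised:.1f} s")             # vectorised: 4.4 s
print(f"speed-up:   {original / vectorised:.1f}x")   # speed-up:   11.5x
```
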

Original comment by b...@girorosso.com on 19 Apr 2012 at 3:36

GoogleCodeExporter commented 9 years ago

Original comment by b...@girorosso.com on 19 Apr 2012 at 3:37

GoogleCodeExporter commented 9 years ago
Some stats from David's currently 'hanging' simulation:

*Original (non-vectorised)*
2012-04-18 14:43:02,207 DEBUG                    event_set:514 |Generating events for source 0 of 4
2012-04-18 14:43:02,227 DEBUG                    event_set:524 |Number of events = 3213836, range = [      0       1       2 ..., 3213833 3213834 3213835]
2012-04-18 14:43:09,434 DEBUG                    event_set:543 |Memory: populate_magnitude created
2012-04-18 14:43:09,435 DEBUG                    event_set:544 |Resource usage: memory=1126.0MB resident=656.0MB stacksize=0.3MB
2012-04-18 18:54:36,824 DEBUG                    event_set:554 |Memory: lat,lon created
2012-04-18 18:54:36,825 DEBUG                    event_set:555 |Resource usage: memory=1605.1MB resident=1159.3MB stacksize=0.3MB
time = ~ 4h 11m

2012-04-18 18:54:38,264 DEBUG                    event_set:514 |Generating events for source 1 of 4
2012-04-18 18:54:38,265 DEBUG                    event_set:524 |Number of events = 78432, range = [3213836 3213837 3213838 ..., 3292265 3292266 3292267]
2012-04-18 18:54:38,438 DEBUG                    event_set:543 |Memory: populate_magnitude created
2012-04-18 18:54:38,438 DEBUG                    event_set:544 |Resource usage: memory=1605.1MB resident=1161.8MB stacksize=0.3MB
2012-04-18 18:57:23,988 DEBUG                    event_set:554 |Memory: lat,lon created
2012-04-18 18:57:23,988 DEBUG                    event_set:555 |Resource usage: memory=1605.1MB resident=1161.8MB stacksize=0.3MB
time = ~ 2m 45s

2012-04-18 18:57:24,022 DEBUG                    event_set:514 |Generating events for source 2 of 4
2012-04-18 18:57:24,026 DEBUG                    event_set:524 |Number of events = 1321801, range = [3292268 3292269 3292270 ..., 4614066 4614067 4614068]
2012-04-18 18:57:26,926 DEBUG                    event_set:543 |Memory: populate_magnitude created
2012-04-18 18:57:26,927 DEBUG                    event_set:544 |Resource usage: memory=1605.1MB resident=1161.8MB stacksize=0.3MB
2012-04-18 20:15:38,539 DEBUG                    event_set:554 |Memory: lat,lon created
2012-04-18 20:15:38,540 DEBUG                    event_set:555 |Resource usage: memory=1605.1MB resident=1161.8MB stacksize=0.3MB
time = ~ 1h 18m

*With changes (vectorised)*
2012-04-19 14:49:53,810 DEBUG                    event_set:518 |Generating events for source 0 of 4
2012-04-19 14:49:53,836 DEBUG                    event_set:528 |Number of events = 3213836, range = [      0       1       2 ..., 3213833 3213834 3213835]
2012-04-19 14:50:01,649 DEBUG                    event_set:547 |Memory: populate_magnitude created
2012-04-19 14:50:01,650 DEBUG                    event_set:548 |Resource usage: memory=1108.2MB resident=702.9MB stacksize=0.3MB
2012-04-19 14:52:53,649 DEBUG                    event_set:558 |Memory: lat,lon created
2012-04-19 14:52:53,667 DEBUG                    event_set:559 |Resource usage: memory=1981.2MB resident=1602.2MB stacksize=0.3MB
time = ~ 2m 52s

2012-04-19 14:52:55,493 DEBUG                    event_set:518 |Generating events for source 1 of 4
2012-04-19 14:52:55,494 DEBUG                    event_set:528 |Number of events = 78432, range = [3213836 3213837 3213838 ..., 3292265 3292266 3292267]
2012-04-19 14:52:55,668 DEBUG                    event_set:547 |Memory: populate_magnitude created
2012-04-19 14:52:55,669 DEBUG                    event_set:548 |Resource usage: memory=1981.2MB resident=1602.4MB stacksize=0.3MB
2012-04-19 14:52:58,542 DEBUG                    event_set:558 |Memory: lat,lon created
2012-04-19 14:52:58,543 DEBUG                    event_set:559 |Resource usage: memory=1981.2MB resident=1602.4MB stacksize=0.3MB
time = 3s

2012-04-19 14:52:58,588 DEBUG                    event_set:518 |Generating events for source 2 of 4
2012-04-19 14:52:58,592 DEBUG                    event_set:528 |Number of events = 1321801, range = [3292268 3292269 3292270 ..., 4614066 4614067 4614068]
2012-04-19 14:53:01,503 DEBUG                    event_set:547 |Memory: populate_magnitude created
2012-04-19 14:53:01,504 DEBUG                    event_set:548 |Resource usage: memory=1981.2MB resident=1602.4MB stacksize=0.3MB
2012-04-19 14:53:53,175 DEBUG                    event_set:558 |Memory: lat,lon created
2012-04-19 14:53:53,181 DEBUG                    event_set:559 |Resource usage: memory=1981.2MB resident=1602.4MB stacksize=0.3MB
time = 52s

Original comment by b...@girorosso.com on 19 Apr 2012 at 5:11

GoogleCodeExporter commented 9 years ago
Changes implemented in revision 1070. David B has verified that he is happy 
with the event set distribution.

Original comment by b...@girorosso.com on 20 Apr 2012 at 6:02