Closed: raoanirudh closed this issue 7 years ago.
How did you install oq? Using:
requirements-py*-macos.txt
setup.py
It looks like a fork in a non-fork-safe area of the code. We have had similar issues in the past, caused by Apple's customized libraries.
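A minimal sketch of why a fork can be unsafe even when the children never call the problematic library themselves. This is not engine code: LIBRARY_STATE, init_library and run_forked_pool are hypothetical names. With the fork start method, each worker is a copy of the parent, so whatever a library initialized in the parent before the pool was created is inherited as-is rather than set up again. If that library keeps internal threads or locks (as Apple's system frameworks do), the copy can be left inconsistent. POSIX only, since the fork start method does not exist on Windows.

```python
import multiprocessing
import os

# Hypothetical stand-in for state that a library sets up in the parent
# (for example, whatever urlopen initializes internally on macOS).
LIBRARY_STATE = {"initialized_in_pid": None}


def init_library():
    """Pretend to initialize a library in the current process."""
    LIBRARY_STATE["initialized_in_pid"] = os.getpid()


def worker(_task):
    """Return (pid that initialized the state I see, my own pid)."""
    return LIBRARY_STATE["initialized_in_pid"], os.getpid()


def run_forked_pool(n_tasks=4):
    """Initialize in the parent, then fork a pool and inspect the children."""
    init_library()  # runs in the parent, before any worker exists
    ctx = multiprocessing.get_context("fork")  # POSIX only
    with ctx.Pool(2) as pool:
        results = pool.map(worker, range(n_tasks))
    return os.getpid(), results


if __name__ == "__main__":
    parent_pid, results = run_forked_pool()
    for init_pid, child_pid in results:
        # Children never called init_library(), yet they see its effect.
        print(init_pid == parent_pid, child_pid != parent_pid)
```

For context: Python 3.8 and later default to the spawn start method on macOS for this kind of reason, but the Python 2.7 and 3.5 interpreters discussed in this issue still fork by default.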
I can confirm the issue with both Python 2.7 and 3.5, using either dependencies from PyPI or our wheels.
@micheles it works fine in all cases using oq run. It should use futures in the same way, am I wrong?
$ oq run test/event_based_risk/inputs/case_master/job.ini
[cut]
[2017-10-18 21:03:48,222 #41 INFO] Sent 545.79 KB of data in 2 task(s)
[2017-10-18 21:03:48,247 #41 INFO] Submitting 2 "event_based_risk#2" tasks
[2017-10-18 21:03:48,249 #41 INFO] Sent 297.74 KB of data in 2 task(s)
[2017-10-18 21:03:55,500 #41 INFO] event_based_risk#1 50%
[2017-10-18 21:04:06,112 #41 INFO] event_based_risk#1 100%
[2017-10-18 21:04:06,271 #41 INFO] Received 11.86 MB of data, maximum per task 11.26 MB
[2017-10-18 21:04:06,273 #41 INFO] event_based_risk#2 50%
[2017-10-18 21:04:06,295 #41 INFO] event_based_risk#2 100%
[2017-10-18 21:04:06,318 #41 INFO] Received 605.08 KB of data, maximum per task 547.94 KB
[2017-10-18 21:04:06,321 #41 INFO] Generated 8.19 MB of GMFs
[2017-10-18 21:04:15,245 #41 INFO] Instantiating LossRatiosGetters
[2017-10-18 21:04:15,275 #41 INFO] Submitting 7 "build_curves_maps" tasks
[2017-10-18 21:04:15,288 #41 INFO] Sent 10.34 MB of data in 7 task(s)
[2017-10-18 21:04:15,361 #41 INFO] build_curves_maps 14%
[2017-10-18 21:04:15,473 #41 INFO] build_curves_maps 28%
[2017-10-18 21:04:15,520 #41 INFO] build_curves_maps 42%
[2017-10-18 21:04:15,545 #41 INFO] build_curves_maps 57%
[2017-10-18 21:04:15,598 #41 INFO] build_curves_maps 71%
[2017-10-18 21:04:15,680 #41 INFO] build_curves_maps 85%
[2017-10-18 21:04:15,690 #41 INFO] build_curves_maps 100%
[2017-10-18 21:04:15,696 #41 INFO] Received 29.16 KB of data, maximum per task 4.17 KB
[2017-10-18 21:04:15,724 #41 INFO] Total time spent: 28.684799909591675 s
[2017-10-18 21:04:15,724 #41 INFO] Memory allocated: 53.34 MB
See the output with hdfview /Users/jenkins/oqdata/calc_41.hdf5
The only difference between oq run and oq engine on a single machine is the logging (oq engine uses the DbServer and logs to the db). Could this be an issue with sqlite? Remember that we had an issue on macOS. Perhaps it is still there, but less visible than before. My hint comes from these lines in the segfault report:
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libdispatch.dylib 0x00007fffc13ea749 _dispatch_queue_push + 171
The issue with sqlite was in libdispatch.dylib if I remember correctly.
Anirudh, can you run the same test with engine 2.6? My bet is that the problem is there too. Engine 2.5 should be safe, though.
Yes @micheles, that was my suspicion too. I'm now compiling the Linux standalone package on macOS, which embeds its own copy of SQLite and a Python compiled against it, to see if something changes.
@micheles @raoanirudh Engine 2.6.0 works for me (Python 3.5 + our wheels, macOS 10.11). Should we consider this a regression blocking the release?
I've been able to bisect the code, and after some trials this is the PR that breaks it: https://github.com/gem/oq-engine/pull/3093
Up to 9e46030d76779af41c396d3517061acf981bf73c (merge of #3088) it works; from 8a5f6376e4261f75c95ff87656a95c644bcf1b43 (merge of #3093) on, it breaks.
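The bisection can be automated with git bisect run. Below is a minimal sketch, assuming Python 3.5+ and assuming that oq engine --run on the case_master job.ini is the invocation that reproduces the crash; the script name bisect_check.py and its functions are hypothetical. The one subtlety is that a segfaulting child shows up as a negative return code (-11 for SIGSEGV), which git cannot interpret, so the script maps it to an ordinary "bad" exit code.

```python
"""Hypothetical bisect helper, used as:

    git bisect run python bisect_check.py oq engine --run JOB_INI

git bisect run treats exit code 0 as good, 125 as "cannot test, skip",
other codes from 1 to 127 as bad, and codes above 127 abort the bisect.
"""
import subprocess
import sys


def bisect_exit_code(returncode):
    """Map a test command's return code to a git bisect run exit code."""
    if returncode == 0:
        return 0  # the command succeeded: good commit
    if returncode < 0 or returncode == 125 or returncode > 127:
        # Killed by a signal (-11 is SIGSEGV), or a code git would treat as
        # "skip" or "abort": report a plain "bad" instead.
        return 1
    return returncode  # any other failure is already a valid "bad" code


def main(command):
    """Run the reproducer command and translate its outcome for git."""
    try:
        proc = subprocess.run(command)
    except FileNotFoundError:
        return 125  # the command cannot even start here: skip this commit
    return bisect_exit_code(proc.returncode)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    print("usage: python bisect_check.py COMMAND [ARGS...]")
```

Typical use: git bisect start <bad-commit> <good-commit>, then git bisect run python bisect_check.py oq engine --run test/event_based_risk/inputs/case_master/job.ini, and finally git bisect reset.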
@raoanirudh could you try this branch please? https://github.com/gem/oq-engine/tree/revert-3093
@daniviga Works fine with the branch revert-3093
I confirm that the issue is in check_obsolete_version. The only thing done there is a call to requests. However, even after replacing requests with the standard library
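import json
from urllib.request import Request, urlopen
# OQ_API, headers and decode come from the surrounding engine module
req = Request(OQ_API + '/engine/latest', headers=headers)
data = urlopen(req, timeout=1).read()  # bytes
tag_name = json.loads(decode(data))['tag_name']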
the segfault is still there :-( The problem is the call to urlopen, even if it seems absurd.
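One possible explanation, offered as an assumption and not something confirmed in this thread: on macOS, urllib.request.getproxies() falls back to the private _scproxy module (Apple's SystemConfiguration framework) whenever no proxy variables are found in the environment, so a plain urlopen in the parent can initialize system libraries that the forked workers then inherit. A commonly cited workaround is to set no_proxy before the first request, which makes urllib take its proxy settings from the environment and skip the system lookup. The helper name below is hypothetical; this is a sketch, not a tested fix for the engine.

```python
import os
import urllib.request


def skip_macos_system_proxy_lookup():
    """Hypothetical helper: make urllib take proxy settings from the
    environment only, so urllib.request.getproxies() on macOS does not
    fall back to the _scproxy system lookup.

    Call it in the parent before the first urlopen and before forking.
    """
    if not os.environ.get("no_proxy"):
        # "*" means "do not use a proxy for any host".
        os.environ["no_proxy"] = "*"
    # A non-empty result here is what short-circuits the macOS fallback.
    return urllib.request.getproxies_environment()
```

The trade-off is that a proxy configured only in the macOS system settings would no longer be used for the version check.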
The workaround I suggest is to change oq_distribute from futures to zmq in the file openquake.cfg and everything will work. I would not block the release for this issue. It would be interesting to see if there are other calculations affected by this issue, especially real examples.
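The workaround is a one-line change that can simply be made by hand. For scripted setups, here is a minimal sketch using the standard configparser module. The section name "distribution" is an assumption about the layout of openquake.cfg (pass another one if your file differs), and set_oq_distribute is a hypothetical helper.

```python
import configparser


def set_oq_distribute(cfg_path, value="zmq", section="distribution"):
    """Rewrite oq_distribute in an openquake.cfg-style INI file.

    The default section name is an assumption about the file layout.
    """
    # interpolation=None keeps any '%' characters in other values literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "oq_distribute", value)
    with open(cfg_path, "w") as f:
        parser.write(f)  # note: comments in the original file are dropped
    return parser.get(section, "oq_distribute")
```

Since ConfigParser.write does not preserve comments, editing the file by hand is preferable when the comments matter.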
Running the test case event_based_risk/case_master without the --nd flag leads to a segmentation fault on macOS. Replicated on two different systems:
OS X 10.12.6 + Python 3.5.3
OS X 10.10.5 + Python 2.7.12
Engine log
System log