equinor / fmu-sumo-uploader

Upload to Sumo in the FMU context
https://fmu-sumo-uploader.readthedocs.io/en/latest/
Apache License 2.0

Discuss variable results when running Drogon #8

Closed equinor-ruaj closed 8 months ago

equinor-ruaj commented 11 months ago

Sample case with differences across iterations: link

equinor-ruaj commented 11 months ago

Test: run the same Drogon_Design case 2 times with Komodo bleeding and 2 times with Komodo stable, then check consistency. Expected result: all four uploads are equal.
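One way to make the "all equal" check concrete is to compare the set of uploaded object names per run. This is only a sketch: the run labels and object names are hypothetical placeholders, and how the names are collected from Sumo for each case is not shown.

```python
def find_missing_objects(runs):
    """Return {run_name: sorted object names present in another run but missing here}.

    `runs` maps a run label to the set of object names uploaded by that run.
    Runs with nothing missing are left out, so an empty dict means "all equal".
    """
    all_objects = set().union(*runs.values())
    return {
        name: sorted(all_objects - objects)
        for name, objects in runs.items()
        if all_objects - objects
    }


# Hypothetical example data, not taken from the actual Drogon runs.
runs = {
    "stable_1": {"map_a.gri", "poly_b.csv"},
    "stable_2": {"map_a.gri"},
    "bleeding_1": {"map_a.gri", "poly_b.csv"},
}
print(find_missing_objects(runs))
```

An empty result corresponds to the expected outcome of the test; any entry points at the run and objects to inspect in the logs.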

roywilly commented 11 months ago

**SUMMARY:** There is an issue in STABLE that loses some uploads due to an httpx 'timed out' error. This issue is not found in BLEEDING.

Bleeding had one other error that should be looked into:

WARNING:msal_extensions.cache_lock:Process 30540 failed to create lock file
WARNING:py.warnings:/prog/res/komodo/bleeding-py38-rhel7/root/lib/python3.8/site-packages/fmu/sumo/uploader/scripts/sumo_upload.py:119: UserWarning: Problem related to Sumo upload: [Errno 11] Resource temporarily unavailable

**END SUMMARY**

Komodo STABLE:
fmu-sumo 0.5.1
fmu-sumo-sim2sumo 0.0.0
sumo-wrapper-python 0.4.0

Komodo BLEEDING:
fmu-sumo 1.0.1
fmu-sumo-sim2sumo 0.1.2.dev1+g3d81a7d
fmu-sumo-uploader 1.0.2.dev2+g0c326e2
sumo-wrapper-python 1.0.2.dev3+g928ba7c

STABLE -> PROD: The uploads differ. Around 20 'yellow' circles (missing objects) in both uploads. Each of these has the following in its log file in /scratch:
real-3: sumo_upload.py:119: UserWarning: Problem related to Sumo upload: timed out
real-7: sumo_upload.py:119: UserWarning: Problem related to Sumo upload: timed out

BLEEDING -> DEV: The first 2 uploads had no 'yellow' circles at all, and exactly the same number of objects. The third upload had 2 'yellow' circles:
real-29: Oct 26 13:09: Metadata: [502] Bad Gateway: RADIX DEV DEPLOYMENT AT THIS TIME.
real-132: Oct 26 13:35:
WARNING:msal_extensions.cache_lock:Process 30540 failed to create lock file
WARNING:py.warnings:/prog/res/komodo/bleeding-py38-rhel7/root/lib/python3.8/site-packages/fmu/sumo/uploader/scripts/sumo_upload.py:119: UserWarning: Problem related to Sumo upload: [Errno 11] Resource temporarily unavailable

4th upload (bleeding_skip_sim2sumo_04): no yellow.

BLEEDING -> PROD: /scratch/fmu/rowh/bleeding_skip_sim2sumo_06 (1800) 1 run: No yellow

STABLE -> DEV: /scratch/fmu/rowh/stable_skip_sim2sumo_07/realization-/iter-/ real-155: sumo_upload.py:119: UserWarning: Problem related to Sumo upload: timed out
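The per-realization warnings above were collected from the log files by hand. A minimal sketch of doing the same thing programmatically is shown below. It assumes the log text has already been read into a mapping from realization label to log content (where the log files live and what they are called is not specified here), and it only matches the "Problem related to Sumo upload" warning text quoted in this thread.

```python
import re
from collections import Counter

# Matches the warning text quoted in the logs above and captures the reason.
WARNING_RE = re.compile(r"UserWarning: Problem related to Sumo upload: (.+)")


def summarize_upload_warnings(logs):
    """Group upload warnings by reason.

    `logs` maps a realization label (e.g. "real-3") to that realization's log text.
    Returns (counts per reason, realization labels per reason).
    """
    counts = Counter()
    affected = {}
    for label, text in logs.items():
        for line in text.splitlines():
            match = WARNING_RE.search(line)
            if match:
                reason = match.group(1).strip()
                counts[reason] += 1
                affected.setdefault(reason, []).append(label)
    return counts, affected


# Sample input built from warning lines quoted in this thread.
sample_logs = {
    "real-3": "sumo_upload.py:119: UserWarning: Problem related to Sumo upload: timed out",
    "real-7": "sumo_upload.py:119: UserWarning: Problem related to Sumo upload: timed out",
    "real-132": "sumo_upload.py:119: UserWarning: Problem related to Sumo upload: "
                "[Errno 11] Resource temporarily unavailable",
}
counts, affected = summarize_upload_warnings(sample_logs)
print(counts)
print(affected)
```

Grouping by reason makes it easier to see whether a run is dominated by one failure mode (such as 'timed out') or by a mix of unrelated errors.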

MODIFIED STABLE -> PROD to get a stack trace of the 'timed out' error: /scratch/fmu/rowh/stable_skip_sim2sumo_08/realization-/iter-/

Traceback (most recent call last):
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_backends/sync.py", line 28, in read
    return self._sock.recv(max_bytes)
socket.timeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_transports/default.py", line 218, in handle_request
    resp = self._pool.handle_request(req)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/connection_pool.py", line 262, in handle_request
    raise exc
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/connection_pool.py", line 245, in handle_request
    response = connection.handle_request(request)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/http_proxy.py", line 271, in handle_request
    connect_response = self._connection.handle_request(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/connection.py", line 96, in handle_request
    return self._connection.handle_request(request)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/http11.py", line 121, in handle_request
    raise exc
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/http11.py", line 99, in handle_request
    ) = self._receive_response_headers(**kwargs)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/http11.py", line 164, in _receive_response_headers
    event = self._receive_event(timeout=timeout)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_sync/http11.py", line 200, in _receive_event
    data = self._network_stream.read(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_backends/sync.py", line 28, in read
    return self._sock.recv(max_bytes)
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/private/rowh/thEvnStable/root/bin/sumo_upload", line 8, in <module>
    sys.exit(main())
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/scripts/sumo_upload.py", line 68, in main
    sumo_upload_main(
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/scripts/sumo_upload.py", line 112, in sumo_upload_main
    e.upload(threads=threads, register_case=False)
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/caseondisk.py", line 347, in upload
    upload_results = upload_files(
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/_upload_files.py", line 60, in upload_files
    for r in results:
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/_upload_files.py", line 30, in _upload_file
    result = file.upload_to_sumo(
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/_fileondisk.py", line 209, in upload_to_sumo
    response = self._upload_metadata(
  File "/private/rowh/thEvnStable/root/lib64/python3.8/site-packages/fmu/sumo/uploader/_fileondisk.py", line 162, in _upload_metadata
    response = sumo_connection.api.post(path=path, json=self.metadata)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/sumo/wrapper/sumo_client.py", line 278, in post
    response = httpx.post(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_api.py", line 304, in post
    return request(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_api.py", line 100, in request
    return client.request(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_client.py", line 814, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_client.py", line 901, in send
    response = self._send_handling_auth(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_client.py", line 929, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_client.py", line 966, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_client.py", line 1002, in _send_single_request
    response = transport.handle_request(request)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_transports/default.py", line 218, in handle_request
    resp = self._pool.handle_request(req)
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/prog/res/komodo/2023.10.01-py38-rhel7/root/lib/python3.8/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout: timed out

roywilly commented 11 months ago

bleeding_mon_a (dev), with no deployments in Radix dev today: 3 'yellow' dotted realizations:

real-3: #97: Oct 30 14:14 Metadata: [404] Not Found: /scratch/fmu/rowh/bleeding_mon_a/realization-3/iter-0/share/results/polygons/topvolon--gl_faultlines_extract_postprocess.csv But both this file and its companion yml file exist on disk. Is the 404 perhaps a response from sumo-core?

real-37: #96: Oct 30 14:17 Metadata: [502] Bad Gateway: Filepath: /scratch/fmu/rowh/bleeding_mon_a/realization-37/iter-0/share/results/maps/therys--phit_average.gri

real-88: #100: Oct 30 14:20 Metadata: [502] Bad Gateway: /scratch/fmu/rowh/bleeding_mon_a/realization-88/iter-0/share/observations/seismic/seismic--relai_depth--20200701_20190701.segy
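Both the 'timed out' failures seen in STABLE and the sporadic 502 responses listed here look like transient errors, which are commonly handled by retrying with a growing delay. The sketch below illustrates that general pattern only; it is not how fmu-sumo-uploader or sumo-wrapper-python handles these errors, and the helper name `call_with_retries` and its parameters are made up for illustration.

```python
import time


def call_with_retries(fn, retriable=(TimeoutError,), attempts=4,
                      base_delay=0.5, sleep=time.sleep):
    """Call fn() and retry on the given exception types with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2*base_delay, ...).
    The last failure is re-raised so the caller still sees the real error.
    `sleep` is injectable so the helper can be exercised without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retriable:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in for a flaky upload: fails twice, then succeeds.
state = {"calls": 0}


def flaky_upload():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("timed out")
    return "ok"


recorded_delays = []
result = call_with_retries(flaky_upload, sleep=recorded_delays.append)
print(result, state["calls"], recorded_delays)
```

With httpx, the callable could wrap a request such as `httpx.post(url, json=metadata, timeout=30.0)` followed by `response.raise_for_status()`, with `retriable=(httpx.TransportError, httpx.HTTPStatusError)`; `url` and `metadata` are placeholders here. Note that httpx applies a default timeout of 5 seconds when none is given, and whether sumo-wrapper-python 0.4.0 overrides that default has not been checked in this thread.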