Volue-Public / energy-mesh-python

A Python API able to communicate with Volue Energy's Mesh server.

Bug when running inflow calculation #516

Open asmunds opened 11 hours ago

asmunds commented 11 hours ago

Mesh Python SDK version: 1.9.0
Mesh version: 2.15.1+7
Python version: 3.12.1
Operating System (Windows, Ubuntu, etc.): Windows

Environment: dev

When running the following command:

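from datetime import datetime, timedelta, timezone

# `session` is an open Mesh session (see create_session() in the traceback below)
session.run_inflow_calculation(
    model="Agder",
    area="Norge",
    water_course="Otra",
    start_time=datetime.now().astimezone(timezone.utc) - timedelta(days=30),
    end_time=datetime.now().astimezone(timezone.utc),
    return_datasets=False,
    resolution=timedelta(minutes=60),
)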
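
As a side note, the call above reads `datetime.now()` twice, so the two endpoints come from slightly different instants. A minimal sketch, assuming the intent is a fixed 30-day window ending now; the helper name `inflow_window` is hypothetical and not part of the Mesh Python SDK:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def inflow_window(
    days: int = 30, now: datetime | None = None
) -> tuple[datetime, datetime]:
    """Return (start_time, end_time) in UTC for a window of `days` ending at `now`."""
    # Read the clock once so both endpoints derive from the same instant.
    if now is None:
        now = datetime.now(timezone.utc)
    end_time = now.astimezone(timezone.utc)
    start_time = end_time - timedelta(days=days)
    return start_time, end_time
```

The returned pair could then be passed as `start_time` and `end_time` to `session.run_inflow_calculation(...)`. This only tidies the request; it is not expected to change the server-side failure reported here.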

I get the following error message:

2024-10-07 12:28:10,291 - __main__ - ERROR - MainThread - fatal exception (code=0xc0000005):
stack trace:
0> Powel_Mesh_Common!Powel::Common::FatalVectoredExceptionHandler+0x5B
1> ntdll!RtlInitializeCriticalSectionAndSpinCount+0x1C6
2> ntdll!RtlWalkFrameChain+0x1119
3> ntdll!KiUserExceptionDispatcher+0x2E
4> OraOCIEI!slgetohorabaseconfig+0x2509
5> OraOCIEI!slgetohorabaseconfig+0x1B25
6> OraOCIEI!slgetohorabaseconfig+0x1748
7> OraOCIEI!slgetohorabaseconfig+0x15CC
8> OraOCIEI!kzssqlname+0x2B665
9> OraOCIEI!dbgtfFilePop+0xB5BE
10> OraOCIEI!dbgtfFilePop+0xB194
11> OraOCIEI!dbgecPop+0x179D5
12> OraOCIEI!dbgecPop+0x18234
13> OraOCIEI!dbgfcsIlcsRegister+0x4943
14> OraOCIEI!dbgfcsIlcsRegister+0x2E51
15> OraOCIEI!dbgePostErrorDirect+0x11FB
16> OraOCIEI!dbgePostErrorDirect+0x2D3
17> OraOCIEI!kpeDbgHdlPostop+0x1C90
18> OraOCIEI!slgetohorabaseconfig+0x598E
19> KERNELBASE!UnhandledExceptionFilter+0x1BC
20> ntdll!memset+0x1C40
21> ntdll!_C_specific_handler+0x96
22> ntdll!_chkstk+0x11F
23> ntdll!RtlWalkFrameChain+0x14BF
24> ntdll!RtlRaiseException+0x316
25> KERNELBASE!RaiseException+0x69
26> OraOCIEI!slgetohorabaseconfig+0x5B1E
27> ucrtbase!raise+0x1DD
28> ucrtbase!abort+0x31
29> ucrtbase!terminate+0x1F
30> VCRUNTIME140_1+0x1911
31> VCRUNTIME140_1+0x218F
32> VCRUNTIME140_1+0x21E9
33> VCRUNTIME140_1!_CxxFrameHandler4+0xA9
34> ntdll!_chkstk+0x11F
35> ntdll!RtlWalkFrameChain+0x14BF
36> ntdll!RtlRaiseException+0x316
37> KERNELBASE!RaiseException+0x69
38> VCRUNTIME140!CxxThrowException+0x97
39> sim_core_12!sim::core::solver::restart+0x3351
40> sim_core_12!sim::core::solver_t::operator()+0x45
41> Powel_Mesh_Server!volue::mesh::grpc::datatransfer::v1alpha::ExportTimeseriesRequest::query_ids+0x1B473
42> Powel_Mesh_Server!volue::mesh::grpc::datatransfer::v1alpha::ExportTimeseriesRequest::query_ids+0x1CD65
43> Powel_Mesh_Server!volue::mesh::grpc::datatransfer::v1alpha::ExportTimeseriesRequest::query_ids+0x180D5
44> Powel_Mesh_Server!??$CreateMaybeMessage@VRollbackResponse@v1alpha@session@grpc@mesh@volue@@$$V@Arena@protobuf@google@@CAPEAVRollbackResponse@v1alpha@session@grpc@mesh@volue@@PEAV012@@Z+0x4ECBF
45> Powel_Mesh_Server!??$CreateMaybeMessage@VRollbackResponse@v1alpha@session@grpc@mesh@volue@@$$V@Arena@protobuf@google@@CAPEAVRollbackResponse@v1alpha@session@grpc@mesh@volue@@PEAV012@@Z+0x50B03
46> ucrtbase!o_exp+0x5A
47> KERNEL32!BaseThreadInitThunk+0x14
48> ntdll!RtlUserThreadStart+0x21
2024-10-07 12:28:13,536 - __main__ - ERROR - MainThread - Failed to run inflow calculation for Otra:
<_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "IOCP/Socket: Connection reset (An existing connection was forcibly closed by the remote host.
 -- 10054)"
        debug_error_string = "UNKNOWN:Error received from peer ipv4:10.98.42.15:50051 {grpc_message:"IOCP/Socket: Connection reset (An existing connection was forcibly closed by the remote host.\r\n -- 10054)", grpc_status:14, created_time:"2024-10-07T10:28:13.5242622+00:00"}"
>
Traceback (most recent call last):
  File "C:\Users\103281\repos\nwb_model/src/scheduledrun.py", line 75, in start_nwb_run
    for response in session.run_inflow_calculation(
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\volue\mesh\_connection.py", line 499, in run_inflow_calculation   
    for response in self.hydsim_service.RunInflowCalculation(request):
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\grpc\_channel.py", line 543, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\grpc\_channel.py", line 969, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "IOCP/Socket: Connection reset (An existing connection was forcibly closed by the remote host.
 -- 10054)"
        debug_error_string = "UNKNOWN:Error received from peer ipv4:10.98.42.15:50051 {grpc_message:"IOCP/Socket: Connection reset (An existing connection was forcibly closed by the remote host.\r\n -- 10054)", grpc_status:14, created_time:"2024-10-07T10:28:13.5242622+00:00"}"
>
2024-10-07 12:28:33,573 - __main__ - ERROR - MainThread - An error occurred: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:10.98.42.15:50051: tcp handshaker shutdown"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:10.98.42.15:50051: tcp handshaker shutdown", grpc_status:14, created_time:"2024-10-07T10:28:33.5607869+00:00"}"
>
Traceback (most recent call last):
  File "C:\Users\103281\repos\nwb_model/src/scheduledrun.py", line 147, in <module>
    main()
  File "C:\Users\103281\repos\nwb_model/src/scheduledrun.py", line 135, in main
    start_nwb_run(development=development, watercourses=watercourses, comment=args.comment)
  File "C:\Users\103281\repos\nwb_model/src/scheduledrun.py", line 54, in start_nwb_run
    with mesh_client.mesh_connection.create_session() as session:
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\volue\mesh\_connection.py", line 99, in __exit__
    self.close()
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\volue\mesh\_connection.py", line 130, in close
    self.session_service.EndSession(_to_proto_guid(self.session_id))
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\grpc\_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\103281\repos\nwb_model\src\.env\Lib\site-packages\grpc\_channel.py", line 1006, in _end_unary_response_blocking     
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:10.98.42.15:50051: tcp handshaker shutdown"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:10.98.42.15:50051: tcp handshaker shutdown", grpc_status:14, created_time:"2024-10-07T10:28:33.5607869+00:00"}"
>
2024-10-07 12:28:33,578 - opentelemetry.attributes - WARNING - MainThread - Invalid type _RPCState for attribute 'exception.message' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types

It seems like the calculation runs fine and then crashes at the end.

Runs for other watercourses work as expected with the exact same command. The same command with the same watercourse also worked fine until two days ago: it started failing on Saturday around 17:00 CET (we run it in the dev environment every hour).

Log messages leading up to the error (excluding debug; let me know if you need those):

2024-10-07 12:28:07,443 - __main__ - INFO - MainThread - [nwb] Starting historical inflow calculation for watercourse Otra from area Norge in session {ABD61246-AB0F-4842-8457-2C17F0DC8AB8}
2024-10-07 12:28:07,448 - __main__ - INFO - MainThread - [nwb] HydSim version 12.8.0-1 for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra
2024-10-07 12:28:07,449 - __main__ - INFO - MainThread - [nwb] Time interval [2024-09-07 10:28, 2024-10-07 10:28) resolution 01:00 for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra
2024-10-07 12:28:07,519 - __main__ - INFO - MainThread - [nwb] Found 1 topology invariant intervals for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra
2024-10-07 12:28:07,612 - __main__ - INFO - MainThread - [nwb] Loaded models for 1 topology invariant intervals for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra in cpu-user:156ms cpu-kernel:0ms
2024-10-07 12:28:07,613 - __main__ - INFO - MainThread - [nwb] Calculating topology invariant interval [2024-09-07 10:28, 2024-10-07 10:28) for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra
2024-10-07 12:28:08,999 - __main__ - WARNING - MainThread - [hyd-core] Missing values for b_Beihølen_Venneslafjorden L3, with id d376958a-a93d-4c89-8c54-82408dc40de9, and symbol symbol-resource_gate_38-flow
2024-10-07 12:28:09,121 - __main__ - WARNING - MainThread - [hyd-core] Missing values for b_Nomeland_dam_Beihølen L3, with id c5ead456-e840-4a86-822a-655decca6048, and symbol symbol-resource_gate_50-flow
2024-10-07 12:28:09,306 - __main__ - WARNING - MainThread - [hyd-core] Missing values for b_Inntak_Fennefoss_Kilefjorden L2, with id 00e156bf-4d93-4b68-b8b6-f1c275c68dc8, and symbol symbol-resource_gate_74-flow
2024-10-07 12:28:09,306 - __main__ - WARNING - MainThread - [hyd-core] Missing values for b_Inntak_Fennefoss_Kilefjorden L4, with id b237ab25-e56d-4097-94cf-d46521cfd512, and symbol symbol-resource_gate_75-flow
2024-10-07 12:28:09,306 - __main__ - WARNING - MainThread - [hyd-core] Missing values for b_Inntak_Fennefoss_Kilefjorden L1, with id 79fe7f13-42a7-4acf-8e33-6560bdcc82db, and symbol symbol-resource_gate_76-flow
2024-10-07 12:28:09,306 - __main__ - WARNING - MainThread - [hyd-core] Missing values for b_Inntak_Fennefoss_Kilefjorden L3, with id f4851207-42dc-4eec-9c50-155101962d1c, and symbol symbol-resource_gate_77-flow
2024-10-07 12:28:09,337 - __main__ - WARNING - MainThread - [hyd-core] Missing values for w_Ormsavatn_Vatnedalsvatn L1, with id fb662454-d79b-4d49-9e44-c553db7f258d, and symbol symbol-resource_gate_88-flow
2024-10-07 12:28:09,739 - __main__ - INFO - MainThread - [nwb] Read flow input for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra in cpu-user:1516ms cpu-kernel:62ms
2024-10-07 12:28:09,987 - __main__ - INFO - MainThread - [nwb] Computed small scheduled input for session:{ABD61246-AB0F-4842-8457-2C17F0DC8AB8} area:Norge watercourse:Otra in cpu-user:250ms cpu-kernel:0ms

I see there are some warnings. However, I see the same warnings in other watercourses that don't crash like this.
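
To check this more systematically, the warnings from two runs could be compared as sets. A minimal sketch, assuming the log lines keep the `[hyd-core] Missing values for <name>, with id <id>, and symbol <symbol>` layout shown above; the function name `missing_value_symbols` is hypothetical:

```python
from __future__ import annotations

import re

# Matches the "[hyd-core] Missing values for ..." warnings from the log above.
_MISSING = re.compile(
    r"\[hyd-core\] Missing values for (?P<name>.+?), "
    r"with id (?P<id>[0-9a-f-]{36}), and symbol (?P<symbol>\S+)"
)


def missing_value_symbols(log_text: str) -> dict[str, str]:
    """Map each series name reported with missing values to its symbol."""
    return {m["name"]: m["symbol"] for m in _MISSING.finditer(log_text)}
```

Comparing the keys returned for the Otra log with those from a watercourse that completes would show whether Otra has any warning the others lack.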

simia commented 11 hours ago

Hi! As I understand it, this error causes Mesh to crash? In that case I'd suggest filing an issue against Mesh (through the proper channels) and providing the additional information there rather than in this public repo. Could you provide a core dump of the Mesh process? Could you also provide the Mesh log file?

asmunds commented 11 hours ago

No, Mesh does not crash. Or, at least not the whole server. Do you mean a server crash, or that the crash is on the Mesh side of things, so it's not really a Python SDK issue?

simia commented 10 hours ago

> No, Mesh does not crash. Or, at least not the whole server. Do you mean a server crash, or that the crash is on the Mesh side of things, so it's not really a Python SDK issue?

I believe that the problem is on the Mesh side. Not sure yet what exactly is the problem though.
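
One reading of the stack trace that fits this: frames are listed innermost first, so the frame right after `VCRUNTIME140!CxxThrowException` is the caller that raised the C++ exception, here `sim_core_12!sim::core::solver::restart`. A minimal sketch that picks that frame out of a trace, assuming the `N> module!symbol+0xOFFSET` layout from the report. The helper name `throwing_frame` is hypothetical, and symbol names resolved without full debug symbols may be approximate:

```python
from __future__ import annotations

import re

# One frame per line, e.g. "39> sim_core_12!sim::core::solver::restart+0x3351".
_FRAME = re.compile(r"^\s*\d+>\s+(\S+)$", re.MULTILINE)


def throwing_frame(trace: str) -> str | None:
    """Return the frame listed directly after the first CxxThrowException frame."""
    frames = _FRAME.findall(trace)
    for index, frame in enumerate(frames[:-1]):
        if "CxxThrowException" in frame:
            # The next frame in the listing is the caller of the throw helper.
            return frames[index + 1]
    return None
```

Applied to the trace in the report, this would point at the solver inside Mesh rather than at anything in the Python client.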