Open dorin100 opened 2 years ago
Hi, I can see this is assigned to @coot in the JIRA ticket. Can someone please assign it to him on GH too? I don't have the permissions.
@dorin100 does the node stop? Could you make sure the following traces are turned on:
TraceLocalHandshake
TraceErrorPolicy
TraceLocalErrorPolicy
If they are, where can I find the log file?
@coot the node does not seem to stop.
The GH jobs that ran these tests can be found here (both are using node tag vasil-testnet-v1):
The logs for each job can be found at the above links in the Artifacts section.
All the jobs are run as default with:
TraceLocalHandshake = false
TraceErrorPolicy = true
TraceLocalErrorPolicy = true
Here all Windows jobs are ok, but they use tag 1.34.1 (the last released one) - https://github.com/input-output-hk/cardano-node-tests/actions/runs/2367105131
FYI: the flags below were not set for any of those sync tests (but only the Windows ones are failing):
"TestEnableDevelopmentNetworkProtocols": true,
"TestEnableDevelopmentHardForkEras": true,
~@dorin100 when I look at the recent logs node_logs_shelley_qa_windows-latest from this run, I can only see failing Plutus scripts~
I looked at the wrong logs.
@dorin100 my Windows node synced, and I wasn't able to reproduce the issue: every 10s cardano-cli was able to query the tip of the node. Could you please enable the following tracers and configure them with the given minimal severities:
| Tracer | Severity |
|---|---|
| TraceLocalMux | Info |
| TraceLocalHandshake | Info |
| TraceLocalErrorPolicy | Debug |
| TraceErrorPolicy | Debug |
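As a rough illustration of the request above, here is a minimal Python sketch that switches the four tracers on in a node config loaded as a dictionary. It assumes the tracers are flat boolean keys in the JSON config, as the `TraceLocalHandshake = false` style values quoted earlier suggest. The helper name `enable_tracers` is hypothetical, and the sketch does not set the minimal severities, because the thread does not show where those are configured.

```python
import json

# Tracer names come from the table above; treating them as flat boolean
# config keys is an assumption based on the values quoted in this thread.
REQUESTED_TRACERS = [
    "TraceLocalMux",
    "TraceLocalHandshake",
    "TraceLocalErrorPolicy",
    "TraceErrorPolicy",
]


def enable_tracers(config: dict, tracers=REQUESTED_TRACERS) -> dict:
    """Return a copy of the config with each requested tracer set to True."""
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    for name in tracers:
        updated[name] = True
    return updated


if __name__ == "__main__":
    example = {"TraceLocalHandshake": False, "TraceErrorPolicy": True}
    print(json.dumps(enable_tracers(example), indent=2))
```

The helper leaves every other key unchanged, so it can be applied to a full config file read with `json.load` and written back with `json.dump`.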
@coot - the Windows job failed again, but this time I used the above tracers. The results can be found here - https://github.com/input-output-hk/cardano-node-tests/actions/runs/2495501357 (look for the [node_logs_staging_windows-latest](https://github.com/input-output-hk/cardano-node-tests/suites/6926637872/artifacts/269779719))
this was reproducible in 1 out of 2 runs also with 1.35.0-rc3 --> job: https://github.com/input-output-hk/cardano-node-tests/actions/runs/2501998632
Just for completeness' sake: reading the logs, I didn't find anything wrong with the node. A few seconds before cardano-node shut down, it accepted the last local connection, while cardano-cli was failing to connect every 60s. We don't have timestamps for when cardano-cli was trying to create these connections, but it seems it was trying to do that after cardano-node terminated.
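Since the missing piece above is when each cardano-cli attempt happened, one option is to wrap the polling in a small script that records a timestamp per attempt. The Python sketch below is only an illustration: the function names are hypothetical, it assumes `cardano-cli` is on `PATH` and `CARDANO_NODE_SOCKET_PATH` is set, and the `--testnet-magic 1` value is a placeholder for whatever network the test actually uses.

```python
import subprocess
import time
from datetime import datetime, timezone


def default_query_tip() -> subprocess.CompletedProcess:
    # Assumption: cardano-cli is on PATH and CARDANO_NODE_SOCKET_PATH is set.
    # The network flag below is a placeholder, not the value used in the tests.
    return subprocess.run(
        ["cardano-cli", "query", "tip", "--testnet-magic", "1"],
        capture_output=True,
        text=True,
    )


def poll_tip(attempts: int, interval_s: float, run=default_query_tip, sleep=time.sleep) -> list:
    """Run the query `attempts` times and record (UTC timestamp, exit code, stderr)."""
    records = []
    for i in range(attempts):
        started = datetime.now(timezone.utc).isoformat()
        result = run()
        records.append((started, result.returncode, result.stderr.strip()))
        if i < attempts - 1:
            sleep(interval_s)  # wait between attempts, but not after the last one
    return records


if __name__ == "__main__":
    for started, code, err in poll_tip(attempts=3, interval_s=10):
        print(started, code, err)
```

The recorded timestamps can then be lined up against the node log to see whether the failed connection attempts happened before or after the node terminated.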
We discovered:
DiffusionErrored (ExceptionInLinkedThread "ThreadId 34" <stderr>: hPutChar: resource vanished (Broken pipe)
which indicates that there's something wrong with the logging system.
The issue does not reproduce when we run the node with "TurnOnLogging": false
PS: the above error might indicate that the IO error was propagated to the top level and the node shut down (and it should not shut down).
The above error can be found in the logs here:
So, this is the legacy tracing system.
We can fix this by migrating Daedalus to the new tracing system -- which I believe is ready to accommodate Daedalus.
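The `hPutChar: resource vanished (Broken pipe)` failure discussed in this thread can be reproduced in miniature outside the node. The Python sketch below is not the node's code and not a proposed fix. It only shows that writing to a pipe whose reader has gone away raises an I/O error, and how a hypothetical guard (`safe_log`) could drop the log line instead of letting that error reach the top level.

```python
import os


def write_to_closed_pipe() -> str:
    """Write to a pipe whose read end is closed and report the outcome."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away, like a consumer that stops draining stderr
    try:
        os.write(write_fd, b"log line\n")
        return "ok"
    except OSError as exc:  # BrokenPipeError on POSIX; the exact error may differ on Windows
        return type(exc).__name__
    finally:
        os.close(write_fd)


def safe_log(write, line: bytes) -> bool:
    """Hypothetical guard: swallow I/O errors from the log sink and report success."""
    try:
        write(line)
        return True
    except OSError:
        return False  # the log line is lost, but the caller keeps running


if __name__ == "__main__":
    print(write_to_closed_pipe())
```

Whether dropping log output silently is acceptable is a design choice. The point of the sketch is only that a logging failure does not have to terminate the process that is logging.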
Internal/External: Internal if an IOHK staff member.

Area: node syncing on Windows

Summary: When using tag vasil-testnet-v1, only on Windows, in 3 out of 4 cases, the node stops responding to query tip and the socket is not found anymore.

Steps to reproduce: Steps to reproduce the behavior:
vasil-testnet-v1

Expected behavior: The node should successfully sync and query tip should not return errors.

System info (please complete the following information):
vasil-testnet-v1
vasil-testnet-v1