OriginTrail / ot-node

OriginTrail Decentralized Knowledge Graph network node
https://origintrail.io
Apache License 2.0

Asset Sync - Java RuntimeException: java.nio.file.NoSuchFileException: /root/ot-node/blazegraph.jnl #3344

Closed CosmiCloud closed 1 month ago

CosmiCloud commented 1 month ago

Issue description

SPARQL queries to Blazegraph frequently cause Java runtime errors in the node logs during asset sync.

The node itself does not fatally crash but the syncing process could be negatively impacted by such errors.

There are over 20k Java Runtime errors in the last 3 hours on a single node.

Expected behavior

During asset sync, the queries run by the sync process DO NOT cause Java runtime errors.

Actual behavior

During asset sync, the queries run by the sync process DO cause Java runtime errors.

Steps to reproduce the problem

  1. Add numerous (10 or more) paranets to a node's AssetSync configuration.
  2. Eventually, Java runtime errors will be logged.

Specifications

LOGS

java 2 error.txt
java 1 error.txt
journal grep count
Node Logs: v8test6-otnode.log.partab.gz

Disclaimer

Please be aware that the issue reported on a public repository allows everyone to see your node logs, node details, and contact details. If you have any sensitive information, feel free to share it by sending an email to tech@origin-trail.com.

Mihajlo-Pavlovic commented 1 month ago

Does blazegraph.jnl exist on your node?

This error is not coming from your node. The logs you sent contain this:

Received NACK response from node during v1_0_0GetInitCommand. Error message: Invalid SPARQL endpoint response from http://localhost:9999/blazegraph/namespace/public-current/sparql (HTTP status 500):

This is the response from the node that your node sent the get request to. It is the error that explains why the node you sent the request to gave you a negative response.
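To see how many of the logged errors arrive this way, the log can be split into errors wrapped in a NACK response (produced by a remote node) and errors logged without that prefix. The following is a minimal sketch, not part of ot-node. It assumes each log entry sits on a single line and that the remote error text follows the NACK prefix on that same line. The function names and the command-line usage are hypothetical; the marker strings are copied from the issue title and the log excerpt above.

```python
import sys
from collections import Counter
from typing import Iterable, Optional

# Marker strings copied from this issue; everything else here is illustrative.
NACK_MARKER = "Received NACK response from node"
ERROR_MARKERS = ("java.nio.file.NoSuchFileException", "HTTP status 500")


def classify_line(line: str) -> Optional[str]:
    """Return 'remote', 'local', or None for a single log line.

    Assumption: a line carrying the NACK prefix reports an error that
    happened on the remote node; any other matching line is local.
    """
    if not any(marker in line for marker in ERROR_MARKERS):
        return None
    return "remote" if NACK_MARKER in line else "local"


def count_errors(lines: Iterable[str]) -> Counter:
    """Count matching error lines by origin."""
    counts: Counter = Counter()
    for line in lines:
        kind = classify_line(line)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical): python count_errors.py path/to/otnode.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(dict(count_errors(handle)))
```

If nearly all matches fall under `remote`, that would be consistent with the explanation above. A large `local` count would suggest the local Blazegraph instance is also failing.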

botnumberseven commented 1 month ago

@Mihajlo-Pavlovic So the remote node's (not our node's) Blazegraph returns HTTP 500, which we then see in our local node logs, correct? If yes, then the issue is still there; it's just produced by a different node. Or maybe I misunderstand the failure mechanism here.
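One way to rule out the local instance is to query the local SPARQL endpoint directly and look at the status code. The sketch below is an assumption-laden probe, not an ot-node tool. It assumes the local Blazegraph uses the same URL that appears in the quoted log line (which, per the explanation above, is `localhost` from the remote node's point of view), and that the endpoint accepts a standard SPARQL-protocol GET request. The function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request

# URL copied from the log line quoted in this thread; adjust for your setup.
LOCAL_ENDPOINT = "http://localhost:9999/blazegraph/namespace/public-current/sparql"


def build_probe_request(endpoint: str = LOCAL_ENDPOINT) -> urllib.request.Request:
    """Build a GET request carrying a trivial SPARQL ASK query."""
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    return urllib.request.Request(
        f"{endpoint}?{query}",
        headers={"Accept": "application/sparql-results+json"},
    )


def probe(endpoint: str = LOCAL_ENDPOINT, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or -1 if the endpoint is unreachable."""
    try:
        request = build_probe_request(endpoint)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return -1


if __name__ == "__main__":
    print("local SPARQL endpoint status:", probe())
```

A 200 from the local endpoint while the NACK errors keep appearing would point at the remote nodes. A local 500 would mean the same failure also occurs on our side.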

CosmiCloud commented 1 month ago

@Mihajlo-Pavlovic So the remote node would be available to receive a request even though it doesn't have a blazegraph.jnl? Wouldn't the node fail to initialize and not start if it did not have a blazegraph.jnl? Or would the node still respond to my requests even though it failed to initialize?

botnumberseven commented 1 month ago

@Mihajlo-Pavlovic The nodes which have this error also have publishers running on them: a script which publishes assets through the node for a few wallets. The moment I stop the asset publishing process, these Blazegraph errors stop as well. Does that fit your understanding of the failure mechanism?

Larsk97 commented 1 month ago

I'm seeing lots of these errors while trying to sync a small paranet into which I have submitted 3494 assets today.