eclipse-ditto / ditto

Eclipse Ditto™: Digital Twin framework of Eclipse IoT - main repository
https://eclipse.dev/ditto/
Eclipse Public License 2.0
666 stars 218 forks source link

internal server error when fetching ditto connections via the api #1827

Open JeffreyThijs opened 9 months ago

JeffreyThijs commented 9 months ago

We upgraded recently from ditto 3.3.7 to 3.4.1 and we encounter the following exception when fetching all full connections via the connection api:

curl http://ditto-nginx:8080/api/2/connections --user "devops:mypassword" -s | jq
{
  "status": 500,
  "error": "connectivity:internalerror",
  "message": "There was a rare case of an unexpected internal error.",
  "description": "Please contact the service team."
}

When i add the ids-only or fields=id query parameter it correctly returns a list of the ditto connections (with only the id of course), but any additional or other fields yield the same error message. In the logs of both ditto-gateway and ditto-connectivity is see nothing that indicates that something is going wrong. Maybe https://github.com/eclipse-ditto/ditto/commit/bedaaee80d093b1b577a84bce4511ce9bb096a17 introduced this problem?

If you need more info, please let me know.

thjaeckle commented 9 months ago

Hi @JeffreyThijs

Some logs would be nice, e.g. of gateway and connectivity service. Maybe there event is a stacktrace?

JeffreyThijs commented 9 months ago

Found following stacktrace in the gateway:

kb logs -l "app.kubernetes.io/name=ditto-gateway" -f | grep -i 'ConnectionNotAccessibleException' | jq                                                                                                                                       (dev/aloxy)
{
  "@timestamp": "2023-11-30T08:49:19.926+01:00",
  "@version": "1",
  "message": "Received response <{}> from connectivity.service.",
  "logger_name": "org.eclipse.ditto.gateway.service.endpoints.actors.DittoConnectionsRetrievalActor",
  "thread_name": "Thread-52011",
  "level": "DEBUG",
  "level_value": 10000,
  "stack_trace": "org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException: The Connection with ID '327bfa0b-1984-4c50-8700-9adb1579e6e1' could not be found or requester had insufficient permissions to access it.\n\tat org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException$Builder.doBuild(ConnectionNotAccessibleException.java:119)\n\tat org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException$Builder.doBuild(ConnectionNotAccessibleException.java:101)\n\tat org.eclipse.ditto.base.model.exceptions.DittoRuntimeExceptionBuilder.build(DittoRuntimeExceptionBuilder.java:176)\n\tat org.eclipse.ditto.base.model.exceptions.DittoRuntimeException.fromJson(DittoRuntimeException.java:245)\n\tat org.eclipse.ditto.connectivity.model.signals.commands.exceptions.ConnectionNotAccessibleException.fromJson(ConnectionNotAccessibleException.java:84)\n\tat jdk.internal.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:568)\n\tat org.eclipse.ditto.base.model.signals.AnnotationBasedJsonParsable.parse(AnnotationBasedJsonParsable.java:94)\n\tat org.eclipse.ditto.base.model.signals.AbstractJsonParsableRegistry.fromJson(AbstractJsonParsableRegistry.java:97)\n\tat org.eclipse.ditto.base.model.signals.AbstractJsonParsableRegistry.lambda$parse$0(AbstractJsonParsableRegistry.java:79)\n\tat org.eclipse.ditto.base.model.exceptions.DittoJsonException.wrapJsonRuntimeException(DittoJsonException.java:161)\n\tat org.eclipse.ditto.base.model.signals.AbstractJsonParsableRegistry.parse(AbstractJsonParsableRegistry.java:78)\n\tat org.eclipse.ditto.base.model.signals.GlobalErrorRegistry.parse(GlobalErrorRegistry.java:91)\n\tat org.eclipse.ditto.base.model.signals.GlobalErrorRegistry.parse(GlobalErrorRegistry.java:36)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.deserializeJson(AbstractJsonifiableWithDittoHeadersSerializer.java:423)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.createJsonifiableFrom(AbstractJsonifiableWithDittoHeadersSerializer.java:325)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.tryToCreateKnownJsonifiableFrom(AbstractJsonifiableWithDittoHeadersSerializer.java:296)\n\tat org.eclipse.ditto.internal.utils.cluster.AbstractJsonifiableWithDittoHeadersSerializer.fromBinary(AbstractJsonifiableWithDittoHeadersSerializer.java:273)\n\tat org.apache.pekko.serialization.Serialization.deserializeByteBuffer(Serialization.scala:279)\n\tat org.apache.pekko.remote.MessageSerializer$.deserializeForArtery(MessageSerializer.scala:111)\n\tat org.apache.pekko.remote.artery.Deserializer$$anon$3.onPush(Codecs.scala:683)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:555)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:506)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:400)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:662)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:532)\n\tat org.apache.pekko.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:637)\n\tat org.apache.pekko.stream.impl.fusing.ActorGraphInterpreter.org$apache$pekko$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:813)\n\tat org.apache.pekko.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:831)\n\tat org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)\n\tat org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)\n\tat org.apache.pekko.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:729)\n\tat org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)\n\tat org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)\n\tat org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)\n\tat org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241)\n\tat org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)\n\tat java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)\n\tat java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)\n\tat java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)\n\tat java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)\n\tat java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)\n",
  "correlation-id": "d24b587e-8ec1-4581-8d17-38298b1def20"
}
thjaeckle commented 9 months ago

I fear with that little information I cannot be of any help.

BobClaerhout commented 9 months ago

So today I was trying to figure out what's going on here since not being able to fetch the existing connections is quite blocking to us. As @JeffreyThijs stated, we were able to fetch the ids only. When I did that, I got back a list of approx. 20 connections being active in ditto. I manually tried fetching them because I wanted to check the contents of the connection. I stumbled upon one with id 327bfa0b-1984-4c50-8700-9adb1579e6e1 (the ID doesn't mind but if you look closely, it's the one that is mentioned in the log line Jeffrey posted before. I went looking for it in mongoDB and saw 46 records in the connection_journal table with pid connection:327bfa0b-1984-4c50-8700-9adb1579e6e1. it showed following records:

  {
    _id: ObjectId("646f79710d77134491007ae2"),
    pid: 'connection:327bfa0b-1984-4c50-8700-9adb1579e6e1',
    from: Long("60"),
    to: Long("60"),
    events: [
      {
        v: 1,
        pid: 'connection:327bfa0b-1984-4c50-8700-9adb1579e6e1',
        sn: Long("60"),
        _t: 'bson',
        p: {
          type: 'persistence-actor-internal:empty-event',
          effect: 'alwaysAlive',
          __hh: {}
        },
        _tg: [ 'always-alive', 'priority-0' ],
        manifest: 'persistence-actor-internal:empty-event',
        _w: '9393e517-419d-4ade-9338-500780c519eb'
      }
    ],
    v: 1,
    _tg: [ 'always-alive', 'priority-0' ]
  }

Since I don't know what this is for, I manually removed all of these records out of the collection and restarted the ditto-connectivity pods. After that, we were able to fetch the ditto connections again using the HTTP GET endpoint. Something to note: the connection count in ditto dropped from 20 to 7. Any idea what might be going on? We experienced this in our development environment now and would like to avoid this in our production environment. It's worthy to mention that we recently upgraded ditto from 3.3.7 to 3.4.3

alstanchev commented 9 months ago

This record looks very strange, like it was exported with mongoExport end then restored in the database which would cause the numeric values too look like the "from:, to:, sn:". If this is what you did you should use mongoDump as it uses binary and preservs types not like mongoExport that works with json.

BobClaerhout commented 8 months ago

I agree it looks strange. I double checked with my colleagues and we did not do any manual actions on the database. The origin of these are therefor unknown. On the other hand: it's not good this makes the "list connections" call fail, is it? Wouldn't it be better to just return the connections it is able to load and skip the ones which fail instead of failing the call?

alstanchev commented 8 months ago

I would prefer to make the error more vissible with a more descriptive message (something like "Corrupted record or unable to deserialize") instead of returning what is possible only. Otherwise such corruptions will remain hidden. But still i doubt this was persisted by the service. Is it possible that by mistake maybe while examining the db with Mongo Compass or the atlass UI an unoticed edit was done?

BobClaerhout commented 7 months ago

I doubt the edit. It's not that easy to change documents in the atlass UI, let alone multiple accidental edits. Providing a more descriptive error is also fine!