arangodb / arangojs

The official ArangoDB JavaScript driver.
https://arangodb.github.io/arangojs
Apache License 2.0
601 stars 107 forks source link

ArangoError#toJSON should not break from circular references #742

Closed robross0606 closed 3 years ago

robross0606 commented 3 years ago

ArangoJS uses x3-linkedlist for internal structures and appears to be returning those on things like errors and in other spots. The problem is that LinkedListItem has circular references in it which is indeterminately breaking other things.

For example, this appears to be returned in an ArangoError which then breaks trying to use those errors on logging frameworks when they attempt to call JSON.stringify(). It is also breaking jaeger-client for opentracing when it tries to do the same on span data such as an AQL query.

Converting circular structure to JSON
7/30/2021 7:54:53 PM  --> starting at object with constructor 'LinkedListItem'
7/30/2021 7:54:53 PM  | property 'before' -> object with constructor 'LinkedListItem'
7/30/2021 7:54:53 PM  "Function.getThriftTags (/usr/src/app/node_modules/jaeger-client/dist/src/thrift.js:84:23)",
7/30/2021 7:54:53 PM  "Function.getThriftLogs (/usr/src/app/node_modules/jaeger-client/dist/src/thrift.js:115:31)",
7/30/2021 7:54:53 PM  "Function.spanToThrift (/usr/src/app/node_modules/jaeger-client/dist/src/thrift.js:170:30)",
7/30/2021 7:54:53 PM  "RemoteReporter.report (/usr/src/app/node_modules/jaeger-client/dist/src/reporters/remote_reporter.js:85:41)",
7/30/2021 7:54:53 PM  "Tracer._report (/usr/src/app/node_modules/jaeger-client/dist/src/tracer.js:217:22)",
7/30/2021 7:54:53 PM  "Span.finish (/usr/src/app/node_modules/jaeger-client/dist/src/span.js:214:22)",

Directly exposing internal exotic structures with circular references to external callers is a code smell.

pluma commented 3 years ago

After digging through this, I'm pretty sure ArangoError is not at fault as it implements toJSON, which JSON.stringify respects: #632.

LinkedList is only used in cursors. There is no reason you should expect to be able to JSON.stringify random values and it's not LinkedList's fault that this breaks. Circular structures are normal and expected in JavaScript. If you want to convert arbitrary values to JSON for logging purposes, you should use a library that can represent circular structures, not the built-in function. Not every JS value is serializable or can be meaningfully represented in JSON.

I'm closing this as invalid.

robross0606 commented 3 years ago

So you closed this because you’re “pretty sure” without any chance for response? Nice. I never said circular structures are exotic. I said leaking LinkedList is exotic and you’re still doing it. I’ll get more specific proof and open a new ticket. And this wasn’t fixed until 7.0.0 according to your link. This ticket isn’t invalid, but it may be a duplicate.

pluma commented 3 years ago

You reported this in July and indicated the problem occurs with ArangoError. ArangoJS 7.0.0 was released in August 2020. If the problem indeed occurs because of ArangoError, my recommendation would be to upgrade to the latest version and see if that solves your problem. I'm also adding a toJSON method to forwarded system errors in 8.0 to prevent allow JSONification of those errors too.

We're using LinkedList for the cursor because it drastically improved performance compared to manipulating a regular array when operating over large result sets. You shouldn't be seeing any problems with that unless you attempt to convert a cursor object to JSON, which isn't something we explicitly support because it's probably not doing what you would want to do in most use cases I can think of.

I realize the tone of my reply might have come across as a bit aggressive especially given the late response. I'd like to apologize for that. English is not my native language and there is a fine line between succinct and abrasive.

If you can provide me with an example I can replicate with the current release, please feel free to reopen this issue at any time.

pluma commented 3 years ago

To be clear about what I mean by "you shouldn't convert cursors to JSON": a cursor represents a result set that may be incomplete and may need to be incrementally loaded. If you want to convert the result of a query to JSON, you should take the data out of the cursor first or build the response by consuming the cursor.

Cursors can represent very large result sets that may exceed what can meaningfully be converted to JSON in a node process, which is why we explicitly support batching. Additionally dangling cursors will clutter up the database and allowing converting them to JSON would make it more likely users forget to either consume or destroy the cursor.

In other words: by forcing users to consume cursors explicitly we try to prevent them from shooting themselves in the foot, either by writing unbounded queries that can generate excessive result sets or by leaving cursors dangling after only consuming the first result batch. If you want to stringify cursors for diagnostic purposes, you shouldn't rely on JSON.stringify and instead use a purpose-built-library that can handle recursive nesting.

Also LinkedList is not the only case of recursive nesting in ArangoJS. The serialization problem with ArangoError prior to 7.0.0 was actually caused by the native Node.js request object. This is why I said this is not a problem with LinkedList and not something we want to avoid in general: we can not make native Node.js objects non-recursive and we do not make any guarantees about hiding them on objects not explicitly intended to be serialized.

Again, apologies if my initial response sounded overly rude.

robross0606 commented 3 years ago

To be clear back, we're not attempting to convert Arango cursors to JSON. This is cropping up related to OpenTracing on errors thrown by Arango.

pluma commented 3 years ago

Interesting! I'd love to see how a linked list ends up in that. We also use linked lists in the connection object to handle task queues (again, for performance reasons), so maybe it's trying to log that or the Database object it is attached to as JSON.

robross0606 commented 2 years ago

I've finally hit a situation where I can consistently reproduce this problem. It appears to me like opentracing and/or Jaeger are automatically instrumenting ArangoJS REST calls and then failing to decode the circular JSON references. I was able to capture the span content itself to a file using safeStringify().

span-problems.log

There are events in the opentracing context that we don't actively set up ourselves (e.g. "start_import_documents") and it looks like the entire body of every REST call via ArangoJS is captured in the trace. You can also see the "behind": "[Circular]" in here indicating where the LinkedList has leaked onto the trace and causes a JSON decode failure.

Do you know if OpenTracing or Jaeger automatically instruments ArangoJS rest calls?

robross0606 commented 2 years ago

I found the root cause here and it was internal. I don't believe there's an issue with the library at all. This can remain closed.