pva closed this issue 2 years ago
Hi, I am glad that you are now able to run the tool :slightly_smiling_face:. Let's take a look at what we have here.
The error is very strange because strings in all documents stored in a MongoDB database are supposed to be UTF-8 encoded. Upon further investigation, I have stumbled upon this open MongoDB issue, which could explain the error as it seems that MongoDB can produce invalid strings. However, in any case, this seems like a MongoDB issue. I know, this is not very helpful, but I wanted to understand the root cause first.
Also, I tried to reproduce the issue by trying to insert invalid strings into a testing database, but failed to do so.
Maybe it's a good idea to print MongoDB document that causes this problem?
The problem is that there is no access to that document from the code. Indeed, when trying to retrieve the document, MongoDB fails with that invalid utf-8 sequence error. I would have to write a custom document de-serializer that would allow me to inspect the raw bytes of the document, and potentially replace incorrect characters with ? (or something similar).
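To illustrate the "replace incorrect characters with ?" idea, here is a minimal Python sketch. It is not the tool's code and not a full document de-serializer: it only assumes that the raw bytes of one string field are already in hand, and the function name `decode_lenient` is hypothetical.

```python
def decode_lenient(raw: bytes, placeholder: str = "?") -> str:
    """Decode bytes as UTF-8, substituting `placeholder` for invalid sequences.

    Hypothetical helper for illustration only; it is not part of the tool
    or of any MongoDB driver.
    """
    # errors="replace" makes Python emit U+FFFD for each undecodable
    # sequence instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")
    # Swap U+FFFD for the requested placeholder. Caveat: a U+FFFD that was
    # legitimately present in the input is replaced as well.
    return text.replace("\ufffd", placeholder)


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8.
    print(decode_lenient(b"abc\xffdef"))
```

If a driver already offers a lenient decoding mode, that is probably the simpler route. PyMongo, for example, accepts `unicode_decode_error_handler="replace"` in its `CodecOptions`; whether that applies here depends on the driver the tool actually uses.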
In commit b40f9f8a972cac4d74609afb4449836c8a2ee07c, I have tried modifying the code to skip such invalid documents. Could you please try running the tool from the code that is in the current master branch? As I have written above, I was unable to reproduce the issue in any way, so I am not sure if such a skip will actually work. Please, let me know whether you are able to retrieve the stats now, and we will go from there :slightly_smiling_face:.
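The skip-on-error approach can be sketched roughly as follows in Python. This is not the code from the commit: the function name `decode_skipping_invalid` and the idea of working on a list of raw byte strings are assumptions made purely for illustration.

```python
from typing import Iterable, List, Tuple


def decode_skipping_invalid(raw_values: Iterable[bytes]) -> Tuple[List[str], int]:
    """Decode each value as strict UTF-8, skipping the ones that fail.

    Returns the successfully decoded strings and the number of skipped
    values. Hypothetical helper, not taken from the tool.
    """
    decoded: List[str] = []
    skipped = 0
    for raw in raw_values:
        try:
            decoded.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Count the invalid value and move on instead of aborting
            # the whole run.
            skipped += 1
    return decoded, skipped


if __name__ == "__main__":
    values, skipped = decode_skipping_invalid([b"ok", b"bad\xff", b"fine"])
    print(values, skipped)
```

Counting the skipped items makes it possible to report how many documents were left out of the statistics.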
I've tried current master and the issue is solved. Some documents disturb the output, but in the end I'm getting a nice table with statistics. Since there are lots of documents, the loss of a few does no harm to the overall statistics! Thank you very much for your help.
P.S. That issue with MongoDB is really crazy: I really don't understand why there is no post-validation or something on the MongoDB side... It looks like MongoDB violates the specification and they fix drivers to cope with that instead of looking at the root cause... Very strange.
Finally, I'm able to run this really cool tool. Thank you very much for your help.
Now I'm experiencing the following problem:
Also, I saw the following problem:
Is it possible to understand what causes this problem based on these messages? Maybe it's a good idea to print the MongoDB document that causes this problem?