dboissier / mongo4idea

MongoDB integration in Intellij
Apache License 2.0
975 stars 109 forks source link

Unable to view collection with fields containing type BinData uuid (rfc4122) type 4 #162

Open treeleaf opened 7 years ago

treeleaf commented 7 years ago

I use PHPStorm as IDE and in the PHP application. I have a collection with a bin uuid (BSON); for example BinData(4,"M2RkMzI1YmMtNjFhMC0xMWU2LTk0MWYtODRmY2EwMzc2ZTJm"). The plugin is unable to show the collection and throws an exception org.bson.BsonSerializationException: Expected length to be 16, not 36

dboissier commented 7 years ago

Hi,

The serialization is made by the mongo java driver. I will look at the latest version.

webdevilopers commented 7 years ago

Same problem here with the following mapping:

<field fieldName="id" type="bin_uuid" id="true" strategy="NONE" />

Document:

use Ramsey\Uuid\Uuid;

$this->id = Uuid::uuid4();
webdevilopers commented 7 years ago

Having the same problem with MongoChef: bytes.length != 16.

A general driver / BSON problem?

webdevilopers commented 7 years ago

I checked our collections with this plugin and MongoChef on our production servers too - same problem. It seems to platform / driver independent. The problem is the BinData for UUIDs. This is the way the look in the shell:

"_id" : BinData(3,"ZjU1NDYwMDYtZGRhYy0xMWU2LThmOTctOWNlYmU4MmE5NmFh")

I discussed the issue with MongoChef, @jmikola and @ramsey some time ago:

But I don't think we found a way to solve it. Could be a #PHP7 issue which is running on all the servers I tested.

MongoChef wrote:

But yes it appears as though your doc has a binary field of sub-type 0x03 (LUUID) but doesn't actually contain a 16-byte UUID

But our UUID used to work before we updated to PHP7 which is now using a new mongodb extension instead of the old mongo.

jmikola commented 7 years ago

After base64-decoding the binary data in the snippets above, it looks like you guys are storing hexadecimal strings instead of the binary representation of the UUID:

Base64 value Decoded value
M2RkMzI1YmMtNjFhMC0xMWU2LTk0MWYtODRmY2EwMzc2ZTJm f5546006-ddac-11e6-8f97-9cebe82a96aa
ZjU1NDYwMDYtZGRhYy0xMWU2LThmOTctOWNlYmU4MmE5NmFh 3dd325bc-61a0-11e6-941f-84fca0376e2f

Instead, you should be storing a 16-byte binary string as the data payload.

There does appear to be a regression in the new PHP driver (ext-mongodb). The legacy driver (ext-mongo) would throw an exception if the binary subtype was 0x04 (i.e. MongoBinData::UUID_RFC4122) and the data was not exactly 16 bytes in length (see: here). In the new driver, which uses libbson, the length is no longer checked. Historically, neither driver validated the data payload for subtype 0x03 (i.e. deprecated UUID type).

This is really only relevant to @treeleaf's example, as he's using type 0x04 in for his binary data, which is documented as requiring the data to conform to RFC 4122. Type 0x03 was deprecated, as drivers were inconsistent in how they mapped that type to native UUID types in their respective language (some would do endian conversion and the like). Type 0x04 was introduced to ensure that drivers utilize the data as-is. See DOCS-1543 and related issues for more context.

In the meantime, I've created PHPC-895 to ensure that we start validating the data length for subtype 0x04. This is definitely something we can address in the PHP driver, and it's possible libbson may be open to a patch as well.

As for subtype 0x03, my advice there is that data viewers such as MongoChef should probably not assert its length. Since it is handled inconsistently by different drivers and the application has no way of knowing if the data originated from a particular language, it's possible they could render the UUID incorrectly by assuming its bytes conformed to RFC 4122. Furthermore, many drivers never validated the length of data stored with subtype 0x03.

webdevilopers commented 7 years ago

Thanks @jmikola for the update!

What does this mean for my UUID generation?

use Ramsey\Uuid\Uuid;

$this->id = Uuid::uuid4();

Any changes I have to make here or do I have to check some drivers?

jmikola commented 7 years ago

What does this mean for my UUID generation?

Assuming your current code path is storing the value returned by Uuid::__toString(), you should likely change this to store the result of Uuid::getBytes(). I would just confirm that the 16 bytes returned have the appropriate endianness, which may be a question for @ramsey.

ramsey commented 7 years ago

I ran tests for endianness for a while, using different Docker containers to emulate big endian environments. All the tests passed in those environments for the same values, so I'm certain a UUID generated in a big endian environment is the same UUID that would be generated in a little endian environment, given the same inputs.

Our Docker build system had some other problems, and I had to discontinue using them on Travis CI, but I need to get back to that approach so that I can show builds passing in big/little endian and 32-bit/64-bit environments.

webdevilopers commented 7 years ago

Can I help you w/ any configuration details on our PHP7 servers @ramsey ?

The problem we are currently facing is that we have no idea how to update the IDs inside the existing documents to make them viewable again. Can anyone recommend some kind of query?

jmikola commented 7 years ago

@webdevilopers: Since you currently storing a hexadecimal string in the binary's data field, this should be a matter of constructing a Ramsey\Uuid object from that data and replacing the binary field with a new MongoDB\BSON\Binary object constructed from Uuid::getBytes() and MongoDB\BSON\Binary::TYPE_UUID.

If you write a migration script to do this, it should iterate on all documents in the collection and ignore documents where the binary's data length is not what you expect (e.g. 3dd325bc-61a0-11e6-941f-84fca0376e2f is 36 bytes). For each document that should be migrated, you can issue an update to $set the correct binary value. Since you're going to be updating documents in the collection while iterating, you may also want to use the snapshot option for MongoDB\Driver\Query (or MongoDB\Collection::find() if you're using the library).

ramsey commented 7 years ago

@webdevilopers Sure. What kind of processor are you running?

webdevilopers commented 7 years ago

Sorry for the late response @ramsey . I could now successfully use the suggestion by @jmikola as workaround. My value objects hold an id which holds the UUID generated with Uuid::uuid4(). Before storing I convert them via $employeeId->id()->getBytes().

I'm running MongoDB shell version: 2.6.11 on Ubuntu 16.04 with PHP 7.0.13-0ubuntu0.16.10.1 and the following Composer for Symfony:

        "doctrine/mongodb-odm": "^1.0",
        "doctrine/mongodb-odm-bundle": "^3.0",
        "alcaeus/mongo-php-adapter": "^1.0",
        "ext-mongo": "*",

Anything more I could tell you?

BTW: can your UUID type also be registered for mongodb or dbal only? https://github.com/ramsey/uuid-doctrine

webdevilopers commented 7 years ago

Just took a look at my collection via RoboMongo and recognized the UUID is shown as LUUID: luuid

Never recognized the Legacy UUID before.

ramsey commented 7 years ago

@jmikola said:

I would just confirm that the 16 bytes returned have the appropriate endianness, which may be a question for @ramsey.

@webdevilopers, I was asking what kind of processor you're using (i.e., x86, PowerPC, s390, etc.). Based on what you've told me, I'll assume it's an x86 processor, which is little-endian, and I can confirm that the binary UUID representation in ramsey/uuid is correct for little-endian architectures (it should also be correct for big-endian).

BTW: can your UUID type also be registered for mongodb or dbal only?

To my knowledge, no one has tried to use it with the ODM yet. I don't use MongoDB, so I can't answer this question (sorry!). I know that ramsey/uuid-doctrine has a hard dependency on Doctrine\DBAL\Types\Type. I'm not sure whether that makes a difference.

Never recognized the Legacy UUID before.

I'm not sure what an LUUID is. Is this a MongoDB thing?

webdevilopers commented 7 years ago

Just found this @ramsey by @nguyenquyhy:

The legacy UUID comes from the C# driver:

Can anyone tell me where to exactely change the configuration of the driver to use the Standard UUID?

For @robomongo users:

webdevilopers commented 7 years ago

Thanks guys for the help! I will try to sum it up for me. Please correct if I'm wrong.

Our servers are using php-pecl-mongodb 1.2.5 and libbson 1.3.5. Mongo Shell tells me our IDs are stored with type 3. But we are using the Uuid::uuid4() method and add Doctrine mapping bin_uuid for the "_id" column. The problem is that the library does not throw an error for those cases.

What we can do is switch back to uuid3 for new documents. Or use Uuid::getBytes().

But currently there is nothing we can do to make the PHP driver use the type 4 by default. Because then we could simply update existing IDs and then everything should be fine with `Uuid::uuid4().

jmikola commented 7 years ago

BTW: can your UUID type also be registered for mongodb or dbal only?

Based on Ramsey\Uuid\Doctrine\UuidType, the library looks specific to Doctrine ORM. I'll also note that its convertToDatabaseValue() method returns a string. If this was intended to be used with ODM, I'd expect it to return a binary BSON type with 16 bytes.

But currently there is nothing we can do to make the PHP driver use the type 4 by default.

To be clear, the PHP driver uses no type by default. MongoDB\BSON\Binary::__construct() requires you to specify a $type argument.

I think you might also be confusing the UUID versions, which correlate with @ramsey's factory methods (e.g. uuid3(), uuid4(), uuid5()), with the BSON binary types. These are not comparable. Those factory methods in @ramsey's library merely generate an RFC 4122 UUID using different strategies, as outlined in the RFC.

The BSON binary type 4 is intended to store the binary representation of an RFC 4122 UUID (i.e. 16 bytes in little endian). BSON binary type 3 is only deprecated because it was never documented as a well-formed RFC 4122 UUID and older drivers for languages with native UUID types happened to store their 16-byte values in different byte orders. That said, BSON binary type 3 and 4 should never have been used by any user or driver to store anything other than a 16-byte value.

@webdevilopers: Does that clear things up?