Open treeleaf opened 7 years ago
Hi,
The serialization is made by the mongo java driver. I will look at the latest version.
Same problem here with the following mapping:
<field fieldName="id" type="bin_uuid" id="true" strategy="NONE" />
Document:
use Ramsey\Uuid\Uuid;
$this->id = Uuid::uuid4();
Having the same problem with MongoChef: bytes.length != 16.
A general driver / BSON problem?
I checked our collections with this plugin and MongoChef on our production servers too - same problem.
It seems to platform / driver independent. The problem is the BinData for UUID
s. This is the way the look in the shell:
"_id" : BinData(3,"ZjU1NDYwMDYtZGRhYy0xMWU2LThmOTctOWNlYmU4MmE5NmFh")
I discussed the issue with MongoChef, @jmikola and @ramsey some time ago:
But I don't think we found a way to solve it. Could be a #PHP7 issue which is running on all the servers I tested.
MongoChef wrote:
But yes it appears as though your doc has a binary field of sub-type 0x03 (LUUID) but doesn't actually contain a 16-byte UUID
But our UUID used to work before we updated to PHP7 which is now using a new mongodb extension instead of the old mongo.
After base64-decoding the binary data in the snippets above, it looks like you guys are storing hexadecimal strings instead of the binary representation of the UUID:
Base64 value | Decoded value |
---|---|
M2RkMzI1YmMtNjFhMC0xMWU2LTk0MWYtODRmY2EwMzc2ZTJm |
f5546006-ddac-11e6-8f97-9cebe82a96aa |
ZjU1NDYwMDYtZGRhYy0xMWU2LThmOTctOWNlYmU4MmE5NmFh |
3dd325bc-61a0-11e6-941f-84fca0376e2f |
Instead, you should be storing a 16-byte binary string as the data payload.
There does appear to be a regression in the new PHP driver (ext-mongodb
). The legacy driver (ext-mongo
) would throw an exception if the binary subtype was 0x04 (i.e. MongoBinData::UUID_RFC4122
) and the data was not exactly 16 bytes in length (see: here). In the new driver, which uses libbson, the length is no longer checked. Historically, neither driver validated the data payload for subtype 0x03 (i.e. deprecated UUID type).
This is really only relevant to @treeleaf's example, as he's using type 0x04 in for his binary data, which is documented as requiring the data to conform to RFC 4122. Type 0x03 was deprecated, as drivers were inconsistent in how they mapped that type to native UUID types in their respective language (some would do endian conversion and the like). Type 0x04 was introduced to ensure that drivers utilize the data as-is. See DOCS-1543 and related issues for more context.
In the meantime, I've created PHPC-895 to ensure that we start validating the data length for subtype 0x04. This is definitely something we can address in the PHP driver, and it's possible libbson may be open to a patch as well.
As for subtype 0x03, my advice there is that data viewers such as MongoChef should probably not assert its length. Since it is handled inconsistently by different drivers and the application has no way of knowing if the data originated from a particular language, it's possible they could render the UUID incorrectly by assuming its bytes conformed to RFC 4122. Furthermore, many drivers never validated the length of data stored with subtype 0x03.
Thanks @jmikola for the update!
What does this mean for my UUID generation?
use Ramsey\Uuid\Uuid;
$this->id = Uuid::uuid4();
Any changes I have to make here or do I have to check some drivers?
What does this mean for my UUID generation?
Assuming your current code path is storing the value returned by Uuid::__toString()
, you should likely change this to store the result of Uuid::getBytes()
. I would just confirm that the 16 bytes returned have the appropriate endianness, which may be a question for @ramsey.
I ran tests for endianness for a while, using different Docker containers to emulate big endian environments. All the tests passed in those environments for the same values, so I'm certain a UUID generated in a big endian environment is the same UUID that would be generated in a little endian environment, given the same inputs.
Our Docker build system had some other problems, and I had to discontinue using them on Travis CI, but I need to get back to that approach so that I can show builds passing in big/little endian and 32-bit/64-bit environments.
Can I help you w/ any configuration details on our PHP7 servers @ramsey ?
The problem we are currently facing is that we have no idea how to update the IDs inside the existing documents to make them viewable again. Can anyone recommend some kind of query?
@webdevilopers: Since you currently storing a hexadecimal string in the binary's data field, this should be a matter of constructing a Ramsey\Uuid
object from that data and replacing the binary field with a new MongoDB\BSON\Binary
object constructed from Uuid::getBytes()
and MongoDB\BSON\Binary::TYPE_UUID
.
If you write a migration script to do this, it should iterate on all documents in the collection and ignore documents where the binary's data length is not what you expect (e.g. 3dd325bc-61a0-11e6-941f-84fca0376e2f
is 36 bytes). For each document that should be migrated, you can issue an update to $set
the correct binary value. Since you're going to be updating documents in the collection while iterating, you may also want to use the snapshot
option for MongoDB\Driver\Query
(or MongoDB\Collection::find()
if you're using the library).
@webdevilopers Sure. What kind of processor are you running?
Sorry for the late response @ramsey . I could now successfully use the suggestion by @jmikola as workaround. My value objects hold an id
which holds the UUID
generated with Uuid::uuid4()
.
Before storing I convert them via $employeeId->id()->getBytes()
.
I'm running MongoDB shell version: 2.6.11
on Ubuntu 16.04
with PHP 7.0.13-0ubuntu0.16.10.1
and the following Composer for Symfony:
"doctrine/mongodb-odm": "^1.0",
"doctrine/mongodb-odm-bundle": "^3.0",
"alcaeus/mongo-php-adapter": "^1.0",
"ext-mongo": "*",
Anything more I could tell you?
BTW: can your UUID type also be registered for mongodb
or dbal
only?
https://github.com/ramsey/uuid-doctrine
Just took a look at my collection via RoboMongo and recognized the UUID
is shown as LUUID
:
Never recognized the Legacy UUID before.
@jmikola said:
I would just confirm that the 16 bytes returned have the appropriate endianness, which may be a question for @ramsey.
@webdevilopers, I was asking what kind of processor you're using (i.e., x86, PowerPC, s390, etc.). Based on what you've told me, I'll assume it's an x86 processor, which is little-endian, and I can confirm that the binary UUID representation in ramsey/uuid is correct for little-endian architectures (it should also be correct for big-endian).
BTW: can your UUID type also be registered for mongodb or dbal only?
To my knowledge, no one has tried to use it with the ODM yet. I don't use MongoDB, so I can't answer this question (sorry!). I know that ramsey/uuid-doctrine has a hard dependency on Doctrine\DBAL\Types\Type
. I'm not sure whether that makes a difference.
Never recognized the Legacy UUID before.
I'm not sure what an LUUID
is. Is this a MongoDB thing?
Just found this @ramsey by @nguyenquyhy:
The legacy UUID comes from the C# driver:
Can anyone tell me where to exactely change the configuration of the driver to use the Standard UUID?
For @robomongo users:
Thanks guys for the help! I will try to sum it up for me. Please correct if I'm wrong.
Our servers are using php-pecl-mongodb
1.2.5 and libbson
1.3.5.
Mongo Shell tells me our IDs are stored with type 3
.
But we are using the Uuid::uuid4()
method and add Doctrine mapping bin_uuid
for the "_id" column.
The problem is that the library does not throw an error for those cases.
What we can do is switch back to uuid3
for new documents. Or use Uuid::getBytes()
.
But currently there is nothing we can do to make the PHP driver use the type 4
by default.
Because then we could simply update existing IDs and then everything should be fine with `Uuid::uuid4()
.
BTW: can your UUID type also be registered for mongodb or dbal only?
Based on Ramsey\Uuid\Doctrine\UuidType, the library looks specific to Doctrine ORM. I'll also note that its convertToDatabaseValue()
method returns a string. If this was intended to be used with ODM, I'd expect it to return a binary BSON type with 16 bytes.
But currently there is nothing we can do to make the PHP driver use the type 4 by default.
To be clear, the PHP driver uses no type by default. MongoDB\BSON\Binary::__construct()
requires you to specify a $type
argument.
I think you might also be confusing the UUID versions, which correlate with @ramsey's factory methods (e.g. uuid3()
, uuid4()
, uuid5()
), with the BSON binary types. These are not comparable. Those factory methods in @ramsey's library merely generate an RFC 4122 UUID using different strategies, as outlined in the RFC.
The BSON binary type 4 is intended to store the binary representation of an RFC 4122 UUID (i.e. 16 bytes in little endian). BSON binary type 3 is only deprecated because it was never documented as a well-formed RFC 4122 UUID and older drivers for languages with native UUID types happened to store their 16-byte values in different byte orders. That said, BSON binary type 3 and 4 should never have been used by any user or driver to store anything other than a 16-byte value.
@webdevilopers: Does that clear things up?
I use PHPStorm as IDE and in the PHP application. I have a collection with a bin uuid (BSON); for example BinData(4,"M2RkMzI1YmMtNjFhMC0xMWU2LTk0MWYtODRmY2EwMzc2ZTJm"). The plugin is unable to show the collection and throws an exception org.bson.BsonSerializationException: Expected length to be 16, not 36