FoundationDB / fdb-document-layer

A document data model on FoundationDB, implementing MongoDB® wire protocol
Apache License 2.0
208 stars 29 forks source link

Store numeric field names as string keys, instead of integers #35

Open apkar opened 5 years ago

apkar commented 5 years ago

Document Layer is storing numeric field names as binary integers, this is limiting numeric field names to be an integer. I am guessing this decision is taken to maintain the order of numeric fields. This is not a valid assumption. All JSON field names are strings.

In [74]: db.coll.insert({'_id': 'foo', '12345678901234567890': 'test'})
Out[74]: 'foo'

In [75]: for row in db.coll.find():
    ...:     print row
    ...:
{u'_id': u'foo', u'-1': u'test'}

Also, changing this behavior would make array expansion code much cleaner. As array indexes are integers, it is hard to guess if a kye is pointing to an array element or a numeric field.

Related code is here

void insertElementRecursive(bson::BSONElement const& elem, Reference<IReadWriteContext> cx) {
    std::string fn = elem.fieldName();
    if (std::all_of(fn.begin(), fn.end(), ::isdigit)) {
        const char* c_fn = fn.c_str();
        insertElementRecursive(atoi(c_fn), elem, cx);
    } else {
        insertElementRecursive(fn, elem, cx);
    }
}
zorkian commented 5 years ago

I ran into this today while trying to see if we could use FoundationDB's document layer here at Discord. We have one part of our user record that is a nested dictionary where the keys are longs and currently the document layer is causing the string keys to be converted to ints and the data is therefore invalid.

Any chance this is a reasonably fixable issue? I see @apkar you made some improvements here, but not quite enough to solve it?

apkar commented 5 years ago

Yeah, it's not fixed yet. Fix is to store all field names as strings, including numeric field names. This is actually an easy change. But, this is an on-disk format change. This can be handled in one of two ways, without breaking backward compatibility

The second approach is least intrusive and easy to prove that it's working.

zorkian commented 5 years ago

Thank you for the response!

That makes sense. In the short term, we're working to do something on our end where we rewrite our own keys so they look like strings. It's ugly, for sure.

I think that your approaches make sense, the first one is probably better. Do you think someone relatively unfamiliar with your codebase (like me) could take a crack at it? And if so, do you have any pointers to where in the code these bits would need to be written and/or any similar processes that already exist I could model off of?

zorkian commented 5 years ago

Just figured I'd swing back by in case you had any thoughts on this one. =)