HouzuoGuo / tiedot

A rudimentary implementation of a basic document (NoSQL) database in Go
BSD 2-Clause "Simplified" License
2.72k stars 258 forks source link

struct (un)marshal #46

Open jbenet opened 10 years ago

jbenet commented 10 years ago

Hey, tiedot's great.

It would be good if tiedot supported direct struct marshal/unmarshal. There's no reason it shouldn't, as encoding/json supports it, and that's what tiedot's using.

I see that the reason the interface has map[string]interface{} is to add the id field. Is this necessary? Or, could reflection be used? (structs could include the id field).

HouzuoGuo commented 10 years ago

At the moment, two reasons exist for the document to be internally recognized as map[string]interface{}:

How about we introduce the following two new APIs, for embedded usage:

For Insert and Update, the implementation serializes the user defined structure and manipulate the result string to put in the ID.

For Read, it will simply deserialize the document JSON into the structure.

How does it sound?

jbenet commented 10 years ago

Hey! the interface methods look good to me. :+1:

Well, i think that adding the PK to the object is not a good idea. It dirties up the user's data. For instance, usually one expects data put into a database to be returned exactly the same. More concretely:

checksum1 = sha(val1)
db.put(key, val1)
val2 = db.get(key)
checksum2 = sha(val2)
assert val1 == val2
assert checksum1 == checksum2

Adding the key modifies the document, and breaks modularity. I think either:


also, looking at the source, looks like there isn't a good way to address arrays with paths in GetIn. If I'm understanding correctly, GetIn tries all array items. (e.g. foo,bar returns 1 in {"foo": [{"baz":2}, "b", {"bar":1}]} -- am I right?) Perhaps may be better to explicitly address the arrays with indices, like mongo/bson do (foo,2,bar above).

HouzuoGuo commented 10 years ago

You are right, GetIn tries all array items to find the interested document value. I am aware of the Mongo feature, but rarely see it being used in real applications.

Regarding the doc ID - right now it is called "_pk". We may change the name to "@PK" for example - that is valid for JSON but invalid to be deserialized into Go structure.

How do you type thumb up? Is it a unicode character?

jbenet commented 10 years ago

rarely see it being used in real applications.

So, what happens here: events,date from {"events": [{"date":1}, {"date":2}, {"date":3}]} ? There's no way to explicitly address the individual events, so I couldn't ask to index on "the date of the first event".

that is valid for JSON but invalid to be deserialized into Go structure.

Something that will return {"_pk": 1, "other": "stuff"} and {"other": "stuff"} depending on the API that's used? This seems inconsistent to me.

Let me ask this, why do you want to return the _pk in the first place? is there any place where it gets returned that you didn't previously have it already?

:+1: -> :+1: :)

alexandrestein commented 10 years ago

Hey everyone.

Return id is useful when you show the user multiple elements and he select one of those. (for update or open) The simplest way to do that is to hide the element id and when user click, the app get the related id.

The "_id" (as MongoDB) or "_key" sounds good to me.

I would prefer to have the id wrap in the object by default if the struct define a named field like "_id" for example. As "mgo" (GoLang MongoDB driver) works.

What about that?

HouzuoGuo commented 10 years ago

OK, now index on particular array element sounds useful.

What about changing the ID attribute name from "_pk" into "@id"? Because Go structure cannot have member "@id" attribute, therefore the JSON deserializer will happily skip that one.

Quite a few NoSQL engines put document ID in the document itself, like Alexandre said. In the case of tiedot, it helps a lot with query performance.

HouzuoGuo commented 10 years ago

If you don't mind, I shall change the ID attribute name from "_pk" to "@id" straight away.

HouzuoGuo commented 10 years ago

Completed @id change: 3cc156b119724c1a46fdf8fd042e89472d44425a

HouzuoGuo commented 10 years ago

Sorry about the delay. The three new functions will make it into nextgen branch.

Also in nextgen branch, "@id" will no longer be physically stored in document itself, but rather only maintained in lookup table.

HouzuoGuo commented 10 years ago

these APIs will make their way into version 3.1 =)

stunti commented 10 years ago

Hi. I am new to tiedot but I can't find these functions with that signature. Are they implemented under a different name? Thanks

HouzuoGuo commented 10 years ago

I am terribly sorry, very unfortunately the features did not make their way into the 3.1. For now map[string]interface{} is still the only supported document structure.

Here are some usage examples: https://github.com/HouzuoGuo/tiedot/blob/master/example.go

Note that: document ID is no longer an attribute in the document content itself.

geoah commented 9 years ago

Seems that these have been introduced in 3.2, is this stable to use or not yet ready?

HouzuoGuo commented 9 years ago

@geoah This is a trivial feature and I forgot about it during 3.2, sorry! Actually the interface between 3.0 and 3.2 have not changed. It should be really simple to write, let me see if I can get it done in the next several days.

kylewolfe commented 9 years ago

Hi Houzuo,

I'm interested in using tiedot a little more and have a few questions on this subject. In reference to :

At the moment, two reasons exist for the document to be internally recognized as map[string]interface{}:

GetIn function traverses serialized JSON document structure to find values to be indexed/unindexed To make use of query results, documents have their unique ID chucked in and then written to disk.

The method Col.ForEachDoc (https://godoc.org/github.com/HouzuoGuo/tiedot/db#Col.ForEachDoc) returns a []byte and in turn basically a json document that can quickly be unmarshaled using json.Unmarshal. I've found that using json to unmarshal is far better than something that does a map[string]interface{} -> struct conversion.

My questions: 1) how far along are you with any refactor to the API? 2) is the design final? 3) if not, and this []byte json document is indeed how the data comes off the wire. may I suggest creating a new data type of []byte that has a method to convert return a map[string]interface{}?

I think an API that allows both map[string]interface{} AND a JSON compatible struct in / out would be amazing.

A pain point I currently have is that for me to read from tiedot in to a struct I have to do something like this:

// convert each row back in to a Session type
    for id := range queryResult {
        doc, err := col.Read(id)
        if err != nil {
            Logger.Fatalln(err)
        }
        doc["Id"] = id

        ns := &Session{}
        err = mapToStruct(doc, ns)
        if err != nil {
            Logger.Fatalln(err)
        }
        s = append(s, *ns)
    }

In drivers such as mgo, there is a mechanism for unmarshaling the Id directly in the the given struct (here I had to inject it into my map before unmarshaling). You'll also notice that s and queryResult had to be allocated ahead of time. If a type is returned that has the ability to convert its self to both map[string]interface as well as []byte, I can skip that and go right in to unmarshaling using my preferred package.

HouzuoGuo commented 9 years ago

Thank you very much for the suggestions @kylewolfe and sorry I have not been active with the project development.

For your use case I'd like to propose four new functions:

I hope they'll cover your use cases. What do you suggest to be their names?

If they look good to you I should be able to implement them within a day or two.

kylewolfe commented 9 years ago

Thanks for the quick reply!

Your proposed methods would use json to (un)marshal? What would idAttrName be used for?

Is the idea of a single return type that can do multiple things out of the realm of possibility? Or maybe just too much of a breaking change? Ideally an interface such as this would be better, so that Id is handled by (un)marshaling:

type Foo struct {
    Id int
    Foo, Bar string
}

f := &Foo{Foo: "foo", Bar: "bar"}
if err := col.Insert(f); err != nil {
    panic(err)
}
fmt.Println(f.Id) // has new id

f.Foo = "baz"
if err = col.update(f); err != nil {
    panic(err)
}

mgo behaves this way so that I do not have to take an extra step to populate Id, if it exists. What are your thoughts?

I'd be happy to start digging in and trying to make some updates to allow for this. I have a high need for embedded db usage, so it would probably be beneficial for me to poke around and understand a little more about tiedot at a lower level.

kylewolfe commented 9 years ago

Going back to the new return type. If []byte indeed does come off the wire (as I see it in ForEachDoc) and you also need map[string]interface for indexing purposes, etc. Then a type something like this could be used across the whole application:

type Row []byte {}
func (r Row) ToMap() map[string]interface{} // allows you to do what you need for indexing

Then during ForEachDoc, both json unmarshaling and any existing functionality around map[string]interface{} can be used.

HouzuoGuo commented 9 years ago

All right, I misunderstood your original use case regarding mgo. It will be a little difficult to support mgo style API, but I'll try my best to make the existing APIs more pleasant.

json.Marshal/Unmarshal uses JSON byte array instead of string, which happens to be compatible with the representation of JSON documents in the storage. However the higher level API in Col such as Col.Read() returns map[string]interface{} which is inconvenient.

So starting from Col.Read I propose two more functions (names are not final)

Read2(id int, destination interface{}) reads document at specified ID and deserialise it to destination, structure of your choice.

Read3(id int, idAttrName string, destination interface{}) is similar to above, but it also artificially introduces (or modifies) an ID attribute denoted by the idAttrName, so that deserialised structure will have the document ID as well.

And finally, to make Insert and Update easier to use, they will also take interface{} instead of map[string]interface{} as the document input. These two functions will convert the input into index-capable map[string]interface{} automatically, for maintaining indexes.

So in the end the new APIs behave similar to mgo API in terms of supporting any structure as document and treatment of document ID in the deserialised structure.

How does that sound?

kylewolfe commented 9 years ago

I think I may have a crack at this tonight. Rather than what you describe in Read3, how about utilizing a structs tag feature within reflect?

type Foo struct {
    Id int `tiedot: id`
    Foo, Bar string
}

I'm assuming your looking to keep API compatibility for awhile longer (before v4 if at all?)

HouzuoGuo commented 9 years ago

it will definitely be ideal if tiedot can make use of the tag, although I do not know how. Please feel free to experiment with the feature and I look forward to hear from your findings.

kylewolfe commented 9 years ago

Here's a little show and tell of what I have so far (not ready for a PR): https://github.com/HouzuoGuo/tiedot/compare/master...kylewolfe:api_break

I titled the branch api_break, thinking that I would write up what an ideal final API would look like, regardless of breaking changes, but then I realized you could actually switch to a format in which mgo uses (doc interface{}) and still keep backward compatibility.

Insert now supports map[string]interface, the new type of Document, []byte and any pointer to a struct. If a pointer to a struct is given, it will search for a tiedot tag of Id, and update that struct directly, allowing for my earlier example.

Once tested (and other functions converted), this should allow for:

type Foo struct {
    Id int `tiedot: id`
    Foo string
}

f := &Foo{Foo: "bar"}
if _, err := col.Insert(f); err != nil {
    panic(err)
}
// f.Id now has the new id

f.Foo = "baz"
if _, err := col.Update(f.Id,  f); err != nil { // Ideally, the first argument wouldn’t be needed, but oh well
    panic(err)
}

What are your thoughts on this direction?

HouzuoGuo commented 9 years ago

That looks even better than I previously thought! well done! I added you into collaborator list so feel free to push changes.

kylewolfe commented 9 years ago

Excellent! I'll move my work in to a branch on the official repo and continue to play. Do you have a standard or thought on breaking changes? I realized that some more thought needs to go in to a few other methods (ForEachDoc, for example, I believe would have to receive a breaking change in order to support struct unmarshaling)

HouzuoGuo commented 9 years ago

Well I think people are quite familiar with the idea that softwares evolve and breaking changes will occur . As long as the actual feature isn't taken away I wouldn't mind changing some function signatures.

arafath-mk commented 8 years ago

Hi.. What is the status of this issue..?

HouzuoGuo commented 7 years ago

@arafath-mk sorry, little progress has been made regarding this issue, although it should not be a difficult matter. @kylewolfe hello! do you by any chance have a moment to take a look at this?

kylewolfe commented 7 years ago

Unfortunately, I opted to use a different solution for my project. As a result I did not get around to finishing my work on this issue. My partial work is still available though (https://github.com/HouzuoGuo/tiedot/compare/master...kylewolfe:api_break) if someone wants to pick up the baton.

phpfs commented 3 years ago

Hey there - any news on adding a function to directly insert a struct? :)

HouzuoGuo commented 3 years ago

thanks for your interest in this project @phpfs , this project started as a short programming exercise, i'm very sorry that i didn't have a chance to take it higher :|