Codecademy / EventHub

An open source event analytics platform
http://tinyurl.com/eventhub

user_property vs property #13

Open bonswouar opened 10 years ago

bonswouar commented 10 years ago

Going by the documentation, there seem to be two types of properties: some related to a User and some to an Event. But I don't see how to use them (in the Dashboard, for example), and I'm not sure it actually works. Here is how I do it:

And when I execute curl http://localhost:8080/users/keys, it always returns an empty array.

Did I miss something in the documentation?


EDIT: Oh, when I use curl to update the user information as described in the documentation, it does appear in /users/keys (though I still don't know how to use these properties in the Funnel/Cohort dashboard).

And I can see that those user properties are not sent in the network call. Here is my JS:

        var clientId = '1234';
        var name = "EventHub";
        var options = {
          url: 'http://localhost:8080',
          flushInterval: 1
        };
        var eventHub = window.newEventHub(name, options);
        eventHub.identify(clientId, {'ip2':'12.0.0.1'});
        eventHub.track('souscription', null);
        eventHub.register({
          'ip': '127.0.0.1'
        });

And here is the call it generates: http://localhost:8080/events/batch_track?callback=DevTips.jsonp._callback0&events=%5B%7B%22ip2%22%3A%2212.0.0.1%22%2C%22event_type%22%3A%22souscription%22%2C%22external_user_id%22%3A%221234%22%7D%5D
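The `events` parameter in a request like this is URL-encoded JSON. A minimal sketch for inspecting it with the standard `URL` and `URLSearchParams` APIs (built into browsers and Node.js); `decodeBatchTrackEvents` is a hypothetical helper, not part of the EventHub client:

```javascript
// Hypothetical helper: decode the `events` query parameter of a
// batch_track request so the tracked payload can be read as plain JSON.
function decodeBatchTrackEvents(requestUrl) {
  const url = new URL(requestUrl);
  // searchParams.get() already undoes the percent-encoding.
  const raw = url.searchParams.get('events');
  return raw === null ? [] : JSON.parse(raw);
}

const request =
  'http://localhost:8080/events/batch_track?callback=DevTips.jsonp._callback0' +
  '&events=%5B%7B%22ip2%22%3A%2212.0.0.1%22%2C%22event_type%22%3A%22souscription%22' +
  '%2C%22external_user_id%22%3A%221234%22%7D%5D';

const events = decodeBatchTrackEvents(request);
console.log(events);
// The identify() property `ip2` is present; the register() property `ip` is not.
console.log('ip' in events[0]); // false
```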

chengtao commented 10 years ago

Sorry about the confusion. When you use eventHub.identify or eventHub.register, the user properties are stored in the browser's local storage and get merged into the event properties as events are tracked. That's why nothing shows up when you try to get the user properties from the backend.

On the other hand, when you update the user information through the curl command, it directly updates the user in the backend, and those updates are reflected in /users/keys.
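A rough model of the client-side behaviour described above, using a plain `Map` in place of the browser's `localStorage` so it runs anywhere. `ClientSideUserProps`, `setUserProperties`, and `buildEvent` are hypothetical names, not the EventHub JS API, and the precedence rule for colliding keys is an assumption:

```javascript
// Hypothetical model: user properties live only on the client and are copied
// into every tracked event, so the backend never stores them as user
// properties (which is why /users/keys stays empty).
class ClientSideUserProps {
  constructor(storage = new Map()) {
    this.storage = storage; // stands in for the browser's localStorage
  }

  setUserProperties(userId, props) {
    const current = this.storage.get(userId) || {};
    this.storage.set(userId, { ...current, ...props });
  }

  buildEvent(userId, eventType, eventProps) {
    const userProps = this.storage.get(userId) || {};
    // Stored user properties first; explicit event properties override
    // them on key collisions (an assumption of this sketch).
    return {
      ...userProps,
      ...(eventProps || {}),
      event_type: eventType,
      external_user_id: userId,
    };
  }
}

const client = new ClientSideUserProps();
client.setUserProperties('1234', { ip2: '12.0.0.1' });
console.log(client.buildEvent('1234', 'souscription', null));
```

The resulting object has the same keys as the decoded request in the original report: the user property travels inside the event rather than being saved on the backend user.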

bonswouar commented 10 years ago

Okay, thanks for the explanation!

Though, while eventHub.identify works as you said, apparently I have a problem (or still a misunderstanding?) with eventHub.register. I changed the order (register before track, so that the register parameters would be merged into the track properties):

        var clientId = '1234';
        var name = "EventHub";
        var options = {
          url: 'http://localhost:8080',
          flushInterval: 1
        };
        var eventHub = window.newEventHub(name, options);
        eventHub.identify(clientId, {'ip2':'12.0.0.1'});
        eventHub.register({
          'ip': '127.0.0.1'
        });
        eventHub.track('souscription', {'ip3' : "192.168.0.0"});

Here is the (decoded) call generated:

http://localhost:8080/events/batch_track?callback=DevTips.jsonp._callback0&events=[{"ip2":"12.0.0.1","ip3":"192.168.0.0","event_type":"souscription","external_user_id":"1234"}]

(The register parameters haven't been merged, unlike the identify parameters.)

chengtao commented 10 years ago

This scenario is kind of tricky, as identify and register were not meant to be used together. The properties specified via eventHub.register update the user properties of a system-generated user, while the properties specified via eventHub.identify update the user properties of the specified user. When eventHub.track is called, the system prefers the identified user over the system-generated user.
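The resolution rule above can be sketched as a small model. `IdentityModel` and the ID `generated-abc` are hypothetical, and this is not the EventHub client's source; replaying bonswouar's second snippet, it produces the same payload as the decoded request: `ip2` and `ip3` are sent, `ip` is not, because `register()` wrote `ip` to the generated user while `track()` used the identified one.

```javascript
// Hypothetical model: register() writes to the system-generated user,
// identify() switches to (and writes to) the identified user, and track()
// prefers the identified user when there is one.
class IdentityModel {
  constructor(generatedId) {
    this.generatedId = generatedId; // system-generated user ID
    this.identifiedId = null;       // set by identify()
    this.propsByUser = new Map();   // user ID -> user properties
  }

  addProps(userId, props) {
    const current = this.propsByUser.get(userId) || {};
    this.propsByUser.set(userId, { ...current, ...props });
  }

  register(props) {
    this.addProps(this.generatedId, props);
  }

  identify(userId, props = {}) {
    this.identifiedId = userId;
    this.addProps(userId, props);
  }

  track(eventType, eventProps = {}) {
    // Prefer the identified user over the system-generated one.
    const userId = this.identifiedId !== null ? this.identifiedId : this.generatedId;
    return {
      ...(this.propsByUser.get(userId) || {}),
      ...(eventProps || {}),
      event_type: eventType,
      external_user_id: userId,
    };
  }
}

// Replays the calls from bonswouar's second snippet.
const hub = new IdentityModel('generated-abc');
hub.identify('1234', { ip2: '12.0.0.1' });
hub.register({ ip: '127.0.0.1' });
const tracked = hub.track('souscription', { ip3: '192.168.0.0' });
console.log(tracked);
```

With no identify() call, the same register() properties would show up, since track() then falls back to the generated user.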

bonswouar commented 10 years ago

Okay I got it now, thanks for your answers !

bonswouar commented 10 years ago

Though, maybe it would be useful to add this to the documentation (that they're not compatible with each other), or to change it (for example, by merging both anyway?).

Actually, I don't really understand the reason for this complexity: wouldn't it be easier to use the same function for setting properties, whether it's a generated user or a specified one?

As for alias/identify, why not use only identify (for example), and if there is already a generated user, automatically "alias" it?

These are just suggestions, and maybe I don't understand all the complexity of this JS library, but anyway, thank you for this great work!

EDIT: Also, I was wondering: can you call alias on a user that wasn't generated? I mean, can you alias a user more than once? And does calling identify on an already identified user have the same effect as alias?

EDIT 2: Okay, I did some tests, and apparently alias works only on the generated user. That means I can't use my own ID generator with identify (to store this ID in the cookies for longer than a normal session) and then alias it (when the user signs up, for example). Is there something I missed?

EDIT 3: I tried to manually call alias with the desired ID (something like http://localhost:8080/users/alias?callback=DevTips.jsonp._callback1&from_external_user_id=usersignup@test.fr&to_external_user_id=customPreviouslyGeneratedId), and that works. But I discovered another thing I don't understand: apparently you can call alias only once per "alias", meaning you can't link more than one session to a specific user ID?! Again, is there something I missed?

bonswouar commented 10 years ago

Just to explain the situation: I'd like to be able to track users who don't "identify" (anonymous visitors) over several visits, and map ("alias") all their previous sessions when they subscribe (for example).

chengtao commented 10 years ago

The design of alias and identify is similar to Mixpanel's, so that people who are familiar with Mixpanel can get started quickly. Calling identify on an already identified user simply overrides the user_id on the client side. That is different from calling alias, which tells the backend server that from_external_user_id should be mapped to to_external_user_id.

For alias, the backend supports what you describe, as it lets you specify both from_external_user_id and to_external_user_id, while the JavaScript client-side library only lets you set the from_external_user_id.

Also, can you elaborate more about your situation? Are you using your own id generator?
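Since the backend endpoint accepts both IDs, a custom ID can be aliased by calling it directly. A minimal sketch that builds such a request URL; the `/users/alias` path and the `from_external_user_id`/`to_external_user_id` parameters are taken from the manual call earlier in this thread, while `buildAliasUrl` is a hypothetical helper and the JSONP `callback` parameter used by the browser client is left out:

```javascript
// Hypothetical helper: build a backend alias request with both IDs set
// explicitly (path and parameter names as used manually in this thread).
function buildAliasUrl(baseUrl, fromExternalUserId, toExternalUserId) {
  const url = new URL('/users/alias', baseUrl);
  // searchParams.set() percent-encodes values such as '@' for us.
  url.searchParams.set('from_external_user_id', fromExternalUserId);
  url.searchParams.set('to_external_user_id', toExternalUserId);
  return url.toString();
}

const aliasUrl = buildAliasUrl(
  'http://localhost:8080',
  'usersignup@test.fr',
  'customPreviouslyGeneratedId'
);
console.log(aliasUrl);
```

The resulting URL can then be requested with curl, as was done manually in this thread.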

bonswouar commented 10 years ago

Yes, I'm using my own ID generator (which I store in a persistent cookie).

So, if I understand correctly, I should always use identify with my generated ID, right?

And then I would need to modify alias in the JS library to be able to use my own ID instead of the default generated one.

But even then, this scenario doesn't work (for testing, I alias manually with curl):

Up to this point it's fine: I've got my event in the timeline of the "alias" (his email, ee@ee.com in the example). But then:

=> Here, when I check the timeline of the email (ee@ee.com), the previous event(s) have disappeared; it only keeps the ones from the last session.

I also tried to alias the 2 custom generated IDs together, but it has no effect.

chengtao commented 10 years ago

Yeah, alias simply points one ID to another; it doesn't merge events. In your scenario, the second alias just points ee@ee.com to the newly created user. That's why the previous events seem to disappear: they were stored under the previous user. In your scenario, you should do one alias when the user signs up (not when they sign in), and at the top of each page load, identify the user either by email (if signed in) or by your persisted generated ID from the cookie (if not signed in). Will that solve the problem?
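A simplified model of the pointer behaviour described above. `AliasPointerModel` and `identityForPageLoad` are illustrative names, and the internal storage layout is an assumption, not EventHub's actual index. Aliasing ee@ee.com a second time moves its pointer to the new user, so the first session's events stay behind under the old one:

```javascript
// Hypothetical model of "alias simply points one ID to another": every
// external ID points at one internal user, and events are stored per
// internal user. Re-aliasing moves the pointer but never moves any events.
class AliasPointerModel {
  constructor() {
    this.pointer = new Map();      // external ID -> internal user
    this.eventsByUser = new Map(); // internal user -> list of event types
    this.nextUser = 0;
  }

  // Look up the internal user for an ID, creating a new user if unseen.
  resolve(externalId) {
    if (!this.pointer.has(externalId)) {
      const internal = `user-${this.nextUser++}`;
      this.pointer.set(externalId, internal);
      this.eventsByUser.set(internal, []);
    }
    return this.pointer.get(externalId);
  }

  track(externalId, eventType) {
    this.eventsByUser.get(this.resolve(externalId)).push(eventType);
  }

  // Point `fromId` at the internal user that `toId` currently resolves to.
  alias(fromId, toId) {
    this.pointer.set(fromId, this.resolve(toId));
  }

  timeline(externalId) {
    return [...this.eventsByUser.get(this.resolve(externalId))];
  }
}

// Hypothetical helper for the suggested page-load pattern: identify with
// the email when signed in, otherwise with the persisted cookie ID.
function identityForPageLoad(session, cookieId) {
  return session && session.email ? session.email : cookieId;
}

const model = new AliasPointerModel();
model.track('cookie-A', 'visit');        // first session, anonymous
model.alias('ee@ee.com', 'cookie-A');    // ee@ee.com -> cookie-A's user
model.track('cookie-B', 'visit-again');  // later session, new cookie ID
model.alias('ee@ee.com', 'cookie-B');    // pointer moves to cookie-B's user
console.log(model.timeline('ee@ee.com')); // [ 'visit-again' ]
console.log(model.timeline('cookie-A'));  // [ 'visit' ]
```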


朱政道 Cheng-Tao Chu

bonswouar commented 10 years ago

Well, no, it doesn't really solve the problem. As expected, aliasing when the user signs up works. But if the user then comes back on a new device (for example), identifying the user doesn't link the previous events (from before the user logged in). In other words, for an anonymous user, none of the events from before they log in will be linked to the actual user timeline. Is there any way around that problem? (I could contribute to the project if this change is possible.)

chengtao commented 10 years ago

There are definitely ways to solve the problem. The easiest is probably to add another endpoint, say pseudo-merge (we'll need to come up with a better name), where the system maintains a mapping from one ID, say abc@example.com, to a set of IDs that are all essentially the same user, say abc@example.com and xyz@example.com. At query time, we can then look up the events for every ID in the set and merge them on the fly. If you are interested in helping build that, I can point you to where the source code needs to be modified.
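A sketch of the proposed pseudo-merge, under stated assumptions: `PseudoMergeIndex` and `getMergedUserEvents` are hypothetical names, events are plain `{ userId, eventType, timestamp }` objects, and the ID sets live in memory. Each merge unions two groups, so merges compose transitively, and the query collects events from every ID in the group and orders them by time:

```javascript
// Hypothetical pseudo-merge index: each ID maps to the shared set of IDs that
// belong to the same person. Stored events are untouched; merging happens at
// query time.
class PseudoMergeIndex {
  constructor() {
    this.groups = new Map(); // ID -> Set of equivalent IDs (shared object)
  }

  // IDs equivalent to `id` (just the ID itself if it was never merged).
  group(id) {
    return this.groups.get(id) || new Set([id]);
  }

  // Record that `a` and `b` are the same person by unioning their groups.
  merge(a, b) {
    const union = new Set([...this.group(a), ...this.group(b)]);
    for (const id of union) this.groups.set(id, union);
  }
}

// Query-time merge: collect events of every equivalent ID, ordered by time.
function getMergedUserEvents(index, events, userId) {
  const ids = index.group(userId);
  return events
    .filter((e) => ids.has(e.userId))
    .sort((x, y) => x.timestamp - y.timestamp);
}

const index = new PseudoMergeIndex();
index.merge('abc@example.com', 'xyz@example.com');
const allEvents = [
  { userId: 'xyz@example.com', eventType: 'signup', timestamp: 2 },
  { userId: 'abc@example.com', eventType: 'visit', timestamp: 1 },
  { userId: 'other@example.com', eventType: 'visit', timestamp: 3 },
];
const merged = getMergedUserEvents(index, allEvents, 'xyz@example.com');
console.log(merged.map((e) => e.eventType)); // [ 'visit', 'signup' ]
```

A real implementation would more likely merge per-ID event lists than scan every event; the filter-and-sort just keeps the sketch short.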


bonswouar commented 10 years ago

I am definitely interested in helping. I'll take a deeper look at the source code; if you have any hints that might be helpful, don't hesitate to share them!

chengtao commented 10 years ago

Great. All the API endpoints can be found in web/src/main/java/com/codecademy/eventhub/web/commands

We will need another index to track, for a given user ID, which other user IDs have events that need to be merged. The implementations of some other indices can be found in hub/src/main/java/com/codecademy/eventhub/index

Then, we will also need to modify all the public query methods in hub/src/main/java/com/codecademy/eventhub/EventHub.java, which include getUserEvents, getFunnelCounts, and getRetentionTable...

Lastly, we will need to modify the test cases accordingly.
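To illustrate why query methods such as getFunnelCounts need the merged ID sets, here is a simplified, order-only funnel count. `funnelCounts` and `resolveGroup` are hypothetical, and this is not EventHub's actual getFunnelCounts algorithm. A person whose visit was tracked under a cookie ID and whose sign-up was tracked under their email only completes the funnel once the two IDs resolve to the same group:

```javascript
// Hypothetical, simplified funnel count (steps must occur in order).
// `resolveGroup` maps an ID to the key of its merged ID group.
function funnelCounts(events, steps, resolveGroup) {
  // Collect each person's event types in time order.
  const byPerson = new Map();
  const ordered = [...events].sort((a, b) => a.timestamp - b.timestamp);
  for (const e of ordered) {
    const key = resolveGroup(e.userId);
    if (!byPerson.has(key)) byPerson.set(key, []);
    byPerson.get(key).push(e.eventType);
  }
  // counts[i] = number of people who reached step i (in order).
  const counts = steps.map(() => 0);
  for (const types of byPerson.values()) {
    let reached = 0;
    for (const t of types) {
      if (reached < steps.length && t === steps[reached]) reached++;
    }
    for (let i = 0; i < reached; i++) counts[i]++;
  }
  return counts;
}

const funnelEvents = [
  { userId: 'cookie-A', eventType: 'visit', timestamp: 1 },
  { userId: 'ee@ee.com', eventType: 'signup', timestamp: 2 },
];
const steps = ['visit', 'signup'];
// Without merging, the two IDs count as two different people.
const separateCounts = funnelCounts(funnelEvents, steps, (id) => id);
// With cookie-A merged into ee@ee.com's group, one person completes both steps.
const mergedCounts = funnelCounts(funnelEvents, steps,
  (id) => (id === 'cookie-A' ? 'ee@ee.com' : id));
console.log(separateCounts, mergedCounts); // [ 1, 0 ] [ 1, 1 ]
```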