airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.3k stars 4.15k forks source link

Segment's Identify event should include the email address when available, if not anonymized #1757

Closed johnlafleur closed 2 years ago

johnlafleur commented 3 years ago

Right now, only the user id is added in the Identify, and we cannot know who did what (apart from Specified preferences).

cgardens

┆Issue is synchronized with this Asana task by Unito

cgardens commented 3 years ago

discussed / explored offline. we think it is possible the UI doesn't report email address. the backend definitely does though and the identity all get merged in amplitude so it shouldn't be adversely affecting any of the analytics reporting. it does mean that tests run on the UI for the identify action don't include email.

worth following up on next week to just get it all cleaned up but it does not appear to be breaking the analytics story so going to wait until then.

cgardens commented 3 years ago

I checked the frontend side. It indeed does not send the email address because after that form is submitted it doesn't have access to it anymore. If it's helpful we can give it access, but we should only bother if it fixes something in our analytics which i don't think we know that it does yet.

michel-tricot commented 3 years ago

I wonder if we should store the email in the workspace when it is not anonymized. It will also make it easier to tie users in papercups.

johnlafleur commented 3 years ago

Can we move forward on that soon? It makes it harder to analyze active companies' behavior.

johnlafleur commented 3 years ago

Email should be put as a trait, as in the screenshot: ![Uploading Screen Shot 2021-02-12 at 6.44.14 PM.png…]()

cgardens commented 3 years ago

i'm still confused why we need this. it seems like something is broken in segment or amplitude if this is needed. the identities should get merged (the one with the email and the one without). if that is not happening then we have big problems. injecting the email from the UI is just going to be treating a symptom.

if we do need it, this isn't a big task, but it requires an API change to expose data email addresses which is suspect from a privacy and security point of view. right now it feels like we will be exposing it for the sake of our analytics tooling not working properly.

johnlafleur commented 3 years ago

Thanks for the clarification. Let's put it on hold in that case.

timroes commented 2 years ago

@nataliekwong Is this issue still relevant? We still don't log the mail of users in segment calls, but I haven't heard this being a problem anytime recently, so want to check if this can be closed?

nataliekwong commented 2 years ago

I don't have context on the original issue - redirecting to @johnlafleur for input

johnlafleur commented 2 years ago

We can close it. thanks!