PostHog / posthog

🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free.
https://posthog.com

Person properties are missing on every filter and "Data Management" sections are empty on self-hosted instance! #25125

Closed mdisec closed 2 months ago

mdisec commented 2 months ago

Bug description

A fresh self-hosted installation actually works pretty well. Everything works as I expected (e.g. People are created automatically and the activity screen is full of events).

But Person properties are inaccessible in filters, even when I know there is a recorded person. This is a major bug for us because we want to filter on Person properties practically everywhere.

(screenshot: filters with no Person properties)

Another issue, which I believe is related, is that tabs such as Events, Properties, and Annotations on the "Data management" screen are empty all the time.

(screenshot: empty "Data management" tabs)

My Debugging Informations

For instance, whenever you click "Events" on the "Data management" screen, an API request goes to the /api/projects/1/event_definitions?limit=50&event_type=event endpoint, which is served by the EventDefinitionViewSet class, which has queryset = EventDefinition.objects.all().

When I try to fetch the records from the db, the query returns empty, which matches the empty response from the API.

>>> from posthog.models import EventDefinition
>>> EventDefinition.objects.all()
<QuerySet []>
>>>

When you try to load Person properties in filters, I believe the request is sent to the /api/projects/1/property_definitions?type=person&search=&limit=100&offset=0 endpoint, which is handled by PropertyDefinitionViewSet, where we have queryset = PropertyDefinition.objects.all().

Again, I double-checked the records in the db, and the table is empty.

>>> from posthog.models import EventProperty, PropertyDefinition, User
>>> PropertyDefinition.objects.all()
<QuerySet []>
>>>
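
For anyone reproducing this, the same emptiness can be confirmed over HTTP. This is a minimal sketch, not part of PostHog: the base URL, project id 1, and a personal API key in a `POSTHOG_API_KEY` env var are assumptions you should adjust to your setup.

```python
# Sketch: query the two definition endpoints mentioned above and count results.
import json
import os
import urllib.request

BASE = "http://localhost:8000"  # assumption: adjust to your instance

def definitions_url(kind: str) -> str:
    """Return the endpoint URL for 'event' or 'person' definitions."""
    if kind == "event":
        return f"{BASE}/api/projects/1/event_definitions?limit=50&event_type=event"
    return f"{BASE}/api/projects/1/property_definitions?type=person&limit=100&offset=0"

def fetch_definitions(kind: str) -> list:
    """Fetch the definitions list, authenticating with a personal API key."""
    req = urllib.request.Request(
        definitions_url(kind),
        headers={"Authorization": f"Bearer {os.environ['POSTHOG_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("results", [])

if __name__ == "__main__":
    for kind in ("event", "person"):
        # On a broken instance, both counts come back as 0.
        print(kind, len(fetch_definitions(kind)))
```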

So it looks like these tables are somehow not being populated! I suspect they are meant to be populated asynchronously (maybe via ClickHouse/worker processes?). The only place I can see data being recorded via PropertyDefinition.objects.create is the infer_taxonomy_for_team() function, whose comment says "in production, the plugin server is responsible for this - but in demo data we insert directly to ClickHouse".

But I don't have much experience with this project and don't know how to continue debugging.

Debug info

PostHog Hobby self-hosted with `docker compose`, version/commit: 54b0134f98168e53edab36c90b4db4d37094308d, which is a release from 11 hours ago.

mdisec commented 2 months ago

All right, what I've ended up with is that these Postgresql tables are meant to be populated by the plugin server, as written in that comment. But the insert queries/logic are only implemented in the Rust version of the plugin server; the default hobby installer actually runs the old NodeJS implementation. (I haven't seen any insert statement for the posthog_propertydefinition table anywhere in the plugin-server folder, which is the location of the NodeJS implementation.)

I -somehow...- validated that PostHog Cloud is already using the Rust plugin server instead of the NodeJS one. I'll look for a way to use Rust instead of NodeJS with a simple docker-compose configuration... maybe this will sort out the issue I've been dealing with over the weekend. lol.

mdisec commented 2 months ago

I spent my whole day on this one; here is the list of things I've learnt:

1 - Yep, the property manager of the NodeJS implementation is responsible for 3 things: tracking the first time an event is seen, updating event/property definitions, and auto-creating group types.

Since the code block responsible specifically for updating property definitions has been removed from the git tree, nothing processes and updates the corresponding Postgresql tables, which causes the API to return an empty list.

https://github.com/PostHog/posthog/commit/91cd660b0e364301004c830b95bf46e4d985a60c

I cannot find the above commit in my copy (I got the project via the hobby installation script, which is essentially a git clone, innit?!)

2 - You can build property-defs-rs by yourself.

cargo build --package property-defs-rs

KAFKA_HOSTS="localhost:9092" DATABASE_URL="postgres://posthog:posthog@localhost:5432/posthog" EVENT_TOPIC="clickhouse_events_json" RUST_LOG="debug" FILTER_MODE="opt_out" ./target/debug/property-defs-rs

PS: A forgotten DATABASE_URL parameter caused a PoolTimedOut error in Rust, which cost me 3 more hours. FYI, be careful.

Investing a bit more time to build a proper compose config to run a property-defs-rs container alongside the other services is pretty straightforward.

3 - Even after I managed to compile, build, and run property-defs-rs, it didn't actually write anything to the db. That's the moment I realized these env variables exist:

    // Do everything except actually write to the DB
    #[envconfig(default = "true")]
    pub skip_writes: bool,

    // Do everything except actually read or write from the DB
    #[envconfig(default = "true")]
    pub skip_reads: bool,

I'd been sitting in front of the computer for 8 hours without even a chance to eat something. I didn't even want to figure out how to pass false via env variables (is it False, or 0, or maybe even "0"?). Sooo:

Open the rust/property-defs-rs/src/app_context.rs file and patch it:

        // FROM THIS
        let transaction_time = common_metrics::timing_guard(UPDATE_TRANSACTION_TIME, &[]);
        if !self.skip_writes && !self.skip_reads {
            let mut tx = self.pool.begin().await?;

            for update in updates {
                update.issue(&mut *tx).await?;
            }
            tx.commit().await?;
        }

        // TO THIS. Just get rid of that if statement.
        let transaction_time = common_metrics::timing_guard(UPDATE_TRANSACTION_TIME, &[]);
        let mut tx = self.pool.begin().await?;
        for update in updates {
            update.issue(&mut *tx).await?;
        }
        tx.commit().await?;

4 - Compile and run the service again, then generate some activity. The easiest way to do that is the Python client:

from posthog import Posthog

posthog = Posthog('phc_HLoaWhbwcpeUtg1OQrDmnmefjky2pmxSCgWkmpbpro3', host='http://localhost')
posthog.capture('1231337', '$pageview', {'$current_url': 'https://exampleeeee.com'})

Yet another trick: if you pass the host with https, the PostHog client class has no extra kwargs to disable SSL verification. So don't forget to use http instead of https (yep, another 30 minutes invested here, lol).

Spam activities and wait for the service to trigger the update/insert queries. (Look for the "Forcing small batch due to time limit" string in the code base if you want to bypass this batch update process as well.)
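
To generate enough distinct activity, a small script like this can be used. It's a sketch: the project API key and host are placeholders for your own instance, and the capture signature follows the client call shown above.

```python
# Sketch: spam distinct $pageview events so the property-defs-rs batcher
# eventually flushes its update/insert queries.
import time
import uuid

def make_events(n: int) -> list:
    """Build n $pageview payloads with distinct ids and URLs."""
    return [
        {
            "distinct_id": str(uuid.uuid4()),
            "event": "$pageview",
            "properties": {"$current_url": f"https://example.com/page/{i}"},
        }
        for i in range(n)
    ]

if __name__ == "__main__":
    from posthog import Posthog  # pip install posthog
    # Placeholder key; and remember: plain http, since the client has no
    # kwarg to disable SSL verification.
    client = Posthog("phc_your_project_api_key", host="http://localhost")
    for ev in make_events(50):
        client.capture(ev["distinct_id"], ev["event"], ev["properties"])
        time.sleep(0.05)  # spread the events out a little
    client.flush()
```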

5 - Happy ending.

6 - I believe the PostHog team is actively working on rolling out the Rust property-defs-rs version. Maybe instead of investing a day like me, you'd like to wait a month or two :)

Dear PostHog team: it looks fairly easy to bring back the property-definition parsing/update workers in the NodeJS project. Any plans for this?

Cheers, m.

julius-otoy commented 2 months ago

@timgl @oliverb123 any idea if we'll see a fix for this sometime?

oliverb123 commented 2 months ago

Hey folks.

We will not be reintroducing property and event definitions processing in the node.js plugin server. I recommend looking at the docker-compose stacks - I recently pushed a change to include the property definitions service in the stack for local dev, and expect it to also work for self hosted, although it may require minor changes depending on your setup.

mdisec commented 2 months ago

Thanks for the clarification @oliverb123 !

@julius-otoy mate, I know you've been dealing with this issue for some time. Here is the configuration you need to add to your docker compose yml file. Everything should work smoothly.

PS: You won't see this data immediately; wait a while because of property-defs-rs's internal bulk-create scheduling.

    plugins:
        extends:
            file: docker-compose.base.yml
            service: plugins
        image: cracked_web
        environment:
            SENTRY_DSN: 'https://public@sentry.example.com/1'
            SITE_URL: https://url
            SECRET_KEY: <secretkey>
            OBJECT_STORAGE_ACCESS_KEY_ID: 'object_storage_root_user'
            OBJECT_STORAGE_SECRET_ACCESS_KEY: 'object_storage_root_password'
            OBJECT_STORAGE_ENDPOINT: http://objectstorage:19000
            OBJECT_STORAGE_ENABLED: 'true'  # quoted: compose env values should be strings
            CDP_REDIS_HOST: redis7
            CDP_REDIS_PORT: 6379
        depends_on:
            - db
            - redis
            - redis7
            - clickhouse
            - kafka
            - objectstorage

    # https://github.com/PostHog/posthog/issues/25125
    property:
        image: ghcr.io/posthog/posthog/property-defs-rs:master
        restart: on-failure
        environment:
            KAFKA_HOSTS: 'kafka:9092'
            REDIS_URL: 'redis://redis:6379/'
            DATABASE_URL: 'postgres://posthog:posthog@db:5432/posthog'
            CAPTURE_MODE: events
            SKIP_WRITES: 'false'  # quoted: compose env values should be strings
            SKIP_READS: 'false'
            #RUST_LOG: debug
            FILTER_MODE: opt_out

I've somehow found the publicly accessible builds URL of property-defs-rs. Thanks to the PostHog team's auto builds, we don't need to build the whole project from scratch with Dockerfiles etc.; we can directly use ghcr.io/posthog/posthog/property-defs-rs:master. I hope you keep these docker images (I mean not the main PostHog image, but these new Rust-based projects) available for everyone in the future <3
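
Since the definitions only appear after the batch flush, a tiny polling helper can save you from refreshing the UI by hand. This is a generic sketch, nothing PostHog-specific; in practice `check()` would hit the property_definitions endpoint and return its results list.

```python
# Generic poll-until-truthy helper for waiting out the bulk-create schedule.
import time

def wait_until(check, timeout_s=300.0, interval_s=10.0):
    """Call check() until it returns a truthy value or the timeout expires.

    Returns the truthy value, or None on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)

# Stand-in check for illustration; replace with an API call in practice.
print(wait_until(lambda: ["$current_url"], timeout_s=1, interval_s=0.1))
```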

julius-otoy commented 2 months ago

Thanks @mdisec! We won't use posthog after this but appreciate your help!