Flagsmith / flagsmith

Open Source Feature Flagging and Remote Config Service. Host on-prem or use our hosted version at https://flagsmith.com/
BSD 3-Clause "New" or "Revised" License

No processed data for evaluation analytics #2769

Closed blackjid closed 11 months ago

blackjid commented 1 year ago

Describe the bug

I'm trying to use the evaluation data analytics, but I can't get the data to show in the analytics tab. I can see there is data in the analytics raw tables, but not in the bucket tables.

To Reproduce

I'm testing with the flagsmith-on-flagsmith project, where the client reports the analytics.

I'm setting USE_POSTGRES_FOR_ANALYTICS=true in my env variables. I have the following option in the Helm values, which in turn adds the TASK_RUN_METHOD=TASK_PROCESSOR env variable to the pods:

taskProcessor:
  enabled: true

I can see in the task processor logs that the tasks.populate_bucket task is registered.

I also tested the /processor/monitoring API endpoint, and it returns:

{"waiting":0}
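The monitoring check can be scripted. The following is a minimal sketch, assuming the endpoint returns a JSON body with a `waiting` key as in the response quoted above; the `base_url` argument and both helper names are hypothetical and not part of Flagsmith.

```python
import json
from urllib.request import urlopen


def parse_waiting(body: str) -> int:
    """Extract the `waiting` count from a monitoring response body."""
    return int(json.loads(body)["waiting"])


def fetch_waiting(base_url: str, timeout: float = 5.0) -> int:
    """Call the monitoring endpoint and return the waiting count.

    `base_url` is a placeholder for wherever the Flagsmith API is exposed.
    """
    with urlopen(f"{base_url}/processor/monitoring", timeout=timeout) as resp:
        return parse_waiting(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Parse the exact body quoted in this issue.
    print(parse_waiting('{"waiting":0}'))
```

A count of zero only says that no tasks are waiting in the queue. In this thread the endpoint reported zero while the recurring task was not producing any bucket data, so it is worth checking the recurring task tables as well.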

How are you running Flagsmith?

gagantrivedi commented 1 year ago

@blackjid Have you set ANALYTICS_DATABASE_URL or DJANGO_DB_HOST_ANALYTICS...?

blackjid commented 1 year ago

No, I don't have those. I'm using the same database as Flagsmith for analytics.

If I understand the code correctly, setting those variables only enables the use of a different database, but doesn't do anything else.

I did try those variables, but I rolled them back when I read that I can use the same database as DATABASE_URL. Maybe I'm wrong.

gagantrivedi commented 1 year ago

@blackjid Do you see any task runs in the table task_processor_recurringtaskrun?

blackjid commented 1 year ago

No, that table is empty

gagantrivedi commented 1 year ago

No, that table is empty

Hmm that means the task to populate the buckets did not run.

Can you tell me what you see in task_processor_recurringtask?

blackjid commented 1 year ago

Well, sorry, it wasn't really empty.

image

Checking the table task_processor_recurringtask, it only has two rows. id:1 is tasks.populate_bucket.

Without really knowing what I'm doing: the column is_locked was true in the tasks.populate_bucket row, so I manually changed it to false. And now a new row has appeared in the task_processor_recurringtaskrun table.
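For readers who hit the same symptom, the manual unlock described above can be sketched as follows. This is an illustration only: it runs against an in-memory SQLite stand-in rather than the real Postgres database, only the `id` and `is_locked` columns are confirmed by this thread, and the `task_identifier` column name, the second demo row and all function names are assumptions.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for task_processor_recurringtask.

    The schema is simplified and partly assumed; the rows are demo data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_processor_recurringtask ("
        "id INTEGER PRIMARY KEY, task_identifier TEXT, is_locked INTEGER)"
    )
    conn.executemany(
        "INSERT INTO task_processor_recurringtask VALUES (?, ?, ?)",
        [(1, "tasks.populate_bucket", 1), (2, "tasks.clean_up_old_tasks", 0)],
    )
    return conn


def locked_tasks(conn: sqlite3.Connection) -> list:
    """List the identifiers of recurring tasks whose lock flag is set."""
    rows = conn.execute(
        "SELECT task_identifier FROM task_processor_recurringtask "
        "WHERE is_locked = 1 ORDER BY id"
    )
    return [row[0] for row in rows]


def unlock(conn: sqlite3.Connection, task_identifier: str) -> int:
    """Clear the lock flag for one task and return the number of rows changed."""
    cur = conn.execute(
        "UPDATE task_processor_recurringtask SET is_locked = 0 "
        "WHERE task_identifier = ? AND is_locked = 1",
        (task_identifier,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    print(locked_tasks(db))
    print(unlock(db, "tasks.populate_bucket"))
    print(locked_tasks(db))
```

Clearing the flag by hand is presumably only safe when no worker is actually running the task, for example after the processor has been restarted.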

image image
gagantrivedi commented 1 year ago

It looks like the thread that was processing the task died midway, leaving the task locked. Are you able to share some logs? (From stdout and stderr)

blackjid commented 1 year ago
segments.models INFO     Using re2 library for regex.
integrations.lead_tracking.pipedrive.lead_tracker INFO     Using re2 library for regex.
axes.apps    INFO     AXES: BEGIN LOG
axes.apps    INFO     AXES: Using django-axes version 5.32.0
axes.apps    INFO     AXES: blocking by username only.
System check identified some issues:

WARNINGS:
?: (axes.W002) You do not have 'axes.middleware.AxesMiddleware' in your settings.MIDDLEWARE.
task_processor.management.commands.runprocessor INFO     Processor starting. Registered tasks are: ['tasks.write_environments_to_dynamodb', 'tasks.send_environment_update_message_for_project', 'tasks.send_environment_update_message', 'tasks.create_feature_state_went_live_audit_log', 'tasks.create_feature_state_updated_by_change_request_audit_log', 'tasks.create_audit_log_from_historical_record', 'tasks.create_segment_priorities_changed_audit_log', 'tasks.rebuild_environment_document', 'tasks.create_pipedrive_lead', 'tasks.send_email_changed_notification_email', 'tasks.create_health_check_model', 'tasks.clean_up_old_tasks', 'edge_request_forwarder.forward_identity_request', 'edge_request_forwarder.forward_trait_request', 'edge_request_forwarder.forward_trait_requests', 'tasks.populate_bucket', 'tasks.track_feature_evaluation', 'tasks.track_request', 'tasks.call_environment_webhook_for_feature_state_change', 'tasks.sync_identity_document_features', 'tasks.generate_audit_log_records', 'tasks.update_chargebee_cache', 'tasks.send_org_over_limit_alert', 'tasks.update_organisation_subscription_information_influx_cache', 'tasks.update_organisation_subscription_information_cache']

that's all on the task-processor container.

gagantrivedi commented 1 year ago

that's all on the task-processor container.

Was the container recreated (maybe from a deployment or something)?

blackjid commented 1 year ago

mmm, well probably... I enabled LOG_LEVEL=DEBUG now and restarted the pod.

There is a lot of this... but no errors


2023-09-14 08:28:13 | 2023-09-14T11:28:13.954028454Z stderr F task_processor.processor DEBUG    No tasks to process. |  
2023-09-14 08:28:13 | 2023-09-14T11:28:13.587460925Z stderr F task_processor.processor DEBUG    No tasks to process. |  
2023-09-14 08:28:13 | 2023-09-14T11:28:13.568741599Z stderr F task_processor.processor DEBUG    No tasks to process. |  
2023-09-14 08:28:13 | 2023-09-14T11:28:13.566881489Z stderr F task_processor.processor DEBUG    No tasks to process. |  
2023-09-14 08:28:13 | 2023-09-14T11:28:13.565269146Z stderr F task_processor.processor DEBUG    No tasks to process.
blackjid commented 1 year ago

Should the other recurring task, clean_up_old_tasks, clean up the non-recurring task tables? Because those tables are filled with more than 100K rows.

So I'm guessing that the recurring clean-up task is not working either.. 🤔

gagantrivedi commented 1 year ago

You'd have to set ENABLE_CLEAN_UP_OLD_TASKS and TASK_DELETE_RETENTION_DAYS for that to work.

blackjid commented 1 year ago

You'd have to set ENABLE_CLEAN_UP_OLD_TASKS and TASK_DELETE_RETENTION_DAYS for that to work

aren't those set by default? https://github.com/Flagsmith/flagsmith/blob/8c1c89be3c74a0355bc8af82cd1b9196871465f3/api/app/settings/common.py#L859C1-L860

gagantrivedi commented 1 year ago

You'd have to set ENABLE_CLEAN_UP_OLD_TASKS and TASK_DELETE_RETENTION_DAYS for that to work

aren't those set by default? https://github.com/Flagsmith/flagsmith/blob/8c1c89be3c74a0355bc8af82cd1b9196871465f3/api/app/settings/common.py#L859C1-L860

Ah yes, do you see task runs older than 30 days?

blackjid commented 1 year ago

Ah yes, do you see task runs older than 30 days?

Yes, there are tons..

Something else I see now..

After unlocking the tasks.populate_bucket recurring task manually, an hour has passed and new runs have been created.

I also see new rows in the app_analytics_apiusagebucket table. I think that table gets processed in the same task as app_analytics_featureevaluationbucket, which remains empty.

gagantrivedi commented 1 year ago

But the raw table for that is app_analytics_featureevaluationraw. Can you confirm that you have data in that table?

blackjid commented 1 year ago

Yes, both raw tables have lots of data: app_analytics_featureevaluationraw and app_analytics_apiusageraw.
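The comparison being made here (raw table has rows, bucket table is empty) can be expressed as a small check. This is a minimal sketch against an in-memory SQLite stand-in: the four table names come from this thread, while the single-column schema, the demo rows and the function names are assumptions made for illustration.

```python
import sqlite3

# Raw table -> bucket table, using the table names mentioned in this thread.
PAIRS = {
    "app_analytics_apiusageraw": "app_analytics_apiusagebucket",
    "app_analytics_featureevaluationraw": "app_analytics_featureevaluationbucket",
}


def make_demo_db() -> sqlite3.Connection:
    """Create empty stand-in tables, then add demo rows matching the report."""
    conn = sqlite3.connect(":memory:")
    for raw, bucket in PAIRS.items():
        conn.execute(f"CREATE TABLE {raw} (id INTEGER PRIMARY KEY)")
        conn.execute(f"CREATE TABLE {bucket} (id INTEGER PRIMARY KEY)")
    # Both raw tables hold data, but only the API usage bucket is populated.
    conn.execute("INSERT INTO app_analytics_apiusageraw DEFAULT VALUES")
    conn.execute("INSERT INTO app_analytics_featureevaluationraw DEFAULT VALUES")
    conn.execute("INSERT INTO app_analytics_apiusagebucket DEFAULT VALUES")
    return conn


def unprocessed_raw_tables(conn: sqlite3.Connection) -> list:
    """Return raw tables that contain rows while their bucket table is empty."""
    stuck = []
    for raw, bucket in PAIRS.items():
        raw_count = conn.execute(f"SELECT COUNT(*) FROM {raw}").fetchone()[0]
        bucket_count = conn.execute(f"SELECT COUNT(*) FROM {bucket}").fetchone()[0]
        if raw_count > 0 and bucket_count == 0:
            stuck.append(raw)
    return stuck


if __name__ == "__main__":
    print(unprocessed_raw_tables(make_demo_db()))
```

An empty bucket table does not necessarily mean the task failed; as the end of this thread shows, the feature evaluation bucket filled up some time after the recurring task started running again.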

gagantrivedi commented 1 year ago

Interesting... we will take a look

blackjid commented 1 year ago

Any idea? :)

gagantrivedi commented 1 year ago

No, it's been a busy week. Hopefully either tomorrow or on Monday, I will get some time to further analyse this

gagantrivedi commented 11 months ago

@blackjid Have you set TASK_DELETE_INCLUDE_FAILED_TASKS? If not, can you tell me the result of the following query?

select * from task_processor_task where completed=true  order by id asc limit 1;

I suspect that the tasks you see in the table (older than TASK_DELETE_RETENTION_DAYS) are just failed tasks, because the same code is working in our production environment.

PS: sorry for the late response :pray:
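One way to test that suspicion is to split the old rows by their `completed` flag. The sketch below does this on an in-memory SQLite stand-in whose column names are taken from the query result header posted in this thread. It assumes retention is measured against `scheduled_for` and uses 30 days as the retention period; the demo rows and function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def make_demo_db(now: datetime) -> sqlite3.Connection:
    """Build a simplified task_processor_task table with three demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_processor_task ("
        "id INTEGER PRIMARY KEY, scheduled_for TEXT, task_identifier TEXT, "
        "num_failures INTEGER, completed INTEGER)"
    )
    rows = [
        # Old and completed: a clean-up job would be expected to remove this.
        (1, (now - timedelta(days=45)).isoformat(), "tasks.track_request", 0, 1),
        # Old and never completed: matches the "just failed tasks" hypothesis.
        (2, (now - timedelta(days=40)).isoformat(), "tasks.track_request", 3, 0),
        # Recent: inside the retention window either way.
        (3, (now - timedelta(days=1)).isoformat(), "tasks.track_request", 0, 1),
    ]
    conn.executemany("INSERT INTO task_processor_task VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def old_task_breakdown(conn, now: datetime, retention_days: int = 30) -> dict:
    """Count tasks older than the retention window, split by completion."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    counts = {"completed": 0, "not_completed": 0}
    query = (
        "SELECT completed, COUNT(*) FROM task_processor_task "
        "WHERE scheduled_for < ? GROUP BY completed"
    )
    for completed, n in conn.execute(query, (cutoff,)):
        counts["completed" if completed else "not_completed"] += n
    return counts


if __name__ == "__main__":
    now = datetime(2023, 10, 3, 15, 13, 9, tzinfo=timezone.utc)
    print(old_task_breakdown(make_demo_db(now), now))
```

If every old row lands in `not_completed`, the leftover rows are consistent with the failed-task explanation. Old rows in `completed` would instead suggest that the clean-up task itself is not running.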

gagantrivedi commented 11 months ago

Yes, both raw tables have lots of data: app_analytics_featureevaluationraw and app_analytics_apiusageraw.

Can you paste the last few rows from app_analytics_featureevaluationraw?

blackjid commented 11 months ago

This is the result of the query...

id,uuid,created_at,scheduled_for,task_identifier,serialized_args,serialized_kwargs,num_failures,completed,is_locked
18674784,6b7e080a-9bfe-44b1-80e6-73f304134707,2023-10-03T15:13:09.032Z,2023-10-03T15:13:09.031Z,tasks.track_request,[],"{""resource"": 2, ""host"": ""<redacted>"", ""environment_key"": ""<redacted>""}",0,1,

These are the last few rows of app_analytics_featureevaluationraw:

image
blackjid commented 11 months ago

I haven't looked at this since the last time, so I had to read everything again :) Some things have changed (I'm running everything the same).

The app_analytics_featureevaluationbucket table now has some data.

image

~but the analytics tab for those flags in the UI still says there is no data~

I can also see in task_processor_recurringtaskrun that the recurring tasks have been running successfully since I manually unlocked the task.

blackjid commented 11 months ago

I think everything is working now 🤷🏼

image

not sure what happened..

I'm going to enable analytics in my own environments and close this issue. Thanks so much for your time!