Open haphut opened 5 years ago
Latest thoughts:
Pulsar log compaction won't work because it keeps the last message for each key, not the first, and it has to be triggered manually. Pulsar message deduplication won't work either, since it only makes sure that the same message from a given producer isn't persisted twice, and it relies on producer sequence IDs for that rather than on message content.
The HFP stream contains duplicate messages because MQTT QoS 1 only guarantees at-least-once delivery. If we run multiple instances of pulsar-mqtt-source, we need to deduplicate across those streams as well. Keep the first copy of each unique message.
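A minimal sketch of the "keep the first copy" logic, in plain Python so it can be tried without a Pulsar cluster. The names `FirstSeenDeduplicator`, `is_first`, `deduplicate` and `max_entries` are hypothetical, not part of any Pulsar API. It assumes duplicates are byte-identical payloads (the MQTT topic may need to be part of the key as well) and that a bounded in-memory cache is acceptable, which means state is lost on restart and is not shared between instances.

```python
import hashlib
from collections import OrderedDict


class FirstSeenDeduplicator:
    """Let the first copy of each unique payload through; drop later copies."""

    def __init__(self, max_entries=100_000):
        # Assumption: duplicates arrive close together, so a bounded cache is enough.
        self.max_entries = max_entries
        self._seen = OrderedDict()  # digest -> None, ordered from oldest to newest

    def is_first(self, payload: bytes) -> bool:
        # Hash the payload so the cache stores fixed-size keys.
        digest = hashlib.sha256(payload).digest()
        if digest in self._seen:
            # Duplicate: refresh its recency and tell the caller to drop it.
            self._seen.move_to_end(digest)
            return False
        self._seen[digest] = None
        if len(self._seen) > self.max_entries:
            # Evict the least recently seen digest to bound memory use.
            self._seen.popitem(last=False)
        return True


def deduplicate(payloads, max_entries=100_000):
    """Return the payloads with repeats removed, preserving first-seen order."""
    dedup = FirstSeenDeduplicator(max_entries)
    return [p for p in payloads if dedup.is_first(p)]
```

For example, `deduplicate([b"a", b"b", b"a"])` keeps only the first `b"a"`. A copy that arrives after its digest has been evicted is treated as new, so `max_entries` would have to cover the longest expected gap between duplicates.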
Implement using Pulsar Functions.
Deploy the function on a Docker host or Docker Swarm using the Pulsar Admin API or CLI.
Ask Pulsar devs whether they are interested in: 1) Streaming compaction of topics instead of cron jobs. 2) Compaction by retaining only the first instances of unique messages, not the last.