confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
126 stars 1.04k forks source link

KSQL should not delete the fact topic not created by KSQL #3585

Open brianwhu opened 5 years ago

brianwhu commented 5 years ago

When a KSQL stream or table is registered from an existing Kafka topic with CREATE STREAM/TABLE statement, the topic does not belong to the stream or table, and therefore should be left alone even when the stream or table is dropped with the "DELETE TOPIC" clause.

On v5.2.1, the cli tool always deletes the associated topic when a stream or a table is dropped. This has the potential to cause serious data loss.

agavra commented 5 years ago

Hello @brianwhu - thanks for pointing this out! We've given this a lot of thought and decided a while back that this behavior is expected and should remain supported. If the user explicitly mentions DELETE TOPIC, then we should respect this explicit command; we don't differentiate between topics that were created by KSQL and those that weren't. This becomes especially important when you look at more recent features in 5.3 that allow KSQL to create a topic in a CREATE STREAM/TABLE statement if it doesn't exist, and then populate it using INSERT INTO ... VALUES.

I do, though, understand your concern. I think the proper solution to this problem is ACLs on topics that are important - any production topic that should not be "deletable" by KSQL should be guarded by the proper permissions (see: https://docs.confluent.io/4.0.0/kafka/authorization.html#using-acls).

Note that in 5.3.x we made this behavior much safer (see #2680) so that at least we won't keep deleting the topic every time we recover...

I'm closing out this ticket to keep our backlog clean, but feel free to continue the conversation!

brianwhu commented 5 years ago

Thanks for sharing your thoughts and reasoning for the current design. I see your points, particularly the one on DELETE TOPIC being explicitly expressed by a user. I do want to share a couple of brief points in favor of my position, just in case you want to give it some more thoughts.

  1. In principle, CREATE and DROP are mirror operations where DROP removes what CREATE brings about, possibly including implicitly created objects but nothing more. If KSQL knows it didn't create a topic it really has no business in destroying it, no matter what the user asks. Not deleting those topics is actually more consistent with the CREATE/DROP contract all SQL users have come to expect.

  2. The use cases where the behavior in question is expected and natural are probably in the minority. I'd argue that when a user issues DROP TABLE/STREAM ... DELETE TOPIC where the topic is preexisting, the DELETEclause is most likely a mistake. It's probably better to do less and give user the option.

Note: I do see that DELETE TOPIC is quite safe and valuable when the topics are auto-generated by KSQL.

agavra commented 5 years ago

You bring up some really good points @brianwhu, but I think it boils down to opinion (both of our positions are valid). I'll reopen this ticket and let some other people chime in with their thoughts (cc @rodesai @big-andy-coates @vcrfxia @hjafarpour ) to see what people find more intuitive.

Here are some other things I had thought of:

In principle, CREATE and DROP are mirror operations where DROP removes what CREATE brings about, possibly including implicitly created objects but nothing more.

There are two mental models we can adopt about KSQL - the first is that KSQL "owns" all the topics that it creates, the second is that KSQL creates topics as a convenience but it doesn't "own" anything. I think KSQL should be modeled after the latter, which also means that topic deletion is a convenience. This makes it very easy to reason about resource ownership in KSQL, and defers this very complicated problem to another system - the ACL system (e.g. what happens if I manually delete a topic that KSQL created, should KSQL be allowed to re-create it because it "owned" that namespace? just because KSQL created a topic, should that mean that I - a random producer - should not be allowed to produce to it? what does ownership mean in a scenario with multiple KSQL clusters?). All of these side effects that can happen outside of KSQL make the resource ownership problem nuanced and subtle, and I don't think a simple "if I created it, I own it" heuristic works.

If KSQL knows it didn't create a topic it really has no business in destroying it, no matter what the user asks.

Along the same lines, it's very difficult for me to actually know this. What if I shut down my cluster, delete and re-create a topic manually and then restart my cluster? Does KSQL still "own" that topic? I don't think KSQL is the right system to decide who has CRUD capabilities on Kafka resources - that should be Kafka itself, via the ACL mechanism provided. It's similar to any other usage of the Admin API - it doesn't restrict me from deleting only the topics that I created.

The use cases where the behavior in question is expected and natural are probably in the minority.

It's hard to know this without hard evidence :D I've used KSQL quite often as a tool to manage my kafka cluster - almost like an "admin". This is where it would be nice to get more feedback.

Anyway, thanks for engaging - I really appreciate these discussions because it makes us question our choices and I'm always ready to reconsider.

brianwhu commented 5 years ago

Hi @agavra, thanks again for your thoughtful comment. Clearly you've gone over many aspects and possibilities of KSQL interacting with topics that I have not.

Is it possible to let KSQL delete only those topics that it names?

Again, allow me to do some risky business of guessing user intentions here.

Could this be something that is compatible with the current design and mental models? From where I stand, this would be a significant improvement in guarding against unexpected topic drop and data losses.

mjsax commented 3 years ago

I think https://github.com/confluentinc/ksql/pull/7474 should address this issue.

\cc @abbccdda can we extend the KLIP to mention that the DELETE TOPIC clause will not be supported for "source stream/tables".

mjsax commented 3 years ago

@brianwhu Would you agree that we can close this ticket after KLIP-49 is done?