Closed mchades closed 1 month ago
@shaofengshi @jerryshao cc
Hi @mchades, I want to try to solve this issue.
I have done some code tracing from `createTopic` in `TopicOperationDispatcher`.
How about making `createTopic` synchronous by using something like `createTopicsResult.all().whenComplete()` in `KafkaCatalogOperations`, and forming the new `properties` just like the `loadTopic` implementation does?
But it may have some performance issues, so rewriting the reload action to retry with a timeout is also a common solution. WDYT?
@noidname01 Sorry for the late reply.
What's the difference between `createTopicsResult.all().whenComplete()` and the current usage of `createTopicsResult.topicId().get()`:
https://github.com/datastrato/gravitino/blob/282fc0d488177b7944c99fa2806dee16143f37fe/catalogs/catalog-kafka/src/main/java/com/datastrato/gravitino/catalog/kafka/KafkaCatalogOperations.java#L229
Does `createTopicsResult.all().get()` really ensure that the creation is synchronous?
If this method can ensure synchronous creation, I think we can use it first, and performance issues can be considered in a separate PR.
@mchades Sorry for the late reply, too.
I've done some research into how Kafka creates topics.
It seems there's no difference between `createTopicsResult.all().whenComplete()` and `createTopicsResult.topicId().get()`, because the Kafka server returns the metadata all at once.
And even after the create method returns, the topic still needs to be propagated through the Kafka system.
So there is an unpredictable delay between the time the create method returns and the time the created topic can be described.
To really solve this, I think adding a retry mechanism to the reload action might be a feasible solution. WDYT?
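The retry idea above could look roughly like the following. This is only a minimal sketch using the Java standard library: `ReloadRetrySketch`, `reloadWithRetry`, and the `loader` supplier are hypothetical names for illustration, not Gravitino or Kafka APIs. The supplier stands in for the reload (describe) call and returns an empty `Optional` while the topic is not yet visible.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ReloadRetrySketch {

  // Polls the (hypothetical) loader until it yields a value or the timeout elapses.
  static <T> T reloadWithRetry(
      Supplier<Optional<T>> loader, Duration timeout, Duration interval)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      Optional<T> loaded = loader.get();
      if (loaded.isPresent()) {
        return loaded.get();
      }
      // Give up once the deadline has passed; the caller decides how to report this.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("object not visible within " + timeout);
      }
      Thread.sleep(interval.toMillis());
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger attempts = new AtomicInteger();
    // Simulates a topic that only becomes describable on the third attempt.
    Supplier<Optional<String>> loader =
        () -> attempts.incrementAndGet() < 3
            ? Optional.<String>empty()
            : Optional.of("topic-metadata");
    String result =
        reloadWithRetry(loader, Duration.ofSeconds(1), Duration.ofMillis(10));
    System.out.println(result + " after " + attempts.get() + " attempts");
  }
}
```

A sketch like this still leaves open what to tell the client when the timeout is hit, since the create request itself may have succeeded.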
Thanks for your feedback! So the conclusions and solutions obtained at present are as I described in the description and you tend to prefer option 2, right?
How should we improve?
Here are three solutions I can think of right now:
1. remove the reload action in creation and alteration
   If we choose to remove the reload operation, then we may need to refactor the API so that the return value is `void`, otherwise it will result in an inconsistency with the manual load.
2. add await timeout to the reload action
   If the corresponding operation is not completed after the timeout, whether to return success or failure to the client is a point that needs to be considered.
Yes, you're right.
@noidname01 After discussing with @jerryshao offline, it is better to only return the properties passed by the user during creation. This change allows us to eliminate the reload action from the creation process.
So it is basically option 1 above, but the return value is changed from `void` to a `Topic` created using the properties passed by the user during creation, instead of getting the properties from the reload action.
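As a rough illustration of that shape, the sketch below builds the returned object directly from the request instead of reloading it. All names here (`CreateWithoutReloadSketch`, `TopicInfo`, `CatalogClient`, `submitCreate`) are hypothetical stand-ins, not Gravitino's actual classes, and the catalog call is assumed to possibly complete asynchronously.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CreateWithoutReloadSketch {

  // Hypothetical stand-in for a topic entity; not Gravitino's actual Topic class.
  static final class TopicInfo {
    final String name;
    final String comment;
    final Map<String, String> properties;

    TopicInfo(String name, String comment, Map<String, String> properties) {
      this.name = name;
      this.comment = comment;
      // Defensive copy so later changes to the caller's map do not leak in.
      this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }
  }

  // Hypothetical stand-in for the underlying catalog call.
  interface CatalogClient {
    void submitCreate(String name, Map<String, String> properties);
  }

  static TopicInfo createTopic(
      CatalogClient client, String name, String comment, Map<String, String> userProperties) {
    client.submitCreate(name, userProperties);
    // No reload here: the result only reflects what the user passed in,
    // so server-filled values are absent until a later explicit load.
    return new TopicInfo(name, comment, userProperties);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("retention.ms", "60000");
    TopicInfo created = createTopic((n, p) -> { }, "events", "demo topic", props);
    System.out.println(created.name + " " + created.properties);
  }
}
```

The trade-off is that the object returned from creation can differ from what a later manual load returns, which is why the API comment would need to say so.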
Yeah. BTW, this applies not only to topic creation; table and fileset also need to be changed.
Hi @noidname01 , are you still working on this? Do you need any updates or assistance?
@mchades Yeah, but I'm busy solving other issues like #4012 and #3755.
I've tried the solution we discussed, but the integration tests fail because some properties need to be filled in by the server (like `PARTITION_COUNT` and `REPLICATION_FACTOR` in `KafkaTopic`), so I will work on modifying the tests if this solution is acceptable.
I think it's acceptable, but we should clarify this in the API comment.
@mchades Sure, will work on this and finish ASAP.
Hi @noidname01 , Is there any progress on this issue recently?
I would like to have it completed by the end of this week. If you have a draft locally, you can submit a PR, and then I can build on your work.
@mchades Sorry for the inactivity on this issue. I've created a draft PR with the main logic changes done. The remaining work is updating the ITs, which still rely on the reload logic.
What would you like to be improved?
There is a reload action in object (table and topic) alteration and creation, which is used to obtain some values generated by the underlying catalog:
reload table in alteration: https://github.com/datastrato/gravitino/blob/3a973abf63952dc3102b78c0684705ef6cee4dc0/core/src/main/java/com/datastrato/gravitino/catalog/TableOperationDispatcher.java#L234-L242
reload topic in creation: https://github.com/datastrato/gravitino/blob/3a973abf63952dc3102b78c0684705ef6cee4dc0/core/src/main/java/com/datastrato/gravitino/catalog/TopicOperationDispatcher.java#L143-L148
Creation and alteration may fail because the operations in the catalog are asynchronous:
#3317
#3496
How should we improve?
Here are three solutions I can think of right now:
1. remove the reload action in creation and alteration
   If we choose to remove the reload operation, then we may need to refactor the API so that the return value is `void`, otherwise it will result in an inconsistency with the manual load.
2. add await timeout to the reload action
   If the corresponding operation is not completed after the timeout, whether to return success or failure to the client is a point that needs to be considered.