Open jerryshao opened 2 months ago
@justinmclean, can you please post the design doc here so we can have a discussion on the issue?
I think it is simple to use the current Gravitino Java client to create a CLI tool. The current Python client only supports Fileset; if you want to use Python, you will have to write the rest of the protocol from the ground up.
Also, for the CLI arguments, can you please investigate some similar projects to see how they design theirs and what we can borrow?
Besides, I think the design doc should list the list, create, get, alter, and delete commands for all the entities. It should also include access control operations and tag operations, so that they are covered by the CLI as well.
Also, please put the design doc in a Google Doc so we can comment inline.
The above isn't using the Python client. It's using the REST interface as indicated by "The CLI would be implemented in Python using requests library for the REST interface and click for the CLI." - this is what several other catalogs do.
Then you have to implement all the REST interfaces, including metalake, catalog, schema, table..., and also authorization. This is a huge amount of work. Why can't we use the existing Java client and write a simple wrapper to achieve the CLI?
The entire basic REST interface in Python would be about 100 lines of code, much less than the equivalent Java code.
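To make the size estimate concrete, here is a minimal sketch of what a thin REST helper might look like. It uses only the standard library (`urllib`) rather than `requests` so that it runs without extra dependencies. The base URL, the `/api/metalakes/...` path layout, the example entity names, and the `RestHelper` class with its methods are all assumptions for illustration and are not taken from the design doc. It does not attempt authentication or serialization of nested types.

```python
import json
import urllib.request
from urllib.parse import quote


class RestHelper:
    """Hypothetical thin wrapper that builds REST requests for a metadata server."""

    def __init__(self, base_url="http://localhost:8090/api"):
        # Assumed default address; adjust to the real server.
        self.base_url = base_url.rstrip("/")

    def build(self, method, *path, body=None):
        """Return a urllib Request for the method, path segments, and JSON body."""
        # Percent-encode each segment so names with odd characters stay valid.
        url = "/".join([self.base_url, *(quote(p, safe="") for p in path)])
        data = json.dumps(body).encode("utf-8") if body is not None else None
        headers = {"Accept": "application/json"}
        if data is not None:
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(url, data=data, headers=headers, method=method)

    def send(self, request):
        """Send the request and decode the JSON response (needs a running server)."""
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))


# Build (but do not send) two example requests with made-up entity names.
helper = RestHelper()
list_catalogs = helper.build("GET", "metalakes", "demo", "catalogs")
create_schema = helper.build(
    "POST", "metalakes", "demo", "catalogs", "hive", "schemas",
    body={"name": "sales", "comment": "example"},
)
```

A helper like this covers simple list, get, and delete calls in a few lines; the open question in this thread is how much extra code the more complex create and alter payloads and the authentication schemes would need.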
I don't think so. Once you implement some complicated "create" or "alter" commands, you will have to handle complicated cases, including serialization and deserialization. Also, the Java client already supports several authentication methods such as Kerberos and OAuth. If you want to write the CLI in plain Python, you will have to deal with them yourself. It's not as simple as you think.
I have also looked at several other CLIs, and the design above is similar to how they do it. My initial thought was not to implement the entire API/REST interface, but it can be extended once we have something that is useful.
If you're not doing a complete CLI, then there's no difference compared to the current web UI: users still cannot easily try out the full feature set.
The initial objective (as described above) is not to implement the full API, but we can expand on the initial API to eventually cover everything. If you want me to break it up into stages and put what is developed in each, I can do that.
My feeling is that if we use the Java client for the CLI, then most of the JSON serde and security work is already handled, so we can focus only on the CLI implementation. But if we choose Python, we will need to implement everything from the ground up. This may take a lot of work, especially since we have several data structures like "type" and "expression" that are nested and complicated to serialize and deserialize. If we choose Java, the current Java client has already done this for us, so we don't have to do it again. The key thing is that for the CLI we don't have to redo the JSON/REST layer; we can leverage the current client and focus on the CLI itself. It's not a question of choosing languages, it's a question of not duplicating work. If there were a full-functionality Python client, then I would also be fine with a simple wrapper around that Python client to achieve the CLI in Python. This is just my comment. I think we should discuss this in the community and gather others' opinions; I will also post this on the issue.
The initial objective (as described above) is not to implement the full API, but we can expand on the initial API to eventually cover everything. If you want me to break it up into stages and put what is developed in each, I can do that.
This is a huge task; it should be broken into multiple small tasks so we can achieve it step by step.
Link to design doc: https://docs.google.com/document/d/19CXHeg_5iphO8D3UD16rexVE23M4TMZXOU3X0lBqL6A/edit?usp=sharing
IMO, the current design doesn't seem to be very different from using curl directly.
As a CLI, I think it's more about user interaction and results presentation, but the current CLI seems to be more of a simplification of the use of the curl tool, and doesn't bring a lot of convenience to the user.
Feel free to point out if I've misunderstood anything!
It is easier to use than curl as the user doesn't need to construct correct REST URLs or supply correctly formatted JSON. It fills in many of the default parameters, and its output is more user-friendly.
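As a rough illustration of that difference, the sketch below turns a short command line into the method, URL, and JSON body that a user would otherwise have to assemble by hand for curl. It uses the standard library's `argparse` as a stand-in for click. The `gcli` program name, the subcommands, the default server address and metalake, and the URL layout are all hypothetical and not taken from the design doc.

```python
import argparse
import json

DEFAULT_URL = "http://localhost:8090"  # assumed default server address


def build_parser():
    """Build a parser whose defaults spare the user from repeating common values."""
    parser = argparse.ArgumentParser(prog="gcli")  # hypothetical command name
    parser.add_argument("--url", default=DEFAULT_URL)
    parser.add_argument("--metalake", default="demo")  # hypothetical default
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list-catalogs")
    create = sub.add_parser("create-schema")
    create.add_argument("catalog")
    create.add_argument("name")
    create.add_argument("--comment", default="")
    return parser


def to_request(args):
    """Translate parsed arguments into (method, url, json_body)."""
    base = f"{args.url}/api/metalakes/{args.metalake}/catalogs"
    if args.command == "list-catalogs":
        return "GET", base, None
    body = json.dumps({"name": args.name, "comment": args.comment})
    return "POST", f"{base}/{args.catalog}/schemas", body


def format_names(names):
    """Render names one per line, sorted, instead of showing raw JSON."""
    return "\n".join(sorted(names))


# "gcli create-schema hive sales" expands to a full request description.
args = build_parser().parse_args(["create-schema", "hive", "sales"])
method, url, body = to_request(args)
```

With curl, the user would type the full URL, the `-X POST` flag, the content-type header, and the JSON payload; here the three words after the program name are enough, and the server address and metalake fall back to defaults.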
Describe the proposal
Currently, Gravitino provides a web UI, a Java SDK, and a Python SDK to manipulate metadata. The current web UI does not offer full functionality, so users still need to write Java or Python code, or call the REST APIs directly, to communicate with Gravitino. This makes Gravitino hard to use when getting started.
Instead of adding more and more features to the web UI, we think that adding a simple CLI will significantly help users, especially developers. So in this EPIC issue, we are planning to add CLI support for Gravitino.
Task list