Closed DRavikanth closed 3 years ago
Thanks for opening your first issue here! Be sure to follow the issue template!
@DRavikanth - this is a community driven project, maybe you would like to work on this and submit PR?
Especially if there will be no-one to pick it up quickly - this is the best way to speed it up. I am happy to help with reviewing and guiding you if needed @DRavikanth !
@potiuk, I am happy to work but, I have no knowledge on this/airflow. I recently started to learn and evaluate Airflow for few of our use cases.
yes, It would be great if you can guide me along.
yes, It would be great if you can guide me along.
I do not know much about presto - but when we join forces...
The gist of it is the Presto Operators to start using the client library you mentioned - so it has to be changed in setup.py
- there is presto
extra there and it has the dependencies there -> changing them to use Trino dependency is a good idea.
In fact we've already discussed renaming the provider to "trino" (we are aware of the name change) so a good start would be to find all the places where "presto" is mentioned in the code and create a new provider trino
that would be just copy of presto and you could use the new dependency in the new provider instead of the old one.
That would be under "airflow/providers/presto" "tests/providers/presto", "docs/apache-airflow-providers-presto" and few other common places.
That would give a chance for Facebook to still maintain the presto provider if they want to. Then just switching imports to the new library should do the trick I believe :).
That woudl be a good start. There is some more work needed to make integration tests working (we have a docker compose to run presto database). But we can worry about it next (@mik-laj implemented even a kerberos integration there) - happy to guide you trough the process when we get there.
There was a message about it in the mailing list : https://lists.apache.org/thread.html/r8c1f1c330113052ae0a27c55611f582dcf1aaae84cc8f0a0cc37d5a2%40%3Cdev.airflow.apache.org%3E
Assuming creating integration from scratch with https://github.com/trinodb/trino-python-client the scope is adding a new provider. A reference for adding a new provider can be found in https://github.com/apache/airflow/pull/13324 (You can start by adding only TrinoHook under the new Trino provider)
I'm not sure if it requires to discuss this again on the mailing list. It's a bit tricky as some of Trino users (like me) are currently using Presto provider and it might be difficult to notify about splitting the provider.
In any case I'd be happy to help with review as I'm running Trino and currently working with the existed PrestoHook (which uses prestodb package)
Thanks for the tips @potiuk / @eladkal. I will check that and spend sometime to work on this.
Description We currently are having PrestoHook that underneath uses presto-python-client. Now with Trino (formerly known as PrestoSQL) is renamed and there are a change of headers in Trino presto-python-client will no longer work and so PrestoHook provider breaks.
What do you want to happen? Ask is to provide support for TrinoHook which uses https://github.com/trinodb/trino-python-client underneath.