Open ankushpurwar opened 1 year ago
Thanks for opening your first issue here! Be sure to follow the issue template!
Sounds like a good feature, want to work on it and be an Airflow contributor?
Can I work on this issue?
@zazemlenie Sure, I assigned it to you
Has there been any development on this? I would like to contribute if possible. Are we planning to support fetching only specific fields for every GET endpoint? Won't the query string get too long, or do we plan to limit the granularity of the fields that can be fetched? @hussein-awala @zazemlenie
I'm working on this issue. I haven't run into the query string issue you mentioned, but I'll look into it more closely.
Awesome, let me know if you need any help!
Hi, I would like to contribute to this issue, could I be assigned it? Thank you
@hussein-awala @maahir22
Hello I would like to contribute to this issue, could I be assigned it?
I am new to Airflow, so can I get some help? I can locate airflow/api_connexion/endpoints/dag_endpoint.get_dags, but what calls this function? I saw that the result of SQLAlchemySchema.dump is returned directly. What would be a good practice for extracting only the required fields?
I assigned you - but part of the task is to propose how to do it. Generally speaking, generic retrieval/update of partial information is something that GraphQL attempted to do as the "next gen" API, attempting to "fix" what was broken in REST.
However, my personal opinion (and that of many people) is that GraphQL is quite a bit TOO generic. It is relatively popular and used in quite a few places - but mostly in the "corporate" world and big installations, because - unlike REST - it is not intuitive and the learning curve is, well, steep IMHO. I have never been thrilled with the idea of learning more about GraphQL and getting the hang of it personally. Also, it tries to address all-but-kitchen-sink aspects of the API (including rate limiting, introspection, etc.), in most implementations it is very difficult to get performance right, and there are plenty of other issues with it.
You can read for example here https://blog.logrocket.com/graphql-vs-rest-api-why-you-shouldnt-use-graphql/
IMHO (but this is my opinion) - we need something much simpler and more straightforward here, and rather than defining and following a "standard", we should possibly tap into what other people are doing for similar things - because our API is described with an OpenAPI definition, and our REST endpoint documentation, Swagger UI and everything else we have in the API is generated. That's especially important as our clients (notably https://github.com/apache/airflow-client-python) are generated using the OpenAPI client generator, which translates the OpenAPI specification into Python classes that you can import and use directly. This goes for other languages as well.
This is a bit tricky, because the generator produces the objects that are returned, so if the API returns partial objects, it cannot return ACTUAL OBJECTS. It could return dictionaries, for example, or some proxy objects that contain only part of the data, with the rest of the data retrieved lazily.
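To make the "proxy object" idea concrete, here is a minimal sketch in plain Python. It is not how the generated Airflow client works: the class name `PartialDag`, the `fetch_full` callback and the fake fetch function are all hypothetical, and a real client would call the REST API where the fake function is used.

```python
from typing import Any, Callable, Dict


class PartialDag:
    """Hypothetical proxy: holds the fields already fetched, loads the rest on demand."""

    def __init__(self, partial: Dict[str, Any], fetch_full: Callable[[], Dict[str, Any]]):
        self._data = dict(partial)
        self._fetch_full = fetch_full
        self._complete = False

    def __getattr__(self, name: str) -> Any:
        # __getattr__ runs only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._data and not self._complete:
            # Field was not part of the partial response: fetch the full object once.
            self._data.update(self._fetch_full())
            self._complete = True
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


calls = []


def fake_fetch() -> Dict[str, Any]:
    # Stand-in for a full GET of the DAG (assumption, no network involved).
    calls.append("full fetch")
    return {"dag_id": "example_dag", "is_paused": False, "description": "demo"}


dag = PartialDag({"dag_id": "example_dag", "is_paused": False}, fake_fetch)
print(dag.dag_id)       # served from the partial data, no fetch
print(dag.description)  # missing locally, triggers one full fetch
```

The trade-off is visible even in this toy version: the caller gets attribute access as with a real object, but reading a field that was not requested silently costs an extra round trip.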
So we need to find a way to do it that:
a) is simple
b) builds on top of REST, not changing it to GraphQL
c) nicely integrates with the OpenAPI definition and Swagger
d) integrates with OpenAPI generators to allow such partial retrieval
So this task is really:
What is the meaning of POC, please? As you mention implement a POC.
@potiuk
Proof Of Concept.
@potiuk
Thank you for telling me about the task in detail!🌸 But just for dags or dagRuns, isn't it OK to just add the 'only' parameter when the Schema() is created?
dag_schema = DAGSchema(only=fields)
return dag_schema.dump(dag)
And add nullable: true to the returned properties in airflow/api_connexion/openapi/v1.yaml.
I think it's hard to solve this task generally for now, because the Swagger YAML files are not automatically generated from the schema. If they could be generated automatically, the YAML could also set nullable values based on whether the schema field's required is true or not.
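As a rough illustration of the last point, the sketch below derives OpenAPI property entries from a field list and marks every non-required field as nullable. It only shows the idea: `FieldSpec` and `build_properties` are hypothetical names, Airflow has no such generator, and the field list is an illustrative assumption rather than the real DAG schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one schema field."""

    name: str
    openapi_type: str
    required: bool = False


def build_properties(specs: List[FieldSpec]) -> Dict[str, dict]:
    """Build an OpenAPI-style `properties` mapping from the field specs."""
    properties: Dict[str, dict] = {}
    for spec in specs:
        prop: dict = {"type": spec.openapi_type}
        if not spec.required:
            # Non-required fields may be absent from a partial response.
            prop["nullable"] = True
        properties[spec.name] = prop
    return properties


dag_specs = [
    FieldSpec("dag_id", "string", required=True),
    FieldSpec("is_paused", "boolean"),
    FieldSpec("description", "string"),
]
properties = build_properties(dag_specs)
print(properties)
```

Serializing `properties` into the YAML file would be a separate step and is left out here.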
This issue is about generic functionality. If you want to do a limited version for only dags or dagRuns - feel free to open PRs with fixes - but they would not close this issue.
Description
The Airflow REST API should add a generic capability to retrieve only the required information, instead of sending all of it. E.g. if I want to retrieve DAG Run details using the REST API: https://airflow.apache.org/api/v1/dags/{dag_id}/dagRuns/{dag_run_id} or want to fetch the list of DAGs using the REST API: https://airflow.apache.org/api/v1/dags
It always returns the full details. Often the caller is not interested in all the information.
So I suggest adding a generic capability to retrieve only the needed information, just like offset and limit. E.g. if we pass fields = {dag_id, is_paused} as a query parameter while calling the https://airflow.apache.org/api/v1/dags API, it returns a JSON body that contains only the {dag_id, is_paused} fields.
The same applies to other endpoints as well (at least the GET ones).
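A minimal sketch of what the requested behaviour could look like on the server side, using only the standard library. It assumes a comma-separated `fields` query value; the helper names `parse_fields_param` and `project`, and the allow-list `DAG_FIELDS`, are hypothetical and not part of Airflow.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping, Optional


def parse_fields_param(raw: Optional[str], allowed: Iterable[str]) -> Optional[list[str]]:
    """Parse a comma-separated `fields` value; None means "return everything"."""
    if raw is None or not raw.strip():
        return None
    requested = [name.strip() for name in raw.split(",") if name.strip()]
    allowed_set = set(allowed)
    unknown = [name for name in requested if name not in allowed_set]
    if unknown:
        # A real endpoint would probably turn this into a 400 response.
        raise ValueError(f"Unknown fields requested: {unknown}")
    return requested


def project(record: Mapping[str, Any], fields: Optional[list[str]]) -> dict[str, Any]:
    """Keep only the requested fields, or the whole record when none were given."""
    if fields is None:
        return dict(record)
    return {name: record[name] for name in fields}


# Illustrative field names; only dag_id and is_paused come from the issue text.
DAG_FIELDS = ("dag_id", "is_paused", "is_active", "description")
dag = {"dag_id": "example_dag", "is_paused": False, "is_active": True, "description": None}

selected = parse_fields_param("dag_id,is_paused", DAG_FIELDS)
partial = project(dag, selected)
print(partial)  # {'dag_id': 'example_dag', 'is_paused': False}
```

Validating against an allow-list keeps the parameter from exposing attributes that are not part of the documented response, and omitting `fields` preserves the current behaviour of returning the full object.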
Use case/motivation
Related issues
Cannot say.
Are you willing to submit a PR?
Code of Conduct