Open quantumlicht opened 1 year ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
I am new to Airflow; since this is a good first issue, I would like to work on it.
Assigned you.
The issue with this specifically is that `schema_fields` is not part of the `template_fields` for that operator. I was able to work around my issue using `schema_object`, but that's annoying because I need to save my schema to GCS when I already have it in memory.
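For illustration, the workaround looks roughly like this (bucket, object, and table names are placeholders):

```python
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Workaround: save the schema to GCS first, then point the operator at it
# via schema_object instead of passing schema_fields in memory.
load_table = GCSToBigQueryOperator(
    task_id="load_table",
    bucket="my-bucket",                         # placeholder bucket
    source_objects=["exports/table.json"],      # placeholder data file
    schema_object="schemas/table_schema.json",  # schema previously uploaded to GCS
    destination_project_dataset_table="my-project.my_dataset.my_table",  # placeholder
)
```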
I have another issue with branching in task groups with mapped arguments, but I'll open a separate issue.
I still think this one is worth fixing if possible.
Sure @quantumlicht, I am on it. Just to be super clear and to confirm I understood the issue correctly: the issue you are facing is that `schema_fields` here is a `MappedArgument`, but having it as `{"databases": Dict[str, List[str]], "schemas": Dict[str, List[Dict[str, str]]]}` would be convenient and easy to use?
No, the issue is that I cannot use my `schema_fields` variable in the `GCSToBigQueryOperator`, because that field is not templated in that operator. At least that's what I think the problem is.
I am working on the issue. I made the code fixes, got a free tier of Google Cloud, added credentials for BigQuery, and have run the tests. Now trying to check if the issue still exists.
I added `schema_fields` to the `template_fields` of the operator and ran the tests for providers/google/cloud/test_gcs_to_bigquery.py with and without `schema_fields` in `template_fields`. The tests run `GCSToBigQueryOperator` with `schema_fields` as a list of dicts, and they pass in both cases, i.e. with and without `schema_fields` in `template_fields`. I do not think that is the issue here @quantumlicht @potiuk.
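For reference, the change I tried amounts to roughly this (the exact tuple contents vary by provider version; `schema_fields` is the proposed addition):

```python
from typing import Sequence

from airflow.models import BaseOperator


class GCSToBigQueryOperator(BaseOperator):
    # Listing a field here makes Jinja templates (and rendered XComArg values)
    # passed to that argument resolve before execute() runs.
    template_fields: Sequence[str] = (
        "bucket",
        "source_objects",
        "schema_object",
        "schema_fields",  # proposed addition
        "destination_project_dataset_table",
    )
```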
Is anyone looking into this? It seems a bit annoying not to be able to pass task group parameters to classic operators. The majority of operators are classic ones, so it is important that this pattern works with them, for obvious reasons… This is not related to templated fields, as stated earlier, but happens with custom operators and the provider ones (Google, Databricks, etc.) alike.
Do we have any update on this issue?
@arsunny -> evidently not, and it's waiting for someone to pick it up. Would you like to take on the task?
Hey guys, I encountered a similar issue with another operator, and it seems that the problem is with the `expand` part.

When we apply dynamic task mapping to an operator directly, it works. Example:
```python
from airflow.decorators import task, task_group

@task
def get_some_data(some_param):
    return [...]  # a list of data

my_data = get_some_data(xxx)

@task_group
def some_tasks(an_important_param):
    @task
    def my_simple_task(imp_param):
        print(imp_param)  # this prints an item in [ a list of data ]

    my_task = my_simple_task(an_important_param)
    # assume this operator just prints out `some_param_to_print` in its execute()
    my_operator = SomePrintOperator(
        task_id="x_operator",
        some_param_to_print=an_important_param,
    )
    my_task >> my_operator

# expand() takes keyword arguments; the dependency on my_data is created implicitly
all_my_tasks = some_tasks.expand(an_important_param=my_data)
```
So in this example we have:

- `my_data`, which just produces some data
- `all_my_tasks`, a task group with 2 printing tasks
- `my_task`, which is a TaskFlow function
- `my_operator`, which is a subclass of `BaseOperator`

`my_task` works just fine and is able to print an item from the list of data. `my_operator`, however, prints an object of type `MappedArgument`.
However, if we do it like this:

```python
@task
def get_some_data(some_param):
    return [...]  # a list of data

my_data = get_some_data(xxx)

my_operator = SomePrintOperator.partial(
    task_id="x_operator"
).expand(
    some_param_to_print=my_data
)
```
The above works, and `my_operator` is able to print an item from the list of data. So the issue occurs when an instance of `BaseOperator` is used in a task group that has been created using dynamic task mapping.
This is my issue: https://github.com/apache/airflow/discussions/39927
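For what it's worth, a sketch of how the working `.partial().expand()` pattern above might be applied to the original `GCSToBigQueryOperator` case (bucket, object, and table names are hypothetical):

```python
from airflow.decorators import task
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

@task
def get_schemas():
    # Placeholder: one schema (a list of field dicts) per table to load.
    return [
        [{"name": "id", "type": "INTEGER", "mode": "REQUIRED"}],
        [{"name": "email", "type": "STRING", "mode": "NULLABLE"}],
    ]

load_tables = GCSToBigQueryOperator.partial(
    task_id="gcs_to_bq",
    bucket="my-bucket",                                      # hypothetical
    source_objects=["exports/data.json"],                    # hypothetical
    destination_project_dataset_table="proj.dataset.table",  # hypothetical
).expand(
    schema_fields=get_schemas(),  # each mapped task receives one resolved schema
)
```

Since `expand()` arguments are resolved per mapped task at runtime, this avoids relying on `template_fields` entirely.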
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
I'm using version `composer-2.1.15-airflow-2.5.1`, using `composer-dev` to run it. Using a `MappedArgument` with `GCSToBigQueryOperator` returns an error: `TypeError: Object of type MappedArgument is not JSON serializable`. I opened a discussion here: https://github.com/apache/airflow/discussions/31452, but I wonder if it might actually be a bug.
What you think should happen instead
It should work the same as with a regular XCom argument.
How to reproduce
Disclaimer: I'm relatively new to Airflow 2 and TaskFlow. I'm trying to migrate a codebase written with Airflow 1, so there might be some glaring problems with how I'm approaching this.
The issue I'm having is with the `schema_fields` property passed to `GCSToBigQueryOperator`, which is a `MappedArgument` instead of being resolved to a list of dicts like I expected.

As a first step, the DAG loads the metadata from GCS with a task named `get_export_metadata` (with `multiple_outputs`) that returns a dict of the shape `{"databases": Dict[str, List[str]], "schemas": Dict[str, List[Dict[str, str]]]}`.

Here's the task defined for my DAG:
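A simplified sketch of that task (placeholder table and column names; the real task reads the metadata from GCS):

```python
from airflow.decorators import task

@task(multiple_outputs=True)
def get_export_metadata():
    # Simplified stand-in for the real task, which loads this metadata from GCS.
    # Shape: {"databases": Dict[str, List[str]], "schemas": Dict[str, List[Dict[str, str]]]}
    return {
        "databases": {"db1": ["table_a", "table_b"]},
        "schemas": {
            "table_a": [{"name": "id", "type": "INTEGER"}],
            "table_b": [{"name": "email", "type": "STRING"}],
        },
    }
```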
Operating System
Composer dev image (Linux, I assume)
Versions of Apache Airflow Providers
Deployment
Google Cloud Composer
Deployment details
This is running locally and I use the `dag.test()` command to execute it.
Anything else
N/A
Are you willing to submit PR?
Code of Conduct