Closed mattpolzin closed 2 years ago
Thanks for opening your first issue here! Be sure to follow the issue template!
Why don't you submit a PR with a fix @mattpolzin ? Sounds like an easy fix.
I appreciate the encouragement! I've got a lot of projects in progress right now so I don't think I'll find time to get set up here as well.
I marked it as good-first-issue, so maybe someone else will pick it up!
Is this available to pick up and implement?
Sure @harishkrao . Assigned you.
But maybe @mik-laj would like to do it first - so agree between yourselves ;)
Thank you for getting back @potiuk. @mik-laj how would you like to proceed?
Feel free to create a PR. I will take care of the review.
Great, I will do so. Thank you.
@mik-laj I have made the change, I have a question on the unit tests. They are failing because the connection sends the region name. Do you have a suggestion on how to test without a region name? https://github.com/harishkrao/airflow/blob/sqlalchemy-snowflake-region/tests/providers/snowflake/hooks/test_snowflake.py#L94
@harishkrao Hi. I improved the tests yesterday and I think it will now be easy to write a new test case to handle this situation. See: https://github.com/apache/airflow/pull/20095
Now all you have to do is add a test case here.
https://github.com/apache/airflow/blob/545ca59ba9a0b346cbbf28cc6958f9575e5e6b0b/tests/providers/snowflake/hooks/test_snowflake.py#L76-L95
To build a URL, we shouldn't use `string.format`. Instead, we should use `snowflake.sqlalchemy.URL`. See: https://github.com/snowflakedb/snowflake-sqlalchemy#connection-parameters
The main problem with plain concatenation is the handling of special characters, which are not URL-encoded.
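To illustrate the encoding problem, here is a minimal stdlib sketch with a hypothetical password (not real connection code; `snowflake.sqlalchemy.URL` performs this kind of escaping for you):

```python
from urllib.parse import quote

# Hypothetical password containing characters that are meaningful in a URL.
password = "p@ss/word#1"

# Naive formatting leaves '@', '/', and '#' unescaped, corrupting the URL.
naive = "snowflake://user:{}@myaccount/db".format(password)

# Percent-encoding the component first keeps the URL well-formed.
safe = "snowflake://user:{}@myaccount/db".format(quote(password, safe=""))
print(safe)  # snowflake://user:p%40ss%2Fword%231@myaccount/db
```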
Thank you @mik-laj. I will write the test cases based on the new tests you pushed yesterday. Also, thank you for the link to the SQLAlchemy URL builder. I will use it to construct the URI.
@mik-laj Made the below changes, running pytests pending, will let you know after running them tomorrow: https://github.com/harishkrao/airflow/blob/fix-sqlalchemy-snowflake-region/airflow/providers/snowflake/hooks/snowflake.py#L216-L239 https://github.com/harishkrao/airflow/blob/fix-sqlalchemy-snowflake-region/tests/providers/snowflake/hooks/test_snowflake.py#L268-L279
@mik-laj How did you manage dependencies when running pytests?
I am facing `WARNING - Exception when importing 'airflow.providers.snowflake.hooks.snowflake.SnowflakeHook' from 'apache-airflow-providers-snowflake' package: No module named 'sqlalchemy.sql.roles'`. Both `snowflake-sqlalchemy` and `sqlalchemy` are installed.
Airflow Breeze
You can also use constraint files. See: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html
See also complete description here: https://github.com/apache/airflow/blob/main/TESTING.rst#airflow-unit-tests @harishkrao
Thank you @mik-laj and @potiuk, I will proceed.
I narrowed down the scope of our issue here. Once I am inside the virtual environment and I try to import the same SQLAlchemy package, I run into the same error:

```
root@66ecbec1a1f0:/opt/airflow# python
Python 3.8.12 (default, Nov 17 2021, 17:26:17)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from snowflake.sqlalchemy import URL
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/snowflake/sqlalchemy/__init__.py", line 25, in <module>
    from . import base, snowdialect
  File "/usr/local/lib/python3.8/site-packages/snowflake/sqlalchemy/base.py", line 17, in <module>
    from .custom_commands import AWSBucket, AzureContainer, ExternalStage
  File "/usr/local/lib/python3.8/site-packages/snowflake/sqlalchemy/custom_commands.py", line 14, in <module>
    from sqlalchemy.sql.roles import FromClauseRole
ModuleNotFoundError: No module named 'sqlalchemy.sql.roles'
```

It looks like there is a bug with SQLAlchemy 1.3.24, which is a dependency of Flask-AppBuilder. See 1621 and 953.
If you have some spare time, can one of you please try to import this package in your local venv and let me know if you face the same issue? @potiuk @mik-laj This package is not used anywhere else in the entire Airflow repository.
@harishkrao See: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/234 To use URL class, you need to run:
`python -m pip install snowflake-sqlalchemy==1.2.4 sqlalchemy==1.3.24`
Unfortunately, Snowflake incorrectly marked the version requirements in one release of `snowflake-sqlalchemy`, and after fixing the problem, they didn't mark the release as yanked, so pip downloads an invalid dependency set.
I pushed a fix: https://github.com/apache/airflow/pull/20245
Thanks very much for letting me know and for pushing the fix @mik-laj :) I will use the above versions. Edit: It worked :+1: Thanks a lot!
@mik-laj any suggestions to overcome celery dependency issues?
I searched the forum and online and found no open bugs related to it. It happens when running `sudo ./breeze initialize-local-virtualenv --python 3.8`, and the virtual environment has celery version 5.2.1.
`from celery import Celery, Task, states as celery_states` fails with `ModuleNotFoundError: No module named 'celery'`
```shell
AIRFLOW_VERSION=main
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install celery --constraint "${CONSTRAINT_URL}"
```
See: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html
My apologies for asking again, I am contributing to this repo for the first time. Will follow this and install the dependencies. Thank you for the help and patience @mik-laj.
Just an update - I am working on this; my laptop crashed, so I have to set up the environment on another laptop and continue. Will get back soon.
Environment is configured, working on this, will get back in a day or two.
Made changes for the fix. Preparing the PR now.
@mik-laj a couple of points for clarification. I added a unit test for the fix and it passed. I tried using `snowflake.sqlalchemy.URL`, but when running `pytest`, it caused issues with generating the URL from nested dict values for `session_parameters`, so I reverted to `string.format`, as is currently used in the mainline branch.
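For context, a small stdlib sketch (with hypothetical parameter values, not the hook's actual code) of why nested dict values are awkward to place in a URL query string:

```python
from urllib.parse import urlencode

# Hypothetical nested session parameters, as the Snowflake connection allows.
params = {"session_parameters": {"QUERY_TAG": "etl"}}

# urlencode stringifies the nested dict via str(), which yields a
# percent-encoded Python literal rather than structured query data.
print(urlencode(params))  # session_parameters=%7B%27QUERY_TAG%27%3A+%27etl%27%7D
```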
@harishkrao Do you need help with preparing a PR?
@mik-laj Thank you for offering help, had to attend to something else until now. I will follow the steps in the guide and get back to you soon.
Just opened a PR, thank you for reviewing @mik-laj!
Apache Airflow Provider(s)
snowflake
Versions of Apache Airflow Providers
Versions 2.2.x (since https://github.com/apache/airflow/commit/0a37be3e3cf9289f63f1506bc31db409c2b46738).
Apache Airflow version
2.2.1
Operating System
Debian GNU/Linux 10 (buster)
Deployment
Other 3rd-party Helm chart
Deployment details
Bitnami Airflow Helm chart @ version 8.0.2
What happened
When connecting to Snowflake via SQLAlchemy using the Snowflake Hook, I get an error that the URL is not valid because my Snowflake instance is in US West 2 (Oregon) which means I don't provide a region explicitly. Snowflake's documentation says:
The error is that `xy12345..snowflakecomputing.com` is not a valid URL (note the double dot caused by the missing region).
What you expected to happen
I expect the connection to be successful.
How to reproduce
You can use the default snowflake connection if you have one defined and see this problem with the following one-liner:
Anything else
Fortunately, I imagine the fix for this is just to leave the region URL component out when `region` is `None` here: https://github.com/apache/airflow/commit/0a37be3e3cf9289f63f1506bc31db409c2b46738#diff-2b674ac999a5b938fe5045f6475b0c5cc76e4cab89174ac448a9e1d41a5c04d5R215
Using version 2.1.1 of the Snowflake provider with version 2.2.1 of Airflow is currently a viable workaround, so for now I am just avoiding the update to the provider.
Are you willing to submit PR?
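The suggested behaviour can be sketched as follows (a hypothetical helper, not the provider's actual code):

```python
from typing import Optional

def snowflake_host(account: str, region: Optional[str] = None) -> str:
    """Build the Snowflake hostname, omitting the region component
    entirely when it is not set, avoiding the invalid double-dot URL."""
    if region:
        return f"{account}.{region}.snowflakecomputing.com"
    return f"{account}.snowflakecomputing.com"

print(snowflake_host("xy12345"))               # xy12345.snowflakecomputing.com
print(snowflake_host("xy12345", "eu-west-1"))  # xy12345.eu-west-1.snowflakecomputing.com
```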
Code of Conduct