Closed millsks closed 1 year ago
So the spec for pathspec
comes from upstream:
https://github.com/apache/airflow/blob/2.6.3/setup.cfg#L124
I think I have that constraint right here, based on:
https://peps.python.org/pep-0440/#compatible-release
pathspec~=0.9.0
==> pathspec >=0.9.0,<0.10.0
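As a quick sanity check, the compatible-release expansion can be verified with a small stdlib-only sketch. This assumes plain dotted numeric versions (which holds for pathspec); a real resolver should use the `packaging` library instead of hand-rolled parsing:

```python
# Minimal sketch of PEP 440 "compatible release" (~=) semantics.
# Assumes versions are plain dotted integers; not a general-purpose parser.

def compatible_window(base):
    """Return the (inclusive lower, exclusive upper) bounds implied by ~=base."""
    parts = [int(p) for p in base.split(".")]
    lower = tuple(parts)
    # ~=0.9.0 allows patch-level changes only: bump the second-to-last component.
    upper = tuple(parts[:-2] + [parts[-2] + 1])
    return lower, upper

def satisfies(version, base):
    lower, upper = compatible_window(base)
    v = tuple(int(p) for p in version.split("."))
    return lower <= v < upper

print(compatible_window("0.9.0"))    # ((0, 9, 0), (0, 10)), i.e. >=0.9.0,<0.10.0
print(satisfies("0.9.5", "0.9.0"))   # True
print(satisfies("0.10.3", "0.9.0"))  # False
```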
It seems like something you will need to bring up in the main Airflow repo, since there may be an important reason for the constraint.
Airflow PR: apache/airflow#33349
Great! Please let me handle the 2.7.0 release unless you are eager to learn the ropes more broadly. Airflow can be exceedingly complicated to update at times.
Sounds good. I will let you handle it, but I would love to learn the ins and outs so I can help maintain the feedstock if you guys are open to that.
@millsks, sounds good. I can ping you when there's a PR and talk you through the process. We can also add you as a maintainer.
A part of maintaining Airflow is maintaining the ~80 to 100 providers. Here's an example: https://github.com/conda-forge/apache-airflow-providers-google-feedstock I don't know if I have recipes for all the providers that are available on PyPI. I'll want to check on that after the 2.7.0 release. If you want, I can include you as a maintainer on new providers that I add (or any that are relevant to your own work).
I am more than happy to help with this and help maintain too! Always up for learning something new. I am ok with helping to maintain new and existing providers. Let me talk with my team to see if there are specific providers that we would definitely want to be listed on.
Is this the full list of providers that you currently maintain? Are you maintaining the actual provider package or the feedstock for the provider package?
I have no involvement with the maintenance of the main Apache repository, including the providers. I only maintain the packages on conda-forge.
The list of providers you linked to is the complete one. That's what I will need to check on to see which we might be missing on conda-forge.
I was looking at the list and wrote a quick script to extract the provider names and check whether each one exists on conda-forge. It was quick and dirty, but it looks accurate at first glance. Note that some "missing" feedstocks may simply exist under a different name. See the attached spreadsheet, or run the script if you want to double-check.
import requests
import re
import urllib3
import xlsxwriter

urllib3.disable_warnings()

apache_airflow_provider_index_url = "https://airflow.apache.org/docs/#providers-packages-docs-apache-airflow-providers-index-html"
apache_airflow_provider_index_response = requests.get(url=apache_airflow_provider_index_url, verify=False)
if not apache_airflow_provider_index_response.ok:
    raise Exception("Airflow Provider Index URL not accessible!")

provider_names = set()
# Collect the per-provider documentation index pages linked from the docs page.
for provider_entry_doc_index in [
    line.split('"')[1]
    for line in apache_airflow_provider_index_response.text.splitlines()
    if re.search(r"apache-airflow-providers-.*index\.html", line)
]:
    apache_airflow_provider_doc_index_url = f"https://airflow.apache.org{provider_entry_doc_index}"
    apache_airflow_provider_doc_index_response = requests.get(url=apache_airflow_provider_doc_index_url, verify=False)
    # was `if ... is False:`, which never triggers since the response object is never False
    if not apache_airflow_provider_doc_index_response.ok:
        raise Exception(f"{apache_airflow_provider_doc_index_url} not accessible!")
    # Pull the provider package name out of each "pip install ..." snippet.
    for apache_airflow_provider_pip_install in [
        line
        for line in apache_airflow_provider_doc_index_response.text.splitlines()
        if re.search("pip.*install", line)
    ]:
        x1 = re.sub(r"^.*apache-airflow-providers-", "apache-airflow-providers-", apache_airflow_provider_pip_install)
        x2 = re.sub(r"<.*$", "", x1)
        provider_names.add(x2)

workbook = xlsxwriter.Workbook('/tmp/airflow_provider_feedstock_check.xlsx')
worksheet = workbook.add_worksheet()
row = 1
worksheet.write(0, 0, "Provider Feedstock")
worksheet.write(0, 1, "Exists?")
# A feedstock is assumed to exist on conda-forge if its GitHub repo URL resolves.
for pname in sorted(provider_names):
    feedstock_url = f"https://github.com/conda-forge/{pname}-feedstock"
    feedstock_check_response = requests.get(url=feedstock_url, verify=False)
    feedstock_exists = "YES" if feedstock_check_response.ok else "NO"
    worksheet.write_url(row, 0, feedstock_url, string=pname)
    worksheet.write(row, 1, feedstock_exists)
    row += 1
workbook.close()
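For reference, the two `re.sub` calls in the script strip everything before the provider package name and everything from the first HTML tag onward. A small illustration on a made-up documentation line (the HTML snippet here is invented for the example, not taken from the actual docs):

```python
import re

# Hypothetical line resembling a provider doc page's "pip install" snippet.
sample = '<code>pip install apache-airflow-providers-google</code>'

# Drop everything up to (but not including) the package name prefix.
x1 = re.sub(r"^.*apache-airflow-providers-", "apache-airflow-providers-", sample)
# Drop everything from the first remaining HTML tag onward.
x2 = re.sub(r"<.*$", "", x1)
print(x2)  # apache-airflow-providers-google
```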
Great, that's a big help and looks accurate.
@millsks just a heads up that Airflow 2.7.0 was just released today: https://github.com/apache/airflow/releases/tag/2.7.0 I don't know exactly when I will get a chance to make the PR here or for the missing providers. But I will ping you once I do.
Adding missing providers in: https://github.com/conda-forge/staged-recipes/pull/23729
Solution to issue cannot be found in the documentation.
Issue
When trying to install the DVC package into our environment, which also includes apache-airflow, the environment solve fails. apache-airflow pins pathspec, while the DVC package requires
pathspec >=0.10.3
so the solver cannot satisfy the required environment specs.
Installed packages
Environment info
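The conflict described above can be checked numerically: apache-airflow's pathspec~=0.9.0 pin expands to >=0.9.0,<0.10.0, which has no overlap with DVC's >=0.10.3. A stdlib-only sketch (assuming plain dotted numeric versions, which holds for pathspec):

```python
def parse(v):
    return tuple(int(p) for p in v.split("."))

# apache-airflow's pin (pathspec~=0.9.0) expands to >=0.9.0,<0.10.0.
airflow_ok = lambda v: parse("0.9.0") <= parse(v) < (0, 10)
# DVC's requirement: pathspec >=0.10.3.
dvc_ok = lambda v: parse(v) >= parse("0.10.3")

candidates = ["0.9.0", "0.9.9", "0.10.0", "0.10.3", "0.11.1"]
both = [v for v in candidates if airflow_ok(v) and dvc_ok(v)]
print(both)  # [] -- the two ranges are disjoint, so the solve must fail
```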