pathspec version too low to support DVC #115

Closed: millsks closed this issue 1 year ago

millsks commented 1 year ago

Solution to issue cannot be found in the documentation.


I am trying to install the DVC package into an environment that also includes apache-airflow, which depends on the airflow package. DVC requires pathspec >=0.10.3, but airflow pins pathspec to >=0.9.0,<0.10.0, so the solver cannot satisfy the environment specs.

Could not solve for environment specs
The following packages are incompatible
├─ apache-airflow >=2.4.0  is installable and it requires
│  └─ airflow [>=2.4.0,< |>=2.4.1,< |...|>=2.6.3,< ], which requires
│     └─ pathspec [>=0.9.0,<0.10.0 |>=0.9.0,<0.10.dev0 ], which can be installed;
└─ dvc >=3.14.0  is uninstallable because it requires
   └─ pathspec >=0.10.3 , which conflicts with any installable versions previously reported.
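
To see why the solver gives up, here is a minimal sketch that checks a few candidate versions against both pathspec constraints from the output above. It uses only the standard library and handles plain dotted versions; it is not how conda or mamba actually resolve specs. The candidate list and the helper names (`parse_version`, `satisfies`) are illustrative assumptions.

```python
import operator

# Comparison operators supported by this sketch (a small subset of real spec syntax).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def parse_version(text):
    """Turn a plain dotted version such as '0.10.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators come first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):].strip())
                if not OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


AIRFLOW_PIN = ">=0.9.0,<0.10.0"  # airflow's pathspec requirement, per the solver output
DVC_PIN = ">=0.10.3"             # dvc's pathspec requirement, per the solver output

# Illustrative candidates only, not the full list of pathspec releases.
candidates = ["0.9.0", "0.10.0", "0.10.3", "0.11.2"]

compatible = [v for v in candidates
              if satisfies(v, AIRFLOW_PIN) and satisfies(v, DVC_PIN)]
print(compatible)  # prints []
```

The upper bound `<0.10.0` sits below the lower bound `>=0.10.3`, so no version can satisfy both pins at once.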

xylar commented 1 year ago

So the spec for pathspec comes from upstream: I think I have that constraint right here, based on:

  ==> pathspec >=0.9.0,<0.10.0

xylar commented 1 year ago

This seems like something you will need to bring up in the main Airflow repo, since there may be an important reason for the constraint.

millsks commented 1 year ago

Airflow PR: apache/airflow#33349

xylar commented 1 year ago

Great! Please let me handle the 2.7.0 release unless you are eager to learn the ropes more broadly. Airflow can be exceedingly complicated to update at times.

millsks commented 1 year ago

Sounds good. I will let you handle it, but I would love to learn the ins and outs so I can help maintain the feedstock if you guys are open to that.

xylar commented 1 year ago

@millsks, sounds good. I can ping you when there's a PR and talk you through the process. We can also add you as a maintainer.

xylar commented 1 year ago

A part of maintaining Airflow is maintaining the ~80 to 100 providers. Here's an example: I don't know if I have recipes for all the providers that are available on PyPI. I'll want to check on that after the 2.7.0 release. If you want, I can include you as a maintainer on new providers that I add (or any that are relevant to your own work).

millsks commented 1 year ago

I am more than happy to help with this and help maintain too! Always up for learning something new. I am ok with helping to maintain new and existing providers. Let me talk with my team to see if there are specific providers that we would definitely want to be listed on.

Is this the full list of providers that you currently maintain? Are you maintaining the actual provider package or the feedstock for the provider package?

xylar commented 1 year ago

I have no involvement with the maintenance of the main Apache repository, including the providers. I only maintain the packages on conda-forge.

The list of providers you linked to is the complete one. That's what I will need to check on to see which we might be missing on conda-forge.

millsks commented 1 year ago

I was looking at the list and wrote a quick script to extract the provider names and check whether each one exists on conda-forge. It was quick and dirty, but it looks accurate at first glance. Some provider feedstocks may be under a different name, though. See the attached spreadsheet, or run the script if you want to double-check.

import re

import requests
import urllib3
import xlsxwriter

# Requests below use verify=False, so silence the InsecureRequestWarning they trigger.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# URL of the Apache Airflow provider index page.
apache_airflow_provider_index_url = ""
apache_airflow_provider_index_response = requests.get(url=apache_airflow_provider_index_url, verify=False)

if not apache_airflow_provider_index_response.ok:
    raise Exception("Airflow Provider Index URL not accessible!")

provider_names = set()

# Each matching line links to one provider's documentation index; the link target is the first quoted string.
for provider_entry_doc_index in [line.split('"')[1] for line in apache_airflow_provider_index_response.text.splitlines() if re.search(r"apache-airflow-providers-.*index\.html", line)]:
    apache_airflow_provider_doc_index_url = f"{provider_entry_doc_index}"
    apache_airflow_provider_doc_index_response = requests.get(url=apache_airflow_provider_doc_index_url, verify=False)

    if not apache_airflow_provider_doc_index_response.ok:
        raise Exception(f"{apache_airflow_provider_doc_index_url} not accessible!")

    # Pull the package name out of each "pip install apache-airflow-providers-..." line.
    for apache_airflow_provider_pip_install in [line for line in apache_airflow_provider_doc_index_response.text.splitlines() if re.search(r"pip.*install", line)]:
        x1 = re.sub("^.*apache-airflow-providers-", "apache-airflow-providers-", apache_airflow_provider_pip_install)
        x2 = re.sub("<.*$", "", x1)
        provider_names.add(x2)

workbook = xlsxwriter.Workbook('/tmp/airflow_provider_feedstock_check.xlsx')
worksheet = workbook.add_worksheet()
row = 1

worksheet.write(0, 0, "Provider Feedstock")
worksheet.write(0, 1, "Exists?")

for pname in sorted(provider_names):
    # Assumption: feedstocks follow the usual conda-forge naming, <package>-feedstock.
    feedstock_url = f"https://github.com/conda-forge/{pname}-feedstock"
    feedstock_check_response = requests.get(url=feedstock_url, verify=False)
    feedstock_exists = "YES" if feedstock_check_response.ok else "NO"
    worksheet.write_url(row, 0, feedstock_url, string=pname)
    worksheet.write(row, 1, feedstock_exists)
    row += 1

# The spreadsheet is only written to disk when the workbook is closed.
workbook.close()
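
The name-extraction step in the script above can be checked offline, without any network access. Below is a minimal sketch that wraps the same two `re.sub` calls in a function and runs it on a few sample lines. The function name `extract_provider_name` and the sample HTML lines are assumptions for illustration; they are not taken from the real Airflow documentation pages.

```python
import re

PREFIX = "apache-airflow-providers-"


def extract_provider_name(line):
    """Return the provider package name from a 'pip install' line, or None."""
    # Skip lines that are not pip install instructions for a provider.
    if not re.search(r"pip.*install", line) or PREFIX not in line:
        return None
    # Drop everything before the package name, then everything from the next tag on.
    x1 = re.sub("^.*" + PREFIX, PREFIX, line)
    x2 = re.sub("<.*$", "", x1)
    return x2.strip()


# Hypothetical sample lines, made up to resemble rendered documentation HTML.
sample_lines = [
    "<pre>pip install apache-airflow-providers-amazon</pre>",
    '<span class="n">pip</span> install apache-airflow-providers-cncf-kubernetes</span>',
    "<p>Release notes</p>",
    "<pre>pip install some-other-package</pre>",
]

names = {name for name in map(extract_provider_name, sample_lines) if name}
print(sorted(names))
```

Keeping the extraction in its own function makes it easy to spot lines where the markup differs from what the regular expressions expect, before any feedstock lookups are made.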



xylar commented 1 year ago

Great, that's a big help and looks accurate.

xylar commented 1 year ago

@millsks, just a heads-up that Airflow 2.7.0 was released today. I don't know exactly when I will get a chance to make the PR here or for the missing providers, but I will ping you once I do.

xylar commented 1 year ago

Adding missing providers in: