fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0

[COMPATIBILITY] antlr4-python3-runtime 4.9.3 is no longer supported but spark is stuck with it #509

Open vspinu opened 10 months ago

vspinu commented 10 months ago

Spark has this comment in its pom.xml:

    <!-- Please don't upgrade the version to 4.10+, it depends on JDK 11 -->
    <antlr4.version>4.9.3</antlr4.version>

Inside our company we use antlr4-python3-runtime to implement some parts of Spark SQL parsing. We pin it to the version used in Spark for compatibility reasons. This makes fugue, and downstream packages that have started including fugue as a core dependency (e.g. statsforecast), incompatible with our code base.

Also, #327 stated that "We also want to make sure 4.9.* are still supported", which did not materialize, as 4.11 is now the minimum requirement.

Would it be too difficult on your side to keep antlr4 minimum version the one which is used in spark? Thanks!
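For readers who want to check their own environment against the pin described above, here is a minimal sketch using only the standard library. The constant `SPARK_ANTLR_VERSION` and both helper names are illustrative assumptions, not part of fugue or Spark; the value 4.9.3 is taken from the pom.xml snippet quoted in this comment.

```python
from importlib import metadata
from typing import Optional

# Assumption: the version Spark pins, as quoted from its pom.xml above.
SPARK_ANTLR_VERSION = "4.9.3"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_spark_pin(version: Optional[str], pin: str = SPARK_ANTLR_VERSION) -> bool:
    """True only when a version was found and it equals the pinned version."""
    return version is not None and version == pin


if __name__ == "__main__":
    found = installed_version("antlr4-python3-runtime")
    print(f"installed: {found}, matches Spark pin: {matches_spark_pin(found)}")
```

An exact string comparison is used because the pin in question is a single version; a range check would need a real version parser.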

goodwanghan commented 10 months ago

Got it, yes we can take a look. We also plan to remove antlr from our core dependency starting from Fugue 0.9.0, but that will take a while.

goodwanghan commented 10 months ago

Actually I don't know why it matters. Fugue has very good support of pyspark, we have never seen such a conflict.

vspinu commented 10 months ago

It matters only when people use antlr on the Python side and want to match Spark's version. Spark's dependency is on the Java side, hence no conflicts so far :)

I understand this is rather a corner case, so it's completely understandable if you decide not to give this a priority.

goodwanghan commented 8 months ago

Closing because this problem should have been resolved.

vspinu commented 8 months ago

It's actually not. Sorry for the late reply; our internal pip mirror takes at least two weeks to update packages, and I switched to a different project in the meantime. Now I am back to updating statsforecast and it still does not work. The antlr dependencies of qpd and fugue-sql-antlr are still 4.11.

I might be doing something wrong though:

 $ pipdeptree -p fugue
Warning!!! Possibly conflicting dependencies found:
* <our-internal-package>==1.215.dev0
 - antlr4-python3-runtime [required: >=4.9.3,<=4.9.3, installed: 4.11.1]
------------------------------------------------------------------------
fugue==0.8.6
  - adagio [required: >=0.2.4, installed: 0.2.4]
    - triad [required: >=0.6.1, installed: 0.9.1]
      - fs [required: Any, installed: 2.4.16]
        - appdirs [required: ~=1.4.3, installed: 1.4.4]
        - setuptools [required: Any, installed: 62.1.0]
        - six [required: ~=1.10, installed: 1.16.0]
      - fsspec [required: Any, installed: 2023.9.0]
      - numpy [required: Any, installed: 1.24.3]
      - pandas [required: >=1.2.0, installed: 1.5.3]
        - numpy [required: >=1.20.3, installed: 1.24.3]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2023.3]
      - pyarrow [required: Any, installed: 11.0.0]
        - numpy [required: >=1.16.6, installed: 1.24.3]
      - six [required: Any, installed: 1.16.0]
  - fugue-sql-antlr [required: >=0.1.6, installed: 0.1.7]
    - antlr4-python3-runtime [required: >=4.11.1,<4.12, installed: 4.11.1]
    - jinja2 [required: Any, installed: 3.1.2]
      - MarkupSafe [required: >=2.0, installed: 2.1.2]
    - triad [required: >=0.6.8, installed: 0.9.1]
      - fs [required: Any, installed: 2.4.16]
        - appdirs [required: ~=1.4.3, installed: 1.4.4]
        - setuptools [required: Any, installed: 62.1.0]
        - six [required: ~=1.10, installed: 1.16.0]
      - fsspec [required: Any, installed: 2023.9.0]
      - numpy [required: Any, installed: 1.24.3]
      - pandas [required: >=1.2.0, installed: 1.5.3]
        - numpy [required: >=1.20.3, installed: 1.24.3]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2023.3]
      - pyarrow [required: Any, installed: 11.0.0]
        - numpy [required: >=1.16.6, installed: 1.24.3]
      - six [required: Any, installed: 1.16.0]
  - jinja2 [required: Any, installed: 3.1.2]
    - MarkupSafe [required: >=2.0, installed: 2.1.2]
  - pandas [required: >=1.2.0, installed: 1.5.3]
    - numpy [required: >=1.20.3, installed: 1.24.3]
    - python-dateutil [required: >=2.8.1, installed: 2.8.2]
      - six [required: >=1.5, installed: 1.16.0]
    - pytz [required: >=2020.1, installed: 2023.3]
  - pyarrow [required: >=0.15.1, installed: 11.0.0]
    - numpy [required: >=1.16.6, installed: 1.24.3]
  - qpd [required: >=0.4.4, installed: 0.4.4]
    - adagio [required: Any, installed: 0.2.4]
      - triad [required: >=0.6.1, installed: 0.9.1]
        - fs [required: Any, installed: 2.4.16]
          - appdirs [required: ~=1.4.3, installed: 1.4.4]
          - setuptools [required: Any, installed: 62.1.0]
          - six [required: ~=1.10, installed: 1.16.0]
        - fsspec [required: Any, installed: 2023.9.0]
        - numpy [required: Any, installed: 1.24.3]
        - pandas [required: >=1.2.0, installed: 1.5.3]
          - numpy [required: >=1.20.3, installed: 1.24.3]
          - python-dateutil [required: >=2.8.1, installed: 2.8.2]
            - six [required: >=1.5, installed: 1.16.0]
          - pytz [required: >=2020.1, installed: 2023.3]
        - pyarrow [required: Any, installed: 11.0.0]
          - numpy [required: >=1.16.6, installed: 1.24.3]
        - six [required: Any, installed: 1.16.0]
    - antlr4-python3-runtime [required: >=4.11.1,<4.12, installed: 4.11.1]
    - pandas [required: >=1.2.0, installed: 1.5.3]
      - numpy [required: >=1.20.3, installed: 1.24.3]
      - python-dateutil [required: >=2.8.1, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
      - pytz [required: >=2020.1, installed: 2023.3]
    - triad [required: >=0.9.0, installed: 0.9.1]
      - fs [required: Any, installed: 2.4.16]
        - appdirs [required: ~=1.4.3, installed: 1.4.4]
        - setuptools [required: Any, installed: 62.1.0]
        - six [required: ~=1.10, installed: 1.16.0]
      - fsspec [required: Any, installed: 2023.9.0]
      - numpy [required: Any, installed: 1.24.3]
      - pandas [required: >=1.2.0, installed: 1.5.3]
        - numpy [required: >=1.20.3, installed: 1.24.3]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2023.3]
      - pyarrow [required: Any, installed: 11.0.0]
        - numpy [required: >=1.16.6, installed: 1.24.3]
      - six [required: Any, installed: 1.16.0]
  - sqlglot [required: Any, installed: 18.1.0]
  - triad [required: >=0.9.1, installed: 0.9.1]
    - fs [required: Any, installed: 2.4.16]
      - appdirs [required: ~=1.4.3, installed: 1.4.4]
      - setuptools [required: Any, installed: 62.1.0]
      - six [required: ~=1.10, installed: 1.16.0]
    - fsspec [required: Any, installed: 2023.9.0]
    - numpy [required: Any, installed: 1.24.3]
    - pandas [required: >=1.2.0, installed: 1.5.3]
      - numpy [required: >=1.20.3, installed: 1.24.3]
      - python-dateutil [required: >=2.8.1, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
      - pytz [required: >=2020.1, installed: 2023.3]
    - pyarrow [required: Any, installed: 11.0.0]
      - numpy [required: >=1.16.6, installed: 1.24.3]
    - six [required: Any, installed: 1.16.0]

vspinu commented 8 months ago

I am really confused: although fugue-sql-antlr itself no longer requires antlr4-python3-runtime 4.11, that change has not propagated to the pip package:

fugue-sql-antlr==0.1.7
  - antlr4-python3-runtime [required: >=4.11.1,<4.12, installed: 4.11.1]
  - jinja2 [required: Any, installed: 3.1.2]
    - MarkupSafe [required: >=2.0, installed: 2.1.2]
  - triad [required: >=0.6.8, installed: 0.9.1]
    - fs [required: Any, installed: 2.4.16]
      - appdirs [required: ~=1.4.3, installed: 1.4.4]
      - setuptools [required: Any, installed: 62.1.0]
      - six [required: ~=1.10, installed: 1.16.0]
    - fsspec [required: Any, installed: 2023.9.0]
    - numpy [required: Any, installed: 1.24.3]
    - pandas [required: >=1.2.0, installed: 1.5.3]
      - numpy [required: >=1.20.3, installed: 1.24.3]
      - python-dateutil [required: >=2.8.1, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
      - pytz [required: >=2020.1, installed: 2023.3]
    - pyarrow [required: Any, installed: 11.0.0]
      - numpy [required: >=1.16.6, installed: 1.24.3]
    - six [required: Any, installed: 1.16.0]

I can also confirm it directly in the installed package:

(image attachment)

vspinu commented 8 months ago

Arh, ... it's still in the requirements.txt

goodwanghan commented 8 months ago

Ah I see, but requirements.txt is not part of the PyPI package, so it should have nothing to do with this. But let me investigate. This is very weird.
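One way to see what an installed package actually declares, independent of any requirements.txt in its repository, is to read its `Requires-Dist` metadata. The sketch below uses the standard library's `importlib.metadata.requires`; the helper name `declared_requirements` is an assumption made for illustration, and the thread does not say what the eventual fix was, so this only shows how to inspect the symptom.

```python
from importlib import metadata
from typing import List


def declared_requirements(dist_name: str, needle: str) -> List[str]:
    """Return the Requires-Dist entries of an installed distribution that mention `needle`.

    An empty list means the distribution is not installed, declares no
    requirements, or declares none that match.
    """
    try:
        # requires() returns None when the distribution declares no requirements.
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return [req for req in requires if needle.lower() in req.lower()]


if __name__ == "__main__":
    for name in ("fugue-sql-antlr", "qpd"):
        print(name, declared_requirements(name, "antlr4-python3-runtime"))
```

This reads the same metadata that pipdeptree reports in its `required:` column, so the two should agree.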

goodwanghan commented 8 months ago

@vspinu I have found the issue and here is the PR to fix this problem. https://github.com/fugue-project/fugue-sql-antlr/pull/22 Sorry it took a bit long.

0.1.8 should be ready tomorrow

shchur commented 5 months ago

@goodwanghan It seems that this issue is still blocked by the qpd dependency that requires antlr4-python3-runtime>=4.11.1, which should be fixed in the 0.9.0 release of fugue.
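To make the conflict concrete, the sketch below checks candidate versions against the two constraints that appear in the pipdeptree output earlier in this thread (`>=4.9.3,<=4.9.3` from the internal package and `>=4.11.1,<4.12` from qpd). It is a deliberately simplified comparison that handles only plain dotted numeric versions and the operators `>=`, `<=`, `==`, `>`, `<`; it is not a full PEP 440 implementation, and the function names are assumptions made for illustration.

```python
import operator
import re
from typing import Tuple

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version, zero-padded to four components."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    parsed = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](parsed, parse_version(bound)):
            return False
    return True


if __name__ == "__main__":
    internal_pin = ">=4.9.3,<=4.9.3"  # from the pipdeptree warning above
    qpd_range = ">=4.11.1,<4.12"      # from the qpd entry above
    for candidate in ("4.9.3", "4.11.1"):
        print(candidate, satisfies(candidate, internal_pin), satisfies(candidate, qpd_range))
```

Neither candidate satisfies both constraints, which is why the resolver cannot install qpd alongside a package pinned to 4.9.3.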

shchur commented 4 months ago

@goodwanghan do you have a planned date for the 0.9.0 release?

The hard dependency of fugue on antlr4-python3-runtime>=4.11.1 via qpd has been a major blocker for our project for ~1 year now (https://github.com/autogluon/autogluon/issues/3458), preventing us from upgrading to statsforecast>=1.5 (released Feb 2023). The fugue-0.9.0 release where qpd becomes an optional dependency would be extremely helpful in enabling us to resolve this issue.

goodwanghan commented 3 months ago

@shchur I apologize for the delay. Could you try fugue 0.9.0.dev3? I think we will release 0.9.0 soon, and if you could try the latest dev release, it would help accelerate the release.

shchur commented 3 months ago

Hi @goodwanghan, thank you for the response! I just checked, pinning the dependency to fugue==0.9.0.dev3 resolves the ANTLR4 incompatibility issue on our side. I would still prefer to include the stable 0.9.0 release in our dependencies to ensure that we don't break anything for users that depend on both autogluon and fugue

goodwanghan commented 3 months ago

@shchur we will release Fugue 0.9.0 in two weeks. Thanks for the confirmation.