cltk / cltk

The Classical Language Toolkit
http://cltk.org
MIT License
826 stars, 326 forks

broken on latest Python in linux #1220

Closed by howdood 1 year ago

howdood commented 1 year ago

Hi - I don't know enough to diagnose the bug, but CLTK now crashes on import on the same laptop that ran it happily in Jupyter on Ubuntu Linux a year ago (full error message below). CLTK was installed via pip as recommended; this is the mainline release, not compiled from source.

I noted in one of the docs that CLTK is certified to run only on Python <=3.9; however, 3.10 seems to be standard on many Linux distros now. Is this a forward-compatibility issue? If so, it should be flagged as a dependency issue at the point of installation on systems running 3.10.

Thanks for looking at this! h


ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 from cltk import NLP
      2 import json
      3 import csv

File ~/.local/lib/python3.10/site-packages/cltk/__init__.py:5
      1 """Init module for importing the CLTK class."""
      3 import pkg_resources
----> 5 from .nlp import NLP
      7 __version__ = curr_version = pkg_resources.get_distribution(
      8     "cltk"
      9 )  # type: pkg_resources.EggInfoDistribution

File ~/.local/lib/python3.10/site-packages/cltk/nlp.py:9
      7 from cltk.core.data_types import Doc, Language, Pipeline, Process
      8 from cltk.core.exceptions import UnimplementedAlgorithmError
----> 9 from cltk.languages.pipelines import (
     10     AkkadianPipeline,
     11     ArabicPipeline,
     12     AramaicPipeline,
     13     ChinesePipeline,
     14     CopticPipeline,
     15     GothicPipeline,
     16     GreekPipeline,
     17     HindiPipeline,
     18     LatinPipeline,
     19     MiddleEnglishPipeline,
     20     MiddleFrenchPipeline,
     21     MiddleHighGermanPipeline,
     22     OCSPipeline,
     23     OldEnglishPipeline,
     24     OldFrenchPipeline,
     25     OldNorsePipeline,
     26     PaliPipeline,
     27     PanjabiPipeline,
     28     SanskritPipeline,
     29 )
     30 from cltk.languages.utils import get_lang
     32 iso_to_pipeline = {
     33     "akk": AkkadianPipeline,
     34     "ang": OldEnglishPipeline,
   (...)
     51     "san": SanskritPipeline,
     52 }

File ~/.local/lib/python3.10/site-packages/cltk/languages/pipelines.py:23
     13 from cltk.core.data_types import Language, Pipeline, Process
     14 from cltk.dependency.processes import (
     15     ChineseStanzaProcess,
     16     CopticStanzaProcess,
   (...)
     21     OldFrenchStanzaProcess,
     22 )
---> 23 from cltk.embeddings.processes import (
     24     ArabicEmbeddingsProcess,
     25     AramaicEmbeddingsProcess,
     26     GothicEmbeddingsProcess,
     27     GreekEmbeddingsProcess,
     28     LatinEmbeddingsProcess,
     29     MiddleEnglishEmbeddingsProcess,
     30     OldEnglishEmbeddingsProcess,
     31     PaliEmbeddingsProcess,
     32     SanskritEmbeddingsProcess,
     33 )
     34 from cltk.languages.utils import get_lang
     35 from cltk.lemmatize.processes import (
     36     GreekLemmatizationProcess,
     37     LatinLemmatizationProcess,
     38     OldEnglishLemmatizationProcess,
     39     OldFrenchLemmatizationProcess,
     40 )

File ~/.local/lib/python3.10/site-packages/cltk/embeddings/__init__.py:3
      1 """Init for cltk.embeddings."""
----> 3 from .embeddings import *
      4 from .processes import *

File ~/.local/lib/python3.10/site-packages/cltk/embeddings/embeddings.py:27
     24 from typing import List
     25 from zipfile import ZipFile
---> 27 from gensim import models  # type: ignore
     29 from cltk.core.exceptions import CLTKException, UnimplementedAlgorithmError
     30 from cltk.data.fetch import FetchCorpus

File ~/.local/lib/python3.10/site-packages/gensim/__init__.py:11
      7 __version__ = '4.3.1'
      9 import logging
---> 11 from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
     14 logger = logging.getLogger('gensim')
     15 if not logger.handlers:  # To ensure reload() doesn't add another one

File ~/.local/lib/python3.10/site-packages/gensim/corpora/__init__.py:6
      1 """
      2 This package contains implementations of various streaming corpus I/O format.
      3 """
      5 # bring corpus classes directly into package namespace, to save some typing
----> 6 from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
      8 from .mmcorpus import MmCorpus  # noqa:F401
      9 from .bleicorpus import BleiCorpus  # noqa:F401

File ~/.local/lib/python3.10/site-packages/gensim/corpora/indexedcorpus.py:14
     10 import logging
     12 import numpy
---> 14 from gensim import interfaces, utils
     16 logger = logging.getLogger(__name__)
     19 class IndexedCorpus(interfaces.CorpusABC):

File ~/.local/lib/python3.10/site-packages/gensim/interfaces.py:19
      7 """Basic interfaces used across the whole Gensim package.
      8
      9 These interfaces are used for building corpora, model transformation and similarity queries.
   (...)
     14
     15 """
     17 import logging
---> 19 from gensim import utils, matutils
     22 logger = logging.getLogger(__name__)
     25 class CorpusABC(utils.SaveLoad):

File ~/.local/lib/python3.10/site-packages/gensim/matutils.py:1030
   1025     return 1. - float(len(set1 & set2)) / float(union_cardinality)
   1028 try:
   1029     # try to load fast, cythonized code if possible
-> 1030     from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
   1032 except ImportError:
   1033     def logsumexp(x):

File ~/.local/lib/python3.10/site-packages/gensim/_matutils.pyx:1, in init gensim._matutils()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
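
For anyone landing here with the same traceback: the final ValueError is raised while loading Gensim's compiled `_matutils` extension, which points at a mismatch between the installed NumPy and the NumPy that Gensim was built against, rather than at Python 3.10 as such. A minimal sketch for reporting the versions involved without importing the packages (so it still works while the import is broken) might look like this. The helper name `installed_versions` is made up for illustration; only `importlib.metadata` from the standard library is used.

```python
import sys
from importlib import metadata


def installed_versions(packages=("numpy", "gensim", "cltk")):
    """Return {package: version string or None} from installed metadata.

    Reads the package metadata only, so it does not trigger the failing
    import of gensim's compiled extension.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report makes it easier to see whether an old NumPy is sitting next to a newer Gensim build.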

kylepjohnson commented 1 year ago

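Hi, thanks for the report. About the error at the very bottom: it seems you're not the only one hitting this. It looks like a NumPy binary compatibility issue in Gensim, i.e. Gensim's compiled extension was built against a different NumPy version than the one installed.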

Two ideas:

1) Update numpy like so: https://stackoverflow.com/a/66138833

2) Install cltk fresh in a new virtualenv.

Let us know how it goes, good luck!
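
The second idea can be sketched with the standard-library `venv` module. This is only an illustration under assumptions: the environment path `~/cltk-env` and the helper names (`env_python`, `install_command`, `create_fresh_env`) are hypothetical, and the install step needs network access. A fresh environment should help because, by default, a venv does not pick up packages from `~/.local`, where the traceback shows the old installation living.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Path to the interpreter inside a venv (the layout differs on Windows)."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir, package="cltk"):
    """Build the pip command that installs `package` into the venv."""
    return [str(env_python(env_dir)), "-m", "pip", "install", package]


def create_fresh_env(env_dir, package="cltk"):
    """Create the venv (with pip) and install the package into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(install_command(env_dir, package), check=True)


if __name__ == "__main__":
    # Hypothetical location; only prints the command, nothing is installed.
    target = Path.home() / "cltk-env"
    print(" ".join(install_command(target)))
```

Calling `create_fresh_env(Path.home() / "cltk-env")` performs the actual setup; CLTK then has to be run with that environment's interpreter rather than the system one.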

(In reply to the original report, sent Apr 30, 2023 at 13:56.)

howdood commented 1 year ago

Thanks so much! To confirm: all that's needed to fix this is your suggestion number 1, upgrading numpy via

pip install --upgrade numpy

Thank you!
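
A quick way to confirm the fix after upgrading is to try the imports that failed in the traceback and report any remaining error. This is a small sketch, not part of CLTK; the helper name `check_import` is made up for illustration. It catches `ValueError` as well as `ImportError` because the binary-incompatibility failure above surfaces as a `ValueError`.

```python
import importlib


def check_import(module_name):
    """Try to import a module; return (ok, message) instead of raising."""
    try:
        importlib.import_module(module_name)
    except (ImportError, ValueError) as exc:
        # ValueError covers the "numpy.ndarray size changed" case.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Same order as the failing import chain: numpy, then gensim, then cltk.
    for name in ("numpy", "gensim", "cltk"):
        ok, message = check_import(name)
        print(f"{name}: {message}")
```

In Jupyter, restart the kernel before re-running the check so that the upgraded NumPy is the one that gets loaded.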