Hi Wally, thanks for reaching out. Sounds like a terrific project you're working on. Answers are usually more straightforward, but we recently updated to a new major version (0.1 -> 1.0), so unfortunately you're having to deal with the fallout :)
Do you have, e.g., a requirements.txt that your previous developer made? In the past, if he installed with pip, he would have gotten e.g. 0.1.118; however, if you did this recently yourself, you would have gotten an entirely different codebase. If you know the old code version, you can install it with, e.g., pip install cltk==0.1.121.
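If a requirements.txt does turn up, the pinned versions can be read out of it before reinstalling. This is a minimal sketch that assumes simple name==version lines; pinned_versions is a hypothetical helper (not part of CLTK or pip), and it deliberately ignores other specifiers such as >=, ~=, or URLs.

```python
import re

# Matches simple pins such as "cltk==0.1.118"; other specifiers (>=, ~=, URLs) are ignored.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def pinned_versions(requirements_text, packages=("cltk", "nltk")):
    """Return {package: version} for the requested packages pinned with '=='.

    Hypothetical helper for illustration; it is not part of CLTK or pip.
    """
    wanted = {name.lower() for name in packages}
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        match = PIN_RE.match(line)
        if match and match.group(1).lower() in wanted:
            pins[match.group(1).lower()] = match.group(2)
    return pins


if __name__ == "__main__":
    example = "cltk==0.1.118\nnltk==3.5  # tokenizer dependency\nnumpy>=1.19\n"
    for name, ver in pinned_versions(example).items():
        # Print the pip command that would reinstall the same pinned version
        print(f"pip install {name}=={ver}")
```

Pinning NLTK alongside CLTK matters here, since the old CLTK code depends on NLTK internals.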
> My colleague has added the Thomist lemmas, morelemmas, and ourlemmas (additions) following the paradigm in backoff, and that program has been working well for months.
This is awesome and exactly what we hope for: allowing this kind of customization (e.g., for oddball neo-Latin chemists) ;) @diyclassics wrote and maintains this code, so I'm looping him in. However, before calling on Patrick, @wehooper, please give the old codebase a shot. If that fails, we can take the next step.
- Our 0.1 docs will remain available here: https://legacy.cltk.org/en/latest/latin.html#lemmatization-backoff-method
- 1.0 docs are here: https://docs.cltk.org/en/latest/ (very much in progress)
- And demonstration notebooks of v1.0 are here (they will have to suffice as tutorials until we find the time/funding to finish them): https://github.com/cltk/cltk/blob/master/notebooks/CLTK%20Demonstration.ipynb
Hi Kyle, nice to hear from you. Let me provide some more information about my most recent attempt. I have accounts on different supercomputers in IU's array; all of them currently work with data on a specialized high-speed storage drive that serves the Crays. I have our data sets on that drive.
I am running CLTK's July 9 codebase on one machine and an April 8 installation on the newer machine. My July 9 setup still runs our lemmatizing program, lemmas.py, without a problem, but my new April 9 installation fails with the import error I reported. The July 9 installation uses Python 3.7, while the April 9 installation uses Python 3.8.4. We can specify the versions, and of course venv is there to sort things out once activated.
On background, I should clarify that Richard Rufus wasn't interested in alchemy (though Newton certainly was), but he played an important role in launching the study of philosophy at the University of Paris in the 1100s. The two projects work separately, their leaders are good friends, and I'm developing tools that serve both efforts.
The Thomist corpus is really appropriate for Rufus.
On our end, I'm happy to provide more information.
Wally
Our new colleague will be using the current April build unless the July version is still available, but it's better to try to keep up.
Thanks again, Wally
Hi Wally,
The error you reported is the same one fixed in #1090:
```
    _re_non_word_chars = PunktLanguageVars._re_non_word_chars.replace("'", "")
AttributeError: 'property' object has no attribute 'replace'
```
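The traceback implies that, in the installed NLTK, PunktLanguageVars._re_non_word_chars is a property rather than a plain string, and reading a property off the class returns the property object itself, which has no .replace(). The self-contained sketch below reproduces that mechanism with simplified stand-in classes (OldLanguageVars, NewLanguageVars, and pattern_without_apostrophe are hypothetical, not the real NLTK or CLTK source). The workaround shown, reading the value from an instance, is only one possible approach and is not necessarily what #1090 does.

```python
import re


class OldLanguageVars:
    """Simplified stand-in for an older API: the pattern is a plain class attribute."""

    _re_non_word_chars = r"[)\";}\]*:@'({\[!?]"


class NewLanguageVars:
    """Simplified stand-in for a newer API: the pattern is computed by a property."""

    sent_end_chars = (".", "?", "!")

    @property
    def _re_non_word_chars(self):
        # Sentence-ending characters other than "." are added to the character class
        extra = re.escape("".join(sorted(set(self.sent_end_chars) - {"."})))
        return r"[)\";}\]*:@'({\[%s]" % extra


def pattern_without_apostrophe(cls):
    """Return cls's non-word-character pattern with apostrophes removed (hypothetical helper)."""
    value = cls._re_non_word_chars
    if isinstance(value, property):
        # On the class, a property is just a descriptor object; read the value from an instance
        value = cls()._re_non_word_chars
    return value.replace("'", "")


if __name__ == "__main__":
    try:
        NewLanguageVars._re_non_word_chars.replace("'", "")
    except AttributeError as err:
        print("Reproduced:", err)
    print(pattern_without_apostrophe(OldLanguageVars))
    print(pattern_without_apostrophe(NewLanguageVars))
```

This is also why pinning an older NLTK together with the 0.x CLTK can matter: code written against the plain-attribute form breaks once the attribute becomes a property.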
Regarding this, I don't have enough information about which CLTK versions your builds use:
> Our new colleague will be using the current April build unless the July version is still available, but it's better to try to keep up.
Once your developer gets acquainted with the codebase, if you still have the problem, have him share the output of pip list | grep cltk here.
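For reference, roughly the same information as pip list | grep cltk can be gathered from inside Python, together with the NLTK version, which is relevant because the error above comes from NLTK's Punkt classes. This sketch assumes Python 3.8 or later (importlib.metadata is not in the 3.7 standard library); installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("cltk", "nltk")):
    """Return {package: installed version or None} (hypothetical helper)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside each activated venv makes it easy to compare the July and April installations side by side.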
> Richard Rufus wasn't interested in alchemy, but played an important role in launching the study of philosophy at the University of Paris in the 1100s, but Newton certainly was
Fascinating. Please stay in touch about your project.
Dear colleagues, thank you for your amazing work.
I am writing to share that I have exactly the same problem with WordTokenizer for the combination Win10/WSL1 + Python 3.9 + CLTK 0.1.121 (0.1.118 as well):
```
  File "/home/<...>/anaconda3/envs/default/lib/python3.9/site-packages/cltk/tokenize/latin/params.py", line 156, in LatinLanguageVars
    _re_non_word_chars = PunktLanguageVars._re_non_word_chars.replace("'",'')
AttributeError: 'property' object has no attribute 'replace'
```
With 1.1.6, I get:
```
    from cltk.tokenize.word import WordTokenizer
ModuleNotFoundError: No module named 'cltk.tokenize'
```
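Since the two errors come from two different API generations, a quick check of which one is installed tells you which docs apply. This is a hedged sketch: cltk_api_generation is a hypothetical helper, and it relies only on the symptom reported here (0.x ships cltk.tokenize, while 1.1.6 raises ModuleNotFoundError for it). Note that looking up a submodule imports the parent cltk package.

```python
import importlib.util


def cltk_api_generation():
    """Guess which CLTK API generation is installed (hypothetical helper).

    Returns None if CLTK is absent, "0.x" if the old cltk.tokenize package
    exists, and "1.x" otherwise.
    """
    if importlib.util.find_spec("cltk") is None:
        return None
    # Looking up a submodule imports the parent package "cltk" first.
    if importlib.util.find_spec("cltk.tokenize") is not None:
        return "0.x"
    return "1.x"


if __name__ == "__main__":
    generation = cltk_api_generation()
    if generation is None:
        print("CLTK is not installed")
    elif generation == "0.x":
        print("0.x layout: legacy docs at https://legacy.cltk.org/en/latest/")
    else:
        print("1.x layout: see https://docs.cltk.org/en/latest/")
```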
AFAIU, WSL1 is not a supported platform for CLTK, but I'm sharing the symptoms just in case.
Best regards, Anton.
Oops, the problem remains on WSL2 (Ubuntu 22.04) as well.
Rolling back to nltk==3.5 doesn't help.
I have tried the combination of cltk==0.1.121 and nltk==3.5 together with the following, as suggested in https://github.com/cltk/cltk/issues/1096:
```python
from cltk.corpus.utils.importer import CorpusImporter

# CLTK 0.x API: download the latin_models_cltk corpus
my_latin_downloader = CorpusImporter('latin')
my_latin_downloader.import_corpus('latin_models_cltk')
```
Aaand everything seems to work now, thanks!
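To check whether the latin_models_cltk corpus has already been downloaded before running the 0.x tools, something like the sketch below could help. It assumes the 0.x default location ~/cltk_data/latin/model/latin_models_cltk, which is my understanding of the legacy layout rather than something confirmed in this thread; latin_models_present is a hypothetical helper.

```python
from pathlib import Path


def latin_models_present(data_dir=None):
    """Return True if the latin_models_cltk corpus directory exists (hypothetical helper).

    Assumes the 0.x layout <data_dir>/latin/model/latin_models_cltk,
    with data_dir defaulting to ~/cltk_data.
    """
    base = Path(data_dir) if data_dir is not None else Path.home() / "cltk_data"
    return (base / "latin" / "model" / "latin_models_cltk").is_dir()


if __name__ == "__main__":
    if latin_models_present():
        print("latin_models_cltk found")
    else:
        print("latin_models_cltk missing; run CorpusImporter('latin').import_corpus('latin_models_cltk')")
```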
What do I need to do to use the latest CLTK version?
Best regards, Anton.
@alexeyev We do not support the 0.x versions anymore, but we're glad to hear they still work!
To upgrade to the latest 1.x, run pip install -U cltk, but I have to warn you that almost everything in it is different. You can read more here: https://docs.cltk.org/en/latest/quickstart.html
Ah, so it probably means that I consulted the older docs/examples when designing the token normalization pipeline. Thanks again.
Yes, sounds like it. Old docs here: https://legacy.cltk.org/en/latest/ and the more recent at the link above.
If you are just getting started with our tools, I strongly recommend using the latest version as described in the Quickstart url, above. If necessary, there are other Latin tokenizers in the project.
> I strongly recommend using the latest version
Yes, I think I'm going to rewrite everything using the latest CLTK stable version API to be able to support our own codebase later. Thank you!
Our team has been using CLTK on administered CRAY academic computers since last July to lemmatize a digital edition of the medieval philosopher Richard Rufus of Cornwall, all Latin.
The team member who led our adoption of CLTK has found a new position. In anticipation of training a replacement, I installed CLTK three days ago, following the current installation instructions for developers, on an account where we had not installed it before.
We have a Python script with the informative name lemmas.py, whose opening lines call backoff.py:
My colleague has added the Thomist lemmas, morelemmas, and ourlemmas (additions) following the paradigm in backoff, and that program has been working well for months.
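The "paradigm in backoff" is a chain of lemmatizers in which each one answers the tokens it knows and hands the rest to the next. Below is a simplified, pure-Python sketch of that sequential-backoff pattern, assuming the custom lemma lists are plain dictionaries. ChainedDictLemmatizer and the sample entries are hypothetical; this is not the API of CLTK's backoff.py.

```python
class ChainedDictLemmatizer:
    """Look a token up in one dictionary, deferring to a backoff lemmatizer on a miss.

    Hypothetical class illustrating sequential backoff; not the CLTK API.
    """

    def __init__(self, lemmas, backoff=None):
        self.lemmas = dict(lemmas)
        self.backoff = backoff

    def lemmatize_token(self, token):
        if token in self.lemmas:
            return self.lemmas[token]
        if self.backoff is not None:
            return self.backoff.lemmatize_token(token)
        return None  # no lemmatizer in the chain knew this token

    def lemmatize(self, tokens):
        return [(token, self.lemmatize_token(token)) for token in tokens]


if __name__ == "__main__":
    # Made-up sample entries; real lists would be loaded from the project's files.
    default = ChainedDictLemmatizer({"est": "sum", "rerum": "res"})
    additions = ChainedDictLemmatizer({"quidditas": "quidditas"}, backoff=default)
    # Custom lists go at the front of the chain so they are consulted first.
    print(additions.lemmatize(["quidditas", "rerum", "xyz"]))
```

Putting project-specific lists at the head of the chain lets them override the defaults without modifying the underlying lemmatizers.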
Under the new installation, I see the following:
I tried to trace the opening steps in our lemmas.py program, but the debugger dives into the CLTK libraries immediately after trying to execute our copy of backoff.py, as you can see from the traceback.
Does this error look familiar? Is this installation instance missing a file? I think all the named files are there, but I haven't paid close attention before. Can you advise? It is an administered environment, but we are free to use venv, and the previous installation of CLTK worked very smoothly.
By the way, we all think CLTK is great, well done.
Thanks, Wally Hooper Chymistry of Isaac Newton Project/Richard Rufus Project Indiana University, Bloomington