Closed: mjpost closed this issue 1 year ago.
Can I ask which Python version you are using? I've tried 3.8 and 3.9, and I get pickling problems with awesome-align.
3.8.10. I had further problems with coref, however, and never got it fully running (in the meantime I have been sidetracked by ACL and summer holidays).
I am having incompatibility issues; maybe the authors can help. @CoderPat @nightingal3, can you clarify the versions of these modules: python, torch, allennlp, awesome-align, spacy?
Hello @mjpost and @Wafaa014, sorry about the issues with installation! These are the versions of the libraries in my dev environment:
python          3.9.7    hf930737_3_cpython    conda-forge
torch           1.9.1    pypi_0                pypi
allennlp        2.7.0    pypi_0                pypi
awesome-align   0.1.7    pypi_0                pypi
spacy           3.1.7    pypi_0                pypi
I've also added my full environment here: https://github.com/CoderPat/MuDA/blob/main/muda_new_req.txt There may be some extraneous libraries, but you can try creating a separate conda env to see if that works.
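If you want to double-check that a fresh env actually picked up these pins, here is a minimal sketch (my assumption: the package names are the PyPI ones shown in the conda list above):

```python
# Minimal version check against the pins listed above.
# importlib.metadata is in the standard library from Python 3.8 on.
from importlib.metadata import PackageNotFoundError, version

expected = {
    "torch": "1.9.1",
    "allennlp": "2.7.0",
    "awesome-align": "0.1.7",
    "spacy": "3.1.7",
}
for pkg, want in expected.items():
    try:
        have = version(pkg)
        flag = "" if have == want else "  <-- mismatch"
        print(f"{pkg}: want {want}, have {have}{flag}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed (want {want})")
```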
It works now, thanks 👍🏻
no problem, let us know if you have any further issues!
Hey! Sorry for the late reply @mjpost @Wafaa014, I was on vacation. Thanks for dealing with this, @nightingal3!
Hello, I am also having some problems when running this. I created an environment using the muda_env.yml file. When I test it on a small test document, I do get some tags (namely "lexical_cohesion" and "verb_form"), but the other indicators don't seem to be working.
I'd be grateful for your thoughts on this!
This is the command I used: PYTHONPATH=/home/getalp/nakhlem/MuDA python muda/main.py --src my_data/text.en --tgt my_data/text.es --docids my_data/text.docids --dump-tags my_data/test_enes_muda-env-yaml.tags --tgt-lang "es"
And this is the full message:
2024-01-10 16:53:32 INFO: Checking for updates to resources.json in case models have been updated. Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.6.0.json: 367kB [00:00, 22.0MB/s]
2024-01-10 16:53:33 INFO: Loading these models for language: en (English):
=================================
| Processor | Package |
---------------------------------
| tokenize | combined |
| pos | combined_charlm |
| lemma | combined_nocharlm |
| depparse | combined_charlm |
=================================
2024-01-10 16:53:33 INFO: Using device: cuda
2024-01-10 16:53:33 INFO: Loading: tokenize
2024-01-10 16:53:35 INFO: Loading: pos
2024-01-10 16:53:36 INFO: Loading: lemma
2024-01-10 16:53:36 INFO: Loading: depparse
2024-01-10 16:53:36 INFO: Done loading processors!
2024-01-10 16:53:36 INFO: Checking for updates to resources.json in case models have been updated. Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.6.0.json: 367kB [00:00, 21.0MB/s]
2024-01-10 16:53:37 WARNING: Language es package default expects mwt, which has been added
2024-01-10 16:53:38 INFO: Loading these models for language: es (Spanish):
===============================
| Processor | Package |
-------------------------------
| tokenize | ancora |
| mwt | ancora |
| pos | ancora_charlm |
| lemma | ancora_nocharlm |
| depparse | ancora_charlm |
===============================
2024-01-10 16:53:38 INFO: Using device: cuda
2024-01-10 16:53:38 INFO: Loading: tokenize
2024-01-10 16:53:38 INFO: Loading: mwt
2024-01-10 16:53:38 INFO: Loading: pos
2024-01-10 16:53:38 INFO: Loading: lemma
2024-01-10 16:53:38 INFO: Loading: depparse
2024-01-10 16:53:38 INFO: Done loading processors!
/home/getalp/nakhlem/miniconda3/envs/muda_yml/lib/python3.9/site-packages/spacy/language.py:1580: UserWarning: Due to multiword token expansion or an alignment issue, the original text has been replaced by space-separated expanded tokens.
docs = (self._ensure_doc(text) for text in texts)
/home/getalp/nakhlem/miniconda3/envs/muda_yml/lib/python3.9/site-packages/spacy/language.py:1580: UserWarning: Can't set named entities because of multi-word token expansion or because the character offsets don't map to valid tokens produced by the Stanza tokenizer:
Words: ['Varios', 'enmascarados', 'irrumpieron', 'en', 'el', 'estudio', 'de', 'el', 'canal', 'público', 'TC', 'durante', 'una', 'emisión', ',', 'obligando', 'a', 'el', 'personal', 'a', 'tirar', 'se', 'a', 'el', 'suelo', '.']
Entities: []
docs = (self._ensure_doc(text) for text in texts)
/home/getalp/nakhlem/miniconda3/envs/muda_yml/lib/python3.9/site-packages/spacy/language.py:1580: UserWarning: Can't set named entities because of multi-word token expansion or because the character offsets don't map to valid tokens produced by the Stanza tokenizer:
Words: ['A', 'el', 'menos', '10', 'personas', 'han', 'muerto', 'desde', 'que', 'el', 'lunes', 'se', 'declarara', 'el', 'estado', 'de', 'excepción', 'en', 'Ecuador', '.']
Entities: []
docs = (self._ensure_doc(text) for text in texts)
/home/getalp/nakhlem/miniconda3/envs/muda_yml/lib/python3.9/site-packages/spacy/language.py:1580: UserWarning: Can't set named entities because of multi-word token expansion or because the character offsets don't map to valid tokens produced by the Stanza tokenizer:
Words: ['Este', 'se', 'declaró', 'después', 'de', 'que', 'un', 'famoso', 'gánster', 'desapareciera', 'de', 'su', 'celda', 'en', 'prisión', '.', 'No', 'está', 'claro', 'si', 'el', 'incidente', 'en', 'el', 'estudio', 'de', 'televisión', 'de', 'Guayaquil', 'está', 'relacionado', 'con', 'la', 'desaparición', 'de', 'una', 'prisión', 'de', 'la', 'misma', 'ciudad', 'de', 'el', 'jefe', 'de', 'la', 'banda', 'de', 'los', 'Choneros', ',', 'Adolfo', 'Macías', 'Villamar', ',', 'o', 'Fito', ',', 'como', 'es', 'más', 'conocido', '.']
Entities: []
docs = (self._ensure_doc(text) for text in texts)
/home/getalp/nakhlem/miniconda3/envs/muda_yml/lib/python3.9/site-packages/spacy/language.py:1580: UserWarning: Can't set named entities because of multi-word token expansion or because the character offsets don't map to valid tokens produced by the Stanza tokenizer:
Words: ['En', 'el', 'vecino', 'Perú', ',', 'el', 'gobierno', 'ordenó', 'el', 'despliegue', 'inmediato', 'de', 'una', 'fuerza', 'policial', 'en', 'la', 'frontera', 'para', 'evitar', 'que', 'la', 'inestabilidad', 'se', 'extienda', 'a', 'el', 'país', '.']
Entities: []
docs = (self._ensure_doc(text) for text in texts)
/home/getalp/nakhlem/miniconda3/envs/muda_yml/lib/python3.9/site-packages/spacy/language.py:1580: UserWarning: Can't set named entities because of multi-word token expansion or because the character offsets don't map to valid tokens produced by the Stanza tokenizer:
Words: ['Ecuador', 'es', 'uno', 'de', 'los', 'principales', 'exportadores', 'de', 'plátano', 'de', 'el', 'mundo', ',', 'pero', 'también', 'exporta', 'petróleo', ',', 'café', ',', 'cacao', ',', 'camarones', 'y', 'productos', 'pesqueros', '.', 'El', 'aumento', 'de', 'la', 'violencia', 'en', 'el', 'país', 'andino', ',', 'dentro', 'y', 'fuera', 'de', 'sus', 'prisiones', ',', 'se', 'ha', 'vinculado', 'a', 'los', 'enfrentamientos', 'entre', 'cárteles', 'de', 'la', 'droga', ',', 'tanto', 'extranjeros', 'como', 'locales', ',', 'por', 'el', 'control', 'de', 'las', 'rutas', 'de', 'la', 'cocaína', 'hacia', 'Estados', 'Unidos', 'y', 'Europa', '.']
Entities: []
docs = (self._ensure_doc(text) for text in texts)
Loading the dataset...
Extracting: 9it [00:00, 23.15it/s]
Some weights of BertModel were not initialized from the model checkpoint at SpanBERT/spanbert-large-cased and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
coref error
coref error
Hey @MariamNakhle, sorry for the late reply, I was on vacation for the last week.
Weird... Have you manually checked whether any of the input files' lines actually contain any of those phenomena (with the exception of ellipsis, which is a bit trickier)?
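For a quick sanity check, something like the sketch below would flag lines with surface cues for two of the missing tag types. The word lists are my illustrative guesses, not MuDA's actual rules, and the file path is the one from your command:

```python
# Rough surface check (NOT MuDA's logic): flag target lines containing
# formality markers (usted/ustedes) or common subject pronouns.
import re

FORMALITY = {"usted", "ustedes"}
PRONOUNS = {"yo", "tú", "él", "ella", "nosotros", "vosotros", "ellos", "ellas"}

with open("my_data/text.es", encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        tokens = {t.lower() for t in re.findall(r"\w+", line)}
        hits = tokens & (FORMALITY | PRONOUNS)
        if hits:
            print(f"line {lineno}: {sorted(hits)}")
```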
Also, can you please create a new issue to avoid re-opening this one (as it does not seem to be an installation issue)?
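One more thought: the two "coref error" lines in your log suggest the coreference model itself may be failing. A standalone smoke test like the sketch below could help narrow that down; note the checkpoint URL is the standard public AllenNLP SpanBERT coref model, which may not be the exact one MuDA uses:

```python
# Standalone smoke test for AllenNLP coreference, independent of MuDA.
# Requires the allennlp-models package to be installed.
from allennlp.predictors.predictor import Predictor
import allennlp_models.coref  # noqa: F401 -- registers the coref model/predictor

predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz"
)
result = predictor.predict(
    document="Masked men stormed the studio during a broadcast. They forced the staff to the floor."
)
print(result["clusters"])
```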
I had a lot of issues with installation, noting them here in case you want to fix them, or others have similar ones.
- python3 -m venv muda; . muda/bin/activate
- There are two requirements files, one of them under muda/. I installed both, starting with the one under muda/.
- I had to pin overrides==3.1.0 to fix one set of errors, and pydantic==1.7.4 to fix another set of errors.

After that, I was able to get the program to run.