ad-freiburg / GENRE

Autoregressive Entity Retrieval

Using deprecated numpy attribute? #1

Open agolo-alan-hogue opened 1 year ago

agolo-alan-hogue commented 1 year ago

Hello,

I got GENRE set up from your repo, but when I try to run it, I get this:

root@dec09fef21fa:/GENRE# python3 main.py --yago -i agolo-110823.benchmark.jsonl  -o out.jsonl --split_iter --mention_trie data/mention_trie.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from model import Model
  File "/GENRE/model.py", line 6, in <module>
    from genre.fairseq_model import GENRE
  File "/GENRE/genre/fairseq_model.py", line 14, in <module>
    from fairseq import search, utils
  File "/GENRE/fairseq/fairseq/utils.py", line 20, in <module>
    from fairseq.modules.multihead_attention import MultiheadAttention
  File "/GENRE/fairseq/fairseq/modules/__init__.py", line 10, in <module>
    from .character_token_embedder import CharacterTokenEmbedder
  File "/GENRE/fairseq/fairseq/modules/character_token_embedder.py", line 11, in <module>
    from fairseq.data import Dictionary
  File "/GENRE/fairseq/fairseq/data/__init__.py", line 23, in <module>
    from .indexed_dataset import (
  File "/GENRE/fairseq/fairseq/data/indexed_dataset.py", line 112, in <module>
    6: np.float,
  File "/usr/local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I manually replaced np.float with the builtin float; it only occurs in fairseq.
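A minimal standard-library sketch of that manual edit. Per the numpy message above, swapping `np.float` for the builtin `float` does not change behavior. The regex is word-bounded, so `np.float64`, `np.float32`, `np.float_` and `np.floating` are left alone. The helper names are hypothetical, and running it over the whole checkout (for example `/GENRE/fairseq`) is an assumption. Other removed aliases such as `np.int` or `np.bool` would need the same treatment if the code uses them.

```python
import re
from pathlib import Path
from typing import List

# Matches the removed alias `np.float` only: the trailing \b fails when a word
# character follows, so np.float64, np.float_ and np.floating do not match.
NP_FLOAT = re.compile(r"\bnp\.float\b")


def patch_source(text: str) -> str:
    """Replace every `np.float` alias in a source string with builtin `float`."""
    return NP_FLOAT.sub("float", text)


def patch_tree(root: str) -> List[Path]:
    """Rewrite all .py files under root in place; return the files changed."""
    changed = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        new_text = patch_source(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

`patch_tree("/GENRE/fairseq")` returns the list of rewritten files. Working on a clean git checkout makes the change easy to inspect and revert.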

That gives another error, which I'll paste below. So I went to the upstream GENRE repo and found this note regarding fairseq:

fairseq>=0.10 (optional for training GENRE) NOTE: fairseq is going through changes without backward compatibility. Install fairseq from source and use [this](https://github.com/nicola-decao/fairseq/tree/fixing_prefix_allowed_tokens_fn) commit for reproducibility. See https://github.com/pytorch/fairseq/pull/3276 for the current PR that should fix fairseq/master.

So it sounds like pulling in this PR, or rebasing onto it, might fix these problems?

Thanks for your help!

Latest error:

root@3e9ee535d1b0:/GENRE# python3 main.py --yago -i example_article.jsonl  -o out.jsonl --split_iter --mention_trie data/mention_trie.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from model import Model
  File "/GENRE/model.py", line 6, in <module>
    from genre.fairseq_model import GENRE
  File "/GENRE/genre/fairseq_model.py", line 15, in <module>
    from fairseq.models.bart import BARTHubInterface, BARTModel
  File "/GENRE/fairseq/fairseq/models/__init__.py", line 208, in <module>
    module = importlib.import_module("fairseq.models." + model_name)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/GENRE/fairseq/fairseq/models/wav2vec/__init__.py", line 6, in <module>
    from .wav2vec import *  # noqa
  File "/GENRE/fairseq/fairseq/models/wav2vec/wav2vec.py", line 25, in <module>
    from fairseq.tasks import FairseqTask
  File "/GENRE/fairseq/fairseq/tasks/__init__.py", line 15, in <module>
    from .fairseq_task import FairseqTask, LegacyFairseqTask  # noqa
  File "/GENRE/fairseq/fairseq/tasks/fairseq_task.py", line 13, in <module>
    from fairseq import metrics, search, tokenizer, utils
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
agolo-alan-hogue commented 1 year ago

I learned that np.float was removed in numpy 1.24, so instead of editing the code as above I tried downgrading numpy. That gets past the np.float error but yields an apparently unrelated one (the same ImportError as above):

root@dbf92a2ec323:/GENRE# pip install --upgrade numpy==1.23.5
Collecting numpy==1.23.5
  Downloading numpy-1.23.5-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 68.0 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.4
    Uninstalling numpy-1.24.4:
      Successfully uninstalled numpy-1.24.4
Successfully installed numpy-1.23.5
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.0.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip
root@dbf92a2ec323:/GENRE# pip show numpy
Name: numpy
Version: 1.23.5
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email:
License: BSD
Location: /usr/local/lib/python3.8/site-packages
Requires:
Required-by: blis, fairseq, sacrebleu, spacy, thinc
root@dbf92a2ec323:/GENRE# python3 main.py --yago -i example_article.jsonl \
 -o out.jsonl --split_iter --mention_trie data/mention_trie.pkl \
 --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from model import Model
  File "/GENRE/model.py", line 6, in <module>
    from genre.fairseq_model import GENRE
  File "/GENRE/genre/fairseq_model.py", line 15, in <module>
    from fairseq.models.bart import BARTHubInterface, BARTModel
  File "/GENRE/fairseq/fairseq/models/__init__.py", line 208, in <module>
    module = importlib.import_module("fairseq.models." + model_name)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/GENRE/fairseq/fairseq/models/wav2vec/__init__.py", line 6, in <module>
    from .wav2vec import *  # noqa
  File "/GENRE/fairseq/fairseq/models/wav2vec/wav2vec.py", line 25, in <module>
    from fairseq.tasks import FairseqTask
  File "/GENRE/fairseq/fairseq/tasks/__init__.py", line 15, in <module>
    from .fairseq_task import FairseqTask, LegacyFairseqTask  # noqa
  File "/GENRE/fairseq/fairseq/tasks/fairseq_task.py", line 13, in <module>
    from fairseq import metrics, search, tokenizer, utils
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
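The `(unknown location)` suffix is a useful clue. Python prints it when the module imported as `fairseq` has no `__file__`, which is what a namespace package (a directory without `__init__.py`) looks like. One plausible cause is that `fairseq` resolves to such a directory instead of the regular package in `/GENRE/fairseq/fairseq`. This is an inference the logs don't confirm. Adding `/GENRE/fairseq` to `PYTHONPATH`, as tried next in this thread, is consistent with it. A small diagnostic sketch using `importlib.util.find_spec`; `describe_import` is a hypothetical helper:

```python
import importlib.util


def describe_import(name: str) -> str:
    """Report how a top-level `import name` would resolve, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    # Since Python 3.7 a namespace package has origin None. It has no
    # __init__.py and therefore no __file__, which is why an ImportError
    # raised from it reports "(unknown location)".
    if spec.origin is None and spec.submodule_search_locations is not None:
        paths = ", ".join(spec.submodule_search_locations)
        return f"{name}: namespace package spread over [{paths}]"
    return f"{name}: regular module/package at {spec.origin}"
```

Running `print(describe_import("fairseq"))` from `/GENRE`, before and after exporting `PYTHONPATH`, should show whether the export changes what `fairseq` resolves to.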
agolo-alan-hogue commented 1 year ago

...and finally tried this:

export PYTHONPATH=$PYTHONPATH:/GENRE/fairseq

It seems like it is going to work, but:

root@dbf92a2ec323:/GENRE# export PYTHONPATH=$PYTHONPATH:/GENRE/fairseq
root@dbf92a2ec323:/GENRE# python3 main.py --yago -i example_article.jsonl  -o out.jsonl --split_iter --mention_trie data/mention_trie.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
load model...
1042301B [00:00, 10179472.72B/s]
456318B [00:00, 7585473.82B/s]
load data/mention_trie.pkl...
Killed

I should have enough disk space and memory, so I'm not sure yet what is causing this. The trie file is about half a GB, and I have 16 GB of RAM.
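A bare `Killed` with no Python traceback usually means the process received SIGKILL, typically from the Linux out-of-memory killer or a container memory limit. A small file on disk can still be expensive to load. Pickle encodes each dict node in a few bytes, but in memory every node is a full Python object costing tens to hundreds of bytes. The sketch below estimates that ratio. It assumes the trie is nested dicts keyed by token ids; the real structure of `mention_trie.pkl` isn't shown here. The helper names are hypothetical.

```python
import pickle
import sys
from typing import Iterable, List


def build_trie(sequences: Iterable[List[int]]) -> dict:
    """Hypothetical prefix trie: nested dicts keyed by token id."""
    root: dict = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root


def approx_in_memory_bytes(obj) -> int:
    """Rough deep size: sys.getsizeof summed over every dict, key and value
    reachable from obj, counting shared objects (e.g. cached small ints) once."""
    seen = set()
    stack = [obj]
    total = 0
    while stack:
        o = stack.pop()
        if id(o) in seen:
            continue
        seen.add(id(o))
        total += sys.getsizeof(o)
        if isinstance(o, dict):
            for k, v in o.items():
                stack.append(k)
                stack.append(v)
    return total


def memory_to_pickle_ratio(obj) -> float:
    """How many bytes of RAM each byte of the pickled form roughly expands to."""
    pickled = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return approx_in_memory_bytes(obj) / len(pickled)
```

Even on a toy trie the ratio is well above 1, so a pickle of about half a GB could plausibly need many GB once loaded.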

agolo-alan-hogue commented 12 months ago

Hello,

I set everything up on an Ubuntu machine with Docker. I followed all the instructions and the setup went fine, but at linking time I get the same error, shown below.

sudo docker run --rm -v $PWD/data:/GENRE/data  -v $PWD/models:/GENRE/models -it genre bash
root@4646a2f8bd9c:/GENRE# python3 main.py --yago -i data/agolo_v2_beta.benchmark.jsonl \
 -o genre-agolo-linked-articles.jsonl --split_iter --mention_trie data/mention_trie.pkl \
 --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from model import Model
  File "/GENRE/model.py", line 6, in <module>
    from genre.fairseq_model import GENRE
  File "/GENRE/genre/fairseq_model.py", line 14, in <module>
    from fairseq import search, utils
  File "/GENRE/fairseq/fairseq/utils.py", line 20, in <module>
    from fairseq.modules.multihead_attention import MultiheadAttention
  File "/GENRE/fairseq/fairseq/modules/__init__.py", line 10, in <module>
    from .character_token_embedder import CharacterTokenEmbedder
  File "/GENRE/fairseq/fairseq/modules/character_token_embedder.py", line 11, in <module>
    from fairseq.data import Dictionary
  File "/GENRE/fairseq/fairseq/data/__init__.py", line 23, in <module>
    from .indexed_dataset import (
  File "/GENRE/fairseq/fairseq/data/indexed_dataset.py", line 112, in <module>
    6: np.float,
  File "/usr/local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
agolo-alan-hogue commented 12 months ago

Under the same circumstances, downgrading numpy again gives the same results:

# python -m pip install numpy==1.19.5
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.9/14.9 MB 13.5 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.4
    Uninstalling numpy-1.24.4:
      Successfully uninstalled numpy-1.24.4
Successfully installed numpy-1.19.5
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.0.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip
root@4646a2f8bd9c:/GENRE# python3 main.py --yago -i data/agolo_v2_beta.benchmark.jsonl  -o genre-agolo-linked-articles.jsonl --split_iter --mention_trie data/mention_trie.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from model import Model
  File "/GENRE/model.py", line 6, in <module>
    from genre.fairseq_model import GENRE
  File "/GENRE/genre/fairseq_model.py", line 15, in <module>
    from fairseq.models.bart import BARTHubInterface, BARTModel
  File "/GENRE/fairseq/fairseq/models/__init__.py", line 208, in <module>
    module = importlib.import_module("fairseq.models." + model_name)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/GENRE/fairseq/fairseq/models/wav2vec/__init__.py", line 6, in <module>
    from .wav2vec import *  # noqa
  File "/GENRE/fairseq/fairseq/models/wav2vec/wav2vec.py", line 25, in <module>
    from fairseq.tasks import FairseqTask
  File "/GENRE/fairseq/fairseq/tasks/__init__.py", line 15, in <module>
    from .fairseq_task import FairseqTask, LegacyFairseqTask  # noqa
  File "/GENRE/fairseq/fairseq/tasks/fairseq_task.py", line 13, in <module>
    from fairseq import metrics, search, tokenizer, utils
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
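Both downgrades (1.23.5 and 1.19.5) get past the `np.float` failure in `indexed_dataset.py`, and what remains is the separate fairseq import problem. The alias was deprecated in NumPy 1.20 and removed in 1.24, so any version below 1.24 still provides it. A tiny sketch of that rule; `np_float_alias_removed` is a hypothetical helper, not part of numpy or fairseq:

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.24.4'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def np_float_alias_removed(version: str) -> bool:
    """True if this numpy version no longer provides np.float (gone since 1.24)."""
    return parse_major_minor(version) >= (1, 24)
```

For the installed numpy, pass `numpy.__version__` to `np_float_alias_removed`.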
agolo-alan-hogue commented 12 months ago

And again the same thing, this time on an Ubuntu machine with twice as much RAM:

# python3 main.py --yago -i data/agolo_v2_beta.benchmark.jsonl  -o genre-agolo-linked-articles.jsonl --split_iter --mention_trie data/mention_trie.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.pkl
load model...
1042301B [00:00, 27201737.57B/s]
456318B [00:00, 13431321.23B/s]
load data/mention_trie.pkl...
Killed
agolo-alan-hogue commented 12 months ago

Same with alternate data:

# python3 main.py --yago -i data/agolo_v2_beta.benchmark.jsonl  -o genre-agolo-linked-articles.jsonl --split_iter --mention_trie data/mention_trie.dalab.pkl  --mention_to_candidates_dict data/mention_to_candidates_dict.dalab.pkl
load model...
load data/mention_trie.dalab.pkl...
load data/mention_to_candidates_dict.dalab.pkl...
Killed
flackbash commented 3 months ago

Hi, I had the same numpy problems (originating from fairseq) that you described. Everything now works for me with Python 3.8.12 when I install the requirements with the following command:

torch pytest requests spacy gdown fairseq

I assume there is no longer any need to install the fairseq fork from Nicola de Cao, since this PR was merged into the official repository: https://github.com/facebookresearch/fairseq/pull/3276

Regarding RAM requirements: for me, loading the model and the tries takes around 20 GB.
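To check a figure like this on your own machine, a process can report its peak resident memory through the standard `resource` module (Unix only). A minimal sketch: `peak_rss_gb` and `report_peak` are hypothetical helpers, and where to wrap GENRE's own loading calls is an assumption.

```python
import resource
import sys
from typing import Any, Callable


def peak_rss_gb() -> float:
    """Peak resident set size of this process so far, in GiB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1024 ** 3


def report_peak(label: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn(*args, **kwargs), print how much the peak RSS grew, return the result."""
    before = peak_rss_gb()
    result = fn(*args, **kwargs)
    after = peak_rss_gb()
    print(f"{label}: peak RSS {after:.2f} GiB (+{after - before:.2f} GiB)")
    return result
```

For example, `report_peak("mention trie", pickle.load, f)` on a file opened with `open("data/mention_trie.pkl", "rb")` would print the cost of that single load. Run it from `/GENRE` so any classes stored in the pickle can be imported.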