Rostlab / SeqVec

Modelling the Language of Life - Deep Learning Protein Sequences
http://embed.protein.properties
MIT License

Issue running seqvec on example file in Google Colab #20

Open tijeco opened 3 years ago

tijeco commented 3 years ago

I've tried getting seqvec to install on Google Colab, but have encountered some errors.

I ran the following command:

!pip install seqvec
ERROR: en-core-web-sm 2.2.5 has requirement spacy>=2.2.2, but you'll have spacy 2.1.9 which is incompatible.
ERROR: botocore 1.20.70 has requirement urllib3<1.27,>=1.25.4, but you'll have urllib3 1.24.3 which is incompatible.
ERROR: responses 0.13.3 has requirement urllib3>=1.25.10, but you'll have urllib3 1.24.3 which is incompatible.

At the end it appears to have finished installing, and gave this final message:

Successfully installed allennlp-0.9.0 blis-0.2.4 boto3-1.17.70 botocore-1.20.70 conllu-1.3.1 flaky-3.7.0 flask-cors-3.0.10 ftfy-6.0.1 gevent-1.4.0 jmespath-0.10.0 jsonnet-0.17.0 jsonpickle-2.0.0 numpydoc-1.1.0 overrides-6.1.0 parsimonious-0.8.1 plac-0.9.6 preshed-2.0.1 pytorch-pretrained-bert-0.6.2 pytorch-transformers-1.1.0 responses-0.13.3 s3transfer-0.4.2 sentencepiece-0.1.95 seqvec-0.4.1 spacy-2.1.9 tensorboardX-2.2 thinc-7.0.8 typing-utils-0.0.3 unidecode-1.2.0 word2number-1.1

I downloaded the test fasta file with this command:

!wget https://raw.githubusercontent.com/Rostlab/SeqVec/master/test-data/sequences.fasta

Then, I ran the following command with seqvec:

!seqvec -i sequences.fasta -o embeddings.npz

And I get all of these glorious errors

/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
Traceback (most recent call last):
  File "/usr/local/bin/seqvec", line 5, in <module>
    from seqvec.seqvec import main
  File "/usr/local/lib/python3.7/dist-packages/seqvec/seqvec.py", line 15, in <module>
    from allennlp.commands.elmo import ElmoEmbedder
  File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/__init__.py", line 8, in <module>
    from allennlp.commands.configure import Configure
  File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/configure.py", line 26, in <module>
    from allennlp.service.config_explorer import make_app
  File "/usr/local/lib/python3.7/dist-packages/allennlp/service/config_explorer.py", line 24, in <module>
    from allennlp.common.configuration import configure, choices
  File "/usr/local/lib/python3.7/dist-packages/allennlp/common/configuration.py", line 17, in <module>
    from allennlp.data.dataset_readers import DatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/__init__.py", line 1, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
    from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
    from allennlp.data.instance import Instance
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/instance.py", line 3, in <module>
    from allennlp.data.fields.field import DataArray, Field
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/fields/__init__.py", line 7, in <module>
    from allennlp.data.fields.array_field import ArrayField
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/fields/array_field.py", line 10, in <module>
    class ArrayField(Field[numpy.ndarray]):
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/fields/array_field.py", line 50, in ArrayField
    @overrides
  File "/usr/local/lib/python3.7/dist-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/usr/local/lib/python3.7/dist-packages/overrides/overrides.py", line 114, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/usr/local/lib/python3.7/dist-packages/overrides/overrides.py", line 135, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/usr/local/lib/python3.7/dist-packages/overrides/signature.py", line 93, in ensure_signature_is_compatible
    ensure_return_type_compatibility(super_type_hints, sub_type_hints, method_name)
  File "/usr/local/lib/python3.7/dist-packages/overrides/signature.py", line 288, in ensure_return_type_compatibility
    f"{method_name}: return type `{sub_return}` is not a `{super_return}`."
TypeError: ArrayField.empty_field: return type `None` is not a `<class 'allennlp.data.fields.field.Field'>`.

Any thoughts on how I can get it working in Google Colab?

Here is a link to the notebook with the various things I have tried already to solve this. https://colab.research.google.com/drive/1KB6KYB20LXwvV2wnj8rdcV-x_fxXYYO9?usp=sharing

konstin commented 3 years ago

Colab ships an old version of pip, so you need to run !pip install -U pip first, which will make pip correctly resolve the right versions.

I haven't seen the error you have before, but this could very well be due to a wrong library version that pip installed, so !pip install -U pip might fix this.
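To see which versions pip actually installed, something like the following could be run in a notebook cell. This is only a sketch: installed_versions is a hypothetical helper name, the package list is taken from the pip output earlier in this thread, and on Python 3.7 it assumes the importlib_metadata backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    import importlib_metadata as metadata


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages that show up in the pip log and the traceback in this issue.
    for name, version in installed_versions(
        ["seqvec", "allennlp", "overrides", "spacy"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the printed versions against what seqvec expects may help narrow down which dependency is at fault.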

I'd also like to advertise https://github.com/sacdallago/bio_embeddings, where we've integrated SeqVec and also have examples on how to run on google colab (e.g. this one from here).

tijeco commented 3 years ago

@konstin Thanks for the quick response! I figured there was some sort of Colab-specific pip thing that may be causing issues, so thanks for pointing that out. bio_embeddings was next on my list to try to get working, so I'll be sure to check it out as well!

tijeco commented 3 years ago

@konstin I factory reset the colab notebook and followed your suggestion by running the following

!pip install -U pip
!pip install seqvec

It gave the following errors

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-sm 2.2.5 requires spacy>=2.2.2, but you have spacy 2.1.9 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

But still installed the following

Successfully installed allennlp-0.9.0 blis-0.2.4 boto3-1.17.70 botocore-1.20.70 conllu-1.3.1 flaky-3.7.0 flask-cors-3.0.10 ftfy-6.0.1 gevent-1.4.0 jmespath-0.10.0 jsonnet-0.17.0 jsonpickle-2.0.0 numpydoc-1.1.0 overrides-6.1.0 parsimonious-0.8.1 plac-0.9.6 preshed-2.0.1 pytorch-pretrained-bert-0.6.2 pytorch-transformers-1.1.0 responses-0.13.3 s3transfer-0.4.2 sentencepiece-0.1.95 seqvec-0.4.1 spacy-2.1.9 tensorboardX-2.2 thinc-7.0.8 typing-utils-0.0.3 unidecode-1.2.0 urllib3-1.25.11 word2number-1.1

With the following warning

WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv

I then tried to run the example:

!wget https://raw.githubusercontent.com/Rostlab/SeqVec/master/test-data/sequences.fasta
!seqvec -i sequences.fasta -o embeddings.npz

And got all these glorious errors, which look basically the same as before.

/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
  return f(*args, **kwds)
Traceback (most recent call last):
  File "/usr/local/bin/seqvec", line 5, in <module>
    from seqvec.seqvec import main
  File "/usr/local/lib/python3.7/dist-packages/seqvec/seqvec.py", line 15, in <module>
    from allennlp.commands.elmo import ElmoEmbedder
  File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/__init__.py", line 8, in <module>
    from allennlp.commands.configure import Configure
  File "/usr/local/lib/python3.7/dist-packages/allennlp/commands/configure.py", line 26, in <module>
    from allennlp.service.config_explorer import make_app
  File "/usr/local/lib/python3.7/dist-packages/allennlp/service/config_explorer.py", line 24, in <module>
    from allennlp.common.configuration import configure, choices
  File "/usr/local/lib/python3.7/dist-packages/allennlp/common/configuration.py", line 17, in <module>
    from allennlp.data.dataset_readers import DatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/__init__.py", line 1, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
    from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
    from allennlp.data.instance import Instance
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/instance.py", line 3, in <module>
    from allennlp.data.fields.field import DataArray, Field
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/fields/__init__.py", line 7, in <module>
    from allennlp.data.fields.array_field import ArrayField
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/fields/array_field.py", line 10, in <module>
    class ArrayField(Field[numpy.ndarray]):
  File "/usr/local/lib/python3.7/dist-packages/allennlp/data/fields/array_field.py", line 50, in ArrayField
    @overrides
  File "/usr/local/lib/python3.7/dist-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/usr/local/lib/python3.7/dist-packages/overrides/overrides.py", line 114, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/usr/local/lib/python3.7/dist-packages/overrides/overrides.py", line 135, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/usr/local/lib/python3.7/dist-packages/overrides/signature.py", line 93, in ensure_signature_is_compatible
    ensure_return_type_compatibility(super_type_hints, sub_type_hints, method_name)
  File "/usr/local/lib/python3.7/dist-packages/overrides/signature.py", line 288, in ensure_return_type_compatibility
    f"{method_name}: return type `{sub_return}` is not a `{super_return}`."
TypeError: ArrayField.empty_field: return type `None` is not a `<class 'allennlp.data.fields.field.Field'>`.

DeepColin commented 2 years ago

I have encountered the same error

girum89 commented 2 years ago

I had the same issue. A similar issue is mentioned in issues/5203. The solution is to downgrade the overrides library, as mentioned in issues/5197.

You should run pip install overrides==3.1.0 in your Python 3.7 environment.
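For anyone wondering why the overrides version matters: the last frames of the traceback above show the @overrides decorator comparing the return annotation of a method with the one on its parent class at import time. The toy sketch below imitates that kind of check. It is not the overrides library's actual code; Field and ArrayField here are hypothetical stand-ins rather than the real allennlp classes, and check_return_type is an illustrative helper that only handles plain classes.

```python
from typing import get_type_hints


class Field:
    """Toy base class (hypothetical, not allennlp's Field)."""

    def empty_field(self) -> "Field":
        raise NotImplementedError


class ArrayField(Field):
    """Toy subclass whose return annotation disagrees with the base class."""

    def empty_field(self) -> None:
        return None


def check_return_type(base, sub, method_name):
    """Raise TypeError if sub's return annotation is not a subclass of base's."""
    super_return = get_type_hints(getattr(base, method_name)).get("return")
    sub_return = get_type_hints(getattr(sub, method_name)).get("return")
    if super_return is None or sub_return is None:
        return  # one side is unannotated, so this sketch has nothing to compare
    if not issubclass(sub_return, super_return):
        raise TypeError(
            f"{sub.__name__}.{method_name}: return type `{sub_return}` "
            f"is not a `{super_return}`."
        )


if __name__ == "__main__":
    try:
        check_return_type(Field, ArrayField, "empty_field")
    except TypeError as err:
        print(err)
```

A release of the decorator that skips this comparison would import the same classes without complaint, which is consistent with the downgrade making the error go away.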

ursueugen commented 2 years ago

I had the same issue. I created a new conda environment, installed python=3.7 and seqvec, and downgraded overrides to 3.1.0 as clarified by @girum89.

conda create -n seqvec python=3.7
conda activate seqvec
pip install seqvec
pip install overrides==3.1.0
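To double-check that the pin took effect, the version string reported by pip can be compared with 3.1.0. A minimal sketch, assuming simple dotted version strings; version_tuple and matches_pin are hypothetical helper names, not part of seqvec or overrides.

```python
import re


def version_tuple(version):
    """Turn '6.1.0' into (6, 1, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def matches_pin(installed, pinned="3.1.0"):
    """Return True when the installed version equals the pinned one."""
    return version_tuple(installed) == version_tuple(pinned)


if __name__ == "__main__":
    # 6.1.0 is the overrides version from the pip log earlier in this thread.
    for candidate in ("6.1.0", "3.1.0"):
        print(candidate, "matches pin:", matches_pin(candidate))
```

The installed version string itself can be read from the output of pip show overrides.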

I guess the issue can be closed.

konstin commented 2 years ago

I'm grateful for y'all's interest in SeqVec, but I'd like to again point you to bio_embeddings, where I'm actually going to fix those kinds of problems.