JohnGiorgi / DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
https://aclanthology.org/2021.acl-long.72/
Apache License 2.0

Error while encoding #260

Closed: abis330 closed this issue 1 year ago

abis330 commented 2 years ago

I am running the sample code snippet from your repo, shown below:

from declutr import Encoder

# This can be a path on disk to a model you have trained yourself OR
# the name of one of our pretrained models.
pretrained_model_or_path = "declutr-small"

encoder = Encoder(pretrained_model_or_path)
embeddings = encoder([
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella."
])

I get the following error:

  File "/Users/some_user/Documents/sample/DeCLUTR/declutr/encoder.py", line 60, in __init__
    common_util.import_module_and_submodules("declutr")
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/allennlp/common/util.py", line 376, in import_module_and_submodules
    import_module_and_submodules(subpackage, exclude=exclude)
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/allennlp/common/util.py", line 359, in import_module_and_submodules
    module = importlib.import_module(package_name)
  File "/opt/miniconda3/envs/og_py/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/some_user/Documents/sample/DeCLUTR/declutr/dataset_reader.py", line 21, in <module>
    class DeCLUTRDatasetReader(DatasetReader):
  File "/Users/some_user/Documents/sample/DeCLUTR/declutr/dataset_reader.py", line 145, in DeCLUTRDatasetReader
    def text_to_instance(self, text: str) -> Instance:  # type: ignore
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/overrides/overrides.py", line 117, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/overrides/overrides.py", line 138, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/overrides/signature.py", line 106, in ensure_signature_is_compatible
    ensure_all_positional_args_defined_in_sub(
  File "/opt/miniconda3/envs/og_py/lib/python3.8/site-packages/overrides/signature.py", line 220, in ensure_all_positional_args_defined_in_sub
    raise TypeError(f"{method_name}: `{super_param.name}` must be present")
TypeError: DeCLUTRDatasetReader.text_to_instance: `inputs` must be present

Please help me resolve this.

JohnGiorgi commented 2 years ago

Thanks for raising an issue; I will try to look into this. It looks like the problem is on the AllenNLP side of things. Unless you really need AllenNLP, can you try using the model from HuggingFace Transformers instead? Instructions here.

abis330 commented 2 years ago

I am planning to pre-train on an unlabeled dataset of my own, and this error is preventing me from doing so.

JohnGiorgi commented 2 years ago

TL;DR: You have likely installed an unsupported version of AllenNLP. Try installing a supported version and running the code again.


Can you please provide more information on your environment? E.g., OS, Python version, AllenNLP version, the commands you used to install DeCLUTR, etc.

This works fine in the embedding.ipynb notebook, so I think it's likely there is something up with your environment. Looking at the error, my best guess is that you have installed a newer version of AllenNLP than this repo supports. In the version of AllenNLP required by this repo, the text_to_instance method of DatasetReader (which our DeCLUTRDatasetReader inherits) accepts an argument named text. In newer versions of AllenNLP, however, this argument is called inputs. Hence the error you are getting:

TypeError: DeCLUTRDatasetReader.text_to_instance: `inputs` must be present
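To make the mechanism concrete, this is a minimal, self-contained sketch of the kind of check that fails here. The traceback shows the `overrides` package comparing the subclass method against the parent's signature (in `ensure_all_positional_args_defined_in_sub`). The function `check_positional_args_present` below is a hypothetical, simplified imitation of that idea built on the standard `inspect` module, not the `overrides` library's actual code, and the two classes are stand-ins that only reproduce the renamed argument.

```python
import inspect


def check_positional_args_present(super_method, sub_method):
    """Hypothetical, simplified signature check (not the `overrides` library's code).

    Raises TypeError if a positional parameter of the parent method is missing,
    by name, from the overriding method.
    """
    sub_params = inspect.signature(sub_method).parameters
    for name, param in inspect.signature(super_method).parameters.items():
        positional = param.kind in (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
        if positional and name not in sub_params:
            raise TypeError(f"{sub_method.__qualname__}: `{name}` must be present")


class DatasetReader:
    # Stand-in parent: in the newer API described above, the argument is `inputs`.
    def text_to_instance(self, inputs):
        raise NotImplementedError


class DeCLUTRDatasetReader(DatasetReader):
    # Stand-in subclass that still uses the older argument name `text`.
    def text_to_instance(self, text):
        return {"text": text}


try:
    check_positional_args_present(
        DatasetReader.text_to_instance, DeCLUTRDatasetReader.text_to_instance
    )
except TypeError as err:
    print(err)  # DeCLUTRDatasetReader.text_to_instance: `inputs` must be present
```

Renaming the subclass argument would satisfy this toy check, but for DeCLUTR itself the suggested fix is to install the AllenNLP version the repo supports rather than edit the reader.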

You can check the version of AllenNLP you have installed by running

pip freeze | grep "allennlp*"

in the Python environment where you installed this repo. I would install a supported version and try this example again.
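If you would rather check from Python (for example, inside the notebook), the standard library's `importlib.metadata` (Python 3.8+) can report installed package versions. This is a minimal sketch; the helper name `installed_version` is hypothetical. The `overrides` package also appears in the traceback above, so its version is worth including when reporting your environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages that appear in the traceback; run this in the same environment as DeCLUTR.
for name in ("allennlp", "overrides"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```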

JohnGiorgi commented 1 year ago

Closing, please re-open if I didn't solve your issue!