allenai / scholarphi

An interactive PDF reader.
Apache License 2.0

pin wandb to 0.12.18 #359

Closed ca16 closed 2 years ago

ca16 commented 2 years ago

When I tried running the tests off the chi-2021-demo branch in a newly built image, I saw errors like this:

...
  tests/test_extract_definitions.py:7: in <module>
      from entities.definitions.commands.detect_definitions import (
  entities/definitions/__init__.py:13: in <module>
      from .commands.detect_definitions import DetectDefinitions
  entities/definitions/commands/detect_definitions.py:16: in <module>
      from ..nlp import DefinitionDetectionModel
  entities/definitions/nlp.py:12: in <module>
      from transformers import (CONFIG_MAPPING, AutoConfig, AutoTokenizer,
  /usr/local/lib/python3.7/dist-packages/transformers/__init__.py:345: in <module>
      from .trainer import Trainer, set_seed, torch_distributed_zero_first, EvalPrediction
  /usr/local/lib/python3.7/dist-packages/transformers/trainer.py:64: in <module>
      import wandb
  /usr/local/lib/python3.7/dist-packages/wandb/__init__.py:37: in <module>
      from wandb import sdk as wandb_sdk
  /usr/local/lib/python3.7/dist-packages/wandb/sdk/__init__.py:12: in <module>
      from .wandb_init import init  # noqa: F401
  /usr/local/lib/python3.7/dist-packages/wandb/sdk/wandb_init.py:28: in <module>
      from .backend.backend import Backend
  /usr/local/lib/python3.7/dist-packages/wandb/sdk/backend/backend.py:14: in <module>
      from ..interface import interface
  /usr/local/lib/python3.7/dist-packages/wandb/sdk/interface/interface.py:17: in <module>
      from wandb.proto import wandb_internal_pb2  # type: ignore
  /usr/local/lib/python3.7/dist-packages/wandb/proto/wandb_internal_pb2.py:37: in <module>
      type=None),
  /usr/local/lib/python3.7/dist-packages/google/protobuf/descriptor.py:755: in __new__
      _message.Message._CheckCalledFromGeneratedFile()
  E   TypeError: Descriptors cannot not be created directly.
  E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
  E   If you cannot immediately regenerate your protos, some other possible workarounds are:
  E    1. Downgrade the protobuf package to 3.20.x or lower.
  E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
  E
  E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
...

I saw similar errors trying to run a paper through a newly built image:

...
Traceback (most recent call last):
  File "scripts/run_pipeline.py", line 32, in <module>
    from entities.definitions.commands.detect_definitions import DetectDefinitions
  File "/data-processing/entities/definitions/__init__.py", line 13, in <module>
    from .commands.detect_definitions import DetectDefinitions
  File "/data-processing/entities/definitions/commands/detect_definitions.py", line 16, in <module>
    from ..nlp import DefinitionDetectionModel
  File "/data-processing/entities/definitions/nlp.py", line 12, in <module>
    from transformers import (CONFIG_MAPPING, AutoConfig, AutoTokenizer,
  File "/usr/local/lib/python3.7/dist-packages/transformers/__init__.py", line 345, in <module>
    from .trainer import Trainer, set_seed, torch_distributed_zero_first, EvalPrediction
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 64, in <module>
    import wandb
  File "/usr/local/lib/python3.7/dist-packages/wandb/__init__.py", line 37, in <module>
    from wandb import sdk as wandb_sdk
  File "/usr/local/lib/python3.7/dist-packages/wandb/sdk/__init__.py", line 12, in <module>
    from .wandb_init import init  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/wandb/sdk/wandb_init.py", line 28, in <module>
    from .backend.backend import Backend
  File "/usr/local/lib/python3.7/dist-packages/wandb/sdk/backend/backend.py", line 14, in <module>
    from ..interface import interface
  File "/usr/local/lib/python3.7/dist-packages/wandb/sdk/interface/interface.py", line 17, in <module>
    from wandb.proto import wandb_internal_pb2  # type: ignore
  File "/usr/local/lib/python3.7/dist-packages/wandb/proto/wandb_internal_pb2.py", line 37, in <module>
    type=None),
  File "/usr/local/lib/python3.7/dist-packages/google/protobuf/descriptor.py", line 755, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
...

I think this is related to https://github.com/protocolbuffers/protobuf/issues/10051.

Pinning wandb to 0.12.18 appears to fix things. I think that makes sense given the following line in their changelog:

Require protobuf<4 by @dmitryduev in https://github.com/wandb/client/pull/3709
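To confirm which side of that constraint a given image is on, a check along these lines could be run inside it. This is only a sketch: the helper names (`protobuf_is_v4_or_newer`, `installed_version`) are made up for illustration and are not part of this repository, and it assumes `importlib.metadata` is available (Python 3.8+), returning `None` on older interpreters such as the 3.7 shown in the tracebacks.

```python
from typing import Optional

try:
    from importlib import metadata  # in the standard library from Python 3.8
except ImportError:  # e.g. Python 3.7, as in the tracebacks above
    metadata = None


def protobuf_is_v4_or_newer(version_string: str) -> bool:
    """Return True if the version's major component is 4 or higher.

    The error message suggests downgrading protobuf to 3.20.x or lower,
    and the wandb changelog entry requires protobuf<4, so a major
    version of 4 or more is treated as the problematic case here.
    """
    major = int(version_string.split(".")[0])
    return major >= 4


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if unknown."""
    if metadata is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("protobuf", "wandb"):
        print(name, installed_version(name))
    found = installed_version("protobuf")
    if found is not None and protobuf_is_v4_or_newer(found):
        print("protobuf>=4 is installed; older generated _pb2 files may fail to import")
```

The version parsing is deliberately naive (it only reads the leading integer), which is enough to tell 3.x from 4.x.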

The tests pass with this change. I also ran a paper through an image built with this change and through the image we currently have deployed, and diffing the output files showed no difference:

$ diff 1612.04858v1-from-chloea-protobuf-break-pin-wandb-to-0-12-18-2022-06-13-45-adhoc-pp.json 1612.04858v1-from-chi-2021-demo-04-27-2022-pp.json
chloea...$
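
The `diff` above compares the files byte for byte. If the two outputs ever differed only in whitespace or key order, a structural comparison could help tell that apart from a real change. A minimal sketch, assuming the output files are plain JSON documents; the function name `json_outputs_match` is hypothetical and not something in this repository:

```python
import json
import sys


def json_outputs_match(path_a: str, path_b: str) -> bool:
    """Return True when both files parse to equal JSON values.

    Unlike `diff`, this ignores whitespace and object key order,
    but list order still matters.
    """
    with open(path_a, encoding="utf-8") as file_a:
        data_a = json.load(file_a)
    with open(path_b, encoding="utf-8") as file_b:
        data_b = json.load(file_b)
    return data_a == data_b


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_outputs.py <new-output.json> <old-output.json>
    same = json_outputs_match(sys.argv[1], sys.argv[2])
    print("outputs match" if same else "outputs differ")
```

An empty `diff` already implies this check passes, so it is only useful as a follow-up when `diff` does report changes.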