OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large Language Models (LLMs) / Large Action Models (LAMs) / Large Multimodal Models (LMMs) / Visual Language Models (VLMs).
https://www.OpenAdapt.AI
MIT License
977 stars · 134 forks

[Bug]: pytest fails on new install #295

Open abrichr opened 1 year ago

abrichr commented 1 year ago

Describe the bug

(openadapt-py3.10) abrichr@MacBook-Pro-3 OpenAdapt % pytest
=================================================================== test session starts ====================================================================
platform darwin -- Python 3.10.11, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/abrichr/oa/OpenAdapt
plugins: anyio-3.7.0
collected 23 items / 1 error                                                                                                                               

========================================================================== ERRORS ==========================================================================
__________________________________________________ ERROR collecting tests/openadapt/test_summary_mixin.py __________________________________________________
tests/openadapt/test_summary_mixin.py:10: in <module>
    REPLAY = DemoReplayStrategy(RECORDING)
openadapt/strategies/demo.py:39: in __init__
    super().__init__(recording)
openadapt/strategies/mixins/huggingface.py:30: in __init__
    super().__init__(recording)
openadapt/strategies/mixins/ocr.py:35: in __init__
    super().__init__(recording)
openadapt/strategies/mixins/ascii.py:28: in __init__
    super().__init__(recording)
openadapt/strategies/mixins/sam.py:48: in __init__
    self.sam_model = self._initialize_model(model_name, checkpoint_dir_path)
openadapt/strategies/mixins/sam.py:60: in _initialize_model
    return sam_model_registry[model_name](checkpoint=checkpoint_file_path)
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/segment_anything/build_sam.py:15: in build_sam_vit_h
    return _build_sam(
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/segment_anything/build_sam.py:105: in _build_sam
    state_dict = torch.load(f)
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/torch/serialization.py:797: in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/torch/serialization.py:283: in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
E   RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
===================================================================== warnings summary =====================================================================
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11
  /Users/abrichr/Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
    warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py:28
  /Users/abrichr/Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py:28: DeprecationWarning: invalid escape sequence '\S'
    "(other than %SystemRoot%\System32), "

../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:121
  /Users/abrichr/Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
    warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)

../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870
  /Users/abrichr/Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870
../../Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870
  /Users/abrichr/Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================= short test summary info ==================================================================
ERROR tests/openadapt/test_summary_mixin.py - RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================= 8 warnings, 1 error in 64.82s (0:01:04) ==========================================================

To Reproduce

Follow the recommended installation instructions in the README:

git clone https://github.com/MLDSAI/OpenAdapt.git
cd OpenAdapt
pip install poetry
poetry install
poetry shell
alembic upgrade head
pytest
abrichr commented 1 year ago

Two issues:

  1. In https://github.com/MLDSAI/OpenAdapt/blob/main/tests/openadapt/test_summary_mixin.py#LL10C10-L10C28:
REPLAY = DemoReplayStrategy(RECORDING)

@dianzrong can you please modify this to test only the summary mixin and not other functionality?

  2. In https://github.com/MLDSAI/OpenAdapt/blob/main/openadapt/strategies/mixins/sam.py#LL60C16-L60C34

Not sure what's going on here, @jesicasusanto can you please take a look?
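
The "failed finding central directory" error means torch.load could not read the checkpoint file as a zip archive, which usually indicates a truncated or corrupted download of the SAM checkpoint. A minimal sketch of a validity check that could run before loading (the helper name and path are illustrative, not part of the current sam.py):

```python
import zipfile
from pathlib import Path


def checkpoint_is_valid(path: Path) -> bool:
    """Return True if the checkpoint looks like a readable zip archive.

    PyTorch >= 1.6 saves checkpoints in zip format, so a file that fails
    zipfile's central-directory check is truncated or corrupted and should
    be deleted and re-downloaded before calling torch.load().
    """
    return path.is_file() and zipfile.is_zipfile(path)
```

If the check fails, deleting `checkpoints/sam_vit_h_4b8939.pth` and re-running so the download restarts may be enough to resolve the RuntimeError.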

KrishPatel13 commented 1 year ago

@abrichr I got something similar on Windows; see below:

(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\OpenAdapt> pytest
================================ test session starts =================================
platform win32 -- Python 3.10.11, pytest-7.1.3, pluggy-1.0.0
rootdir: P:\OpenAdapt AI - MLDS AI\cloned_repo\OpenAdapt
plugins: anyio-3.7.0
collected 23 items / 1 error

=========================================================== ERRORS ============================================================
___________________________________ ERROR collecting tests/openadapt/test_summary_mixin.py ____________________________________
tests\openadapt\test_summary_mixin.py:10: in <module>
    REPLAY = DemoReplayStrategy(RECORDING)
openadapt\strategies\demo.py:41: in __init__
    self.screenshots = get_screenshots(recording)
openadapt\crud.py:149: in get_screenshots
    screenshots[0].prev = screenshots[0]
E   IndexError: list index out of range
------------------------------------------------------- Captured stderr ------------------------------------------------------- 
2023-06-20 13:52:09.040 | INFO     | openadapt.strategies.mixins.sam:_initialize_model:58 - downloading checkpoint_url='https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth' to checkpoint_file_path=WindowsPath('checkpoints/sam_vit_h_4b8939.pth')
2023-06-20 13:57:01.191 | INFO     | openadapt.strategies.mixins.huggingface:__init__:32 - model_name='gpt2'
Downloading (…)lve/main/config.json: 100%|██████████| 665/665 [00:00<?, ?B/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 5.46MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 5.08MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 7.68MB/s]
Downloading pytorch_model.bin: 100%|██████████| 548M/548M [00:47<00:00, 11.6MB/s]
Downloading (…)neration_config.json: 100%|██████████| 124/124 [00:00<?, ?B/s]
====================================================== warnings summary ======================================================= 
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\fuzzywuzzy\fuzz.py:11 
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\fuzzywuzzy\fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
    warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\onnxruntime\capi\_pybind_state.py:28
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\onnxruntime\capi\_pybind_state.py:28: DeprecationWarning: invalid escape sequence '\S'
    "(other than %SystemRoot%\System32), "

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:121
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
    warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\huggingface_hub\file_download.py:133
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Krish Patel\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
  To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
    warnings.warn(message)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================== short test summary info =================================================== 
ERROR tests/openadapt/test_summary_mixin.py - IndexError: list index out of range
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
========================================== 9 warnings, 1 error in 388.13s (0:06:28) =========================================== 
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\OpenAdapt> 
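
The IndexError above comes from `get_screenshots` assuming at least one screenshot exists; on a fresh install with no recording, the list is empty and `screenshots[0]` fails. A minimal sketch of a tolerant version of the linking step, using a simplified stand-in (the real `crud.py` function's signature and surrounding logic may differ):

```python
def link_screenshots(screenshots: list) -> list:
    """Link each screenshot to its predecessor, tolerating an empty list.

    Mirrors the pattern in openadapt/crud.py's get_screenshots(), where
    `screenshots[0].prev = screenshots[0]` raises IndexError when no
    recording has been made yet.
    """
    if not screenshots:
        return []
    # The first screenshot's predecessor is itself, matching the original code.
    screenshots[0].prev = screenshots[0]
    for prev, cur in zip(screenshots, screenshots[1:]):
        cur.prev = prev
    return screenshots
```

Guarding like this would turn the collection-time crash into an empty result, though the underlying test issue (module-level `DemoReplayStrategy(RECORDING)` requiring a prior recording) would still need to be addressed separately.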
KrishPatel13 commented 1 year ago

This is what I get when running pytest on my newly cloned repo:

Any thoughts on why 2 tests are failing? Note: I have not run `record` yet.

(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\OpenAdapt> pytest
============================================================================================== test session starts ==============================================================================================
platform win32 -- Python 3.10.11, pytest-7.1.3, pluggy-1.0.0
rootdir: P:\OpenAdapt AI - MLDS AI\cloned_repo\OpenAdapt
plugins: anyio-3.7.0
collected 25 items

tests\openadapt\test_crop.py .                                                                                                                                                                             [  4%]
tests\openadapt\test_events.py .......                                                                                                                                                                     [ 32%]
tests\openadapt\test_scrub.py ...............                                                                                                                                                              [ 92%]
tests\openadapt\test_summary.py FF                                                                                                                                                                         [100%]

=================================================================================================== FAILURES ==================================================================================================== 
______________________________________________________________________________________________ test_summary_empty _______________________________________________________________________________________________ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x0000019241BDC7F0>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
>           return nltk.data.load(path)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\sumy\nlp\tokenizers.py:172:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle', format = 'pickle', cache = True, verbose = False, logic_parser = None, fstruct_reader = None, encoding = None

    def load(
        resource_url,
        format="auto",
        cache=True,
        verbose=False,
        logic_parser=None,
        fstruct_reader=None,
        encoding=None,
    ):
        """
        Load a given resource from the NLTK data package.  The following
        resource formats are currently supported:

          - ``pickle``
          - ``json``
          - ``yaml``
          - ``cfg`` (context free grammars)
          - ``pcfg`` (probabilistic CFGs)
          - ``fcfg`` (feature-based CFGs)
          - ``fol`` (formulas of First Order Logic)
          - ``logic`` (Logical formulas to be parsed by the given logic_parser)
          - ``val`` (valuation of First Order Logic model)
          - ``text`` (the file contents as a unicode string)
          - ``raw`` (the raw file contents as a byte string)

        If no format is specified, ``load()`` will attempt to determine a
        format based on the resource name's file extension.  If that
        fails, ``load()`` will raise a ``ValueError`` exception.

        For all text formats (everything except ``pickle``, ``json``, ``yaml`` and ``raw``),
        it tries to decode the raw contents using UTF-8, and if that doesn't
        work, it tries with ISO-8859-1 (Latin-1), unless the ``encoding``
        is specified.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        :type cache: bool
        :param cache: If true, add this resource to a cache.  If load()
            finds a resource in its cache, then it will return it from the
            cache rather than loading it.
        :type verbose: bool
        :param verbose: If true, print a message when loading a resource.
            Messages are not displayed when a resource is retrieved from
            the cache.
        :type logic_parser: LogicParser
        :param logic_parser: The parser that will be used to parse logical
            expressions.
        :type fstruct_reader: FeatStructReader
        :param fstruct_reader: The parser that will be used to parse the
            feature structure of an fcfg.
        :type encoding: str
        :param encoding: the encoding of the input; only used for text formats.
        """
        resource_url = normalize_resource_url(resource_url)
        resource_url = add_py3_data(resource_url)

        # Determine the format of the resource.
        if format == "auto":
            resource_url_parts = resource_url.split(".")
            ext = resource_url_parts[-1]
            if ext == "gz":
                ext = resource_url_parts[-2]
            format = AUTO_FORMATS.get(ext)
            if format is None:
                raise ValueError(
                    "Could not determine format for %s based "
                    'on its file\nextension; use the "format" '
                    "argument to specify the format explicitly." % resource_url
                )

        if format not in FORMATS:
            raise ValueError(f"Unknown format type: {format}!")

        # If we've cached the resource, then just return it.
        if cache:
            resource_val = _resource_cache.get((resource_url, format))
            if resource_val is not None:
                if verbose:
                    print(f"<<Using cached copy of {resource_url}>>")
                return resource_val

        # Let the user know what's going on.
        if verbose:
            print(f"<<Loading {resource_url}>>")

        # Load the resource.
>       opened_resource = _open(resource_url)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\nltk\data.py:750:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle'

    def _open(resource_url):
        """
        Helper function that returns an open file object for a resource,
        given its resource URL.  If the given resource URL uses the "nltk:"
        protocol, or uses no protocol, then use ``nltk.data.find`` to find
        its path, and open it with the given mode; if the resource URL
        uses the 'file' protocol, then open the file with the given mode;
        otherwise, delegate to ``urllib2.urlopen``.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        """
        resource_url = normalize_resource_url(resource_url)
        protocol, path_ = split_resource_url(resource_url)

        if protocol is None or protocol.lower() == "nltk":
>           return find(path_, path + [""]).open()

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\nltk\data.py:876:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_name = 'tokenizers/punkt/english.pickle'
paths = ['C:\\Users\\Krish Patel/nltk_data', 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-...penadapt-NIwuSzHt-py3.10\\lib\\nltk_data', 'C:\\Users\\Krish Patel\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        """
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ``nltk.data.path`` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with *p.zip/p*.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/cities.pl`` to a zip file path pointer to
            ``corpora/chat80.zip/chat80/cities.pl``.

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the
            directory.

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        """
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        # nltk.data.path
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                try:
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile
                    continue

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                        else:
                            return FileSystemPathPointer(p)
                else:
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                        try:
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile
                            continue

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                try:
                    return find(modified_name, paths)
                except LookupError:
                    pass

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
            ">>> nltk.download('{resource}')\n"
            "\033[0m"
        ).format(resource=resource_zipname)
        msg = textwrap_indent(msg)

        msg += "\n  For more information see: https://www.nltk.org/data.html\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(
            resource_name=resource_name
        )

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError: 
E       **********************************************************************
E         Resource punkt not found.
E         Please use the NLTK Downloader to obtain the resource:
E       
E         >>> import nltk
E         >>> nltk.download('punkt')
E
E         For more information see: https://www.nltk.org/data.html
E       
E         Attempted to load tokenizers/punkt/english.pickle
E       
E         Searched in:
E           - 'C:\\Users\\Krish Patel/nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\share\\nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\lib\\nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E           - ''
E       **********************************************************************

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\nltk\data.py:583: LookupError

During handling of the above exception, another exception occurred:

    def test_summary_empty():
        empty_text = ""
>       actual = REPLAY.get_summary(empty_text, 1)

tests\openadapt\test_summary.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
openadapt\strategies\mixins\summary.py:48: in get_summary
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\sumy\nlp\tokenizers.py:160: in __init__
    self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x0000019241BDC7F0>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
            return nltk.data.load(path)
        except (LookupError, zipfile.BadZipfile) as e:
>           raise LookupError(
                "NLTK tokenizers are missing or the language is not supported.\n"
                """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
                "Original error was:\n" + str(e)
            )
E           LookupError: NLTK tokenizers are missing or the language is not supported.
E           Download them by following command: python -c "import nltk; nltk.download('punkt')"
E           Original error was:
E
E           **********************************************************************
E             Resource punkt not found.
E             Please use the NLTK Downloader to obtain the resource:
E
E             >>> import nltk
E             >>> nltk.download('punkt')
E
E             For more information see: https://www.nltk.org/data.html
E
E             Attempted to load tokenizers/punkt/english.pickle
E
E             Searched in:
E               - 'C:\\Users\\Krish Patel/nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\share\\nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\lib\\nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Roaming\\nltk_data'
E               - 'C:\\nltk_data'
E               - 'D:\\nltk_data'
E               - 'E:\\nltk_data'
E               - ''
E           **********************************************************************

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\sumy\nlp\tokenizers.py:174: LookupError
_____________________________________________________________________________________________ test_summary_sentence _____________________________________________________________________________________________ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x0000019241BCBE80>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
>           return nltk.data.load(path)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\sumy\nlp\tokenizers.py:172:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle', format = 'pickle', cache = True, verbose = False, logic_parser = None, fstruct_reader = None, encoding = None

    def load(
        resource_url,
        format="auto",
        cache=True,
        verbose=False,
        logic_parser=None,
        fstruct_reader=None,
        encoding=None,
    ):
        """
        Load a given resource from the NLTK data package.  The following
        resource formats are currently supported:

          - ``pickle``
          - ``json``
          - ``yaml``
          - ``cfg`` (context free grammars)
          - ``pcfg`` (probabilistic CFGs)
          - ``fcfg`` (feature-based CFGs)
          - ``fol`` (formulas of First Order Logic)
          - ``logic`` (Logical formulas to be parsed by the given logic_parser)
          - ``val`` (valuation of First Order Logic model)
          - ``text`` (the file contents as a unicode string)
          - ``raw`` (the raw file contents as a byte string)

        If no format is specified, ``load()`` will attempt to determine a
        format based on the resource name's file extension.  If that
        fails, ``load()`` will raise a ``ValueError`` exception.

        For all text formats (everything except ``pickle``, ``json``, ``yaml`` and ``raw``),
        it tries to decode the raw contents using UTF-8, and if that doesn't
        work, it tries with ISO-8859-1 (Latin-1), unless the ``encoding``
        is specified.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        :type cache: bool
        :param cache: If true, add this resource to a cache.  If load()
            finds a resource in its cache, then it will return it from the
            cache rather than loading it.
        :type verbose: bool
        :param verbose: If true, print a message when loading a resource.
            Messages are not displayed when a resource is retrieved from
            the cache.
        :type logic_parser: LogicParser
        :param logic_parser: The parser that will be used to parse logical
            expressions.
        :type fstruct_reader: FeatStructReader
        :param fstruct_reader: The parser that will be used to parse the
            feature structure of an fcfg.
        :type encoding: str
        :param encoding: the encoding of the input; only used for text formats.
        """
        resource_url = normalize_resource_url(resource_url)
        resource_url = add_py3_data(resource_url)

        # Determine the format of the resource.
        if format == "auto":
            resource_url_parts = resource_url.split(".")
            ext = resource_url_parts[-1]
            if ext == "gz":
                ext = resource_url_parts[-2]
            format = AUTO_FORMATS.get(ext)
            if format is None:
                raise ValueError(
                    "Could not determine format for %s based "
                    'on its file\nextension; use the "format" '
                    "argument to specify the format explicitly." % resource_url
                )

        if format not in FORMATS:
            raise ValueError(f"Unknown format type: {format}!")

        # If we've cached the resource, then just return it.
        if cache:
            resource_val = _resource_cache.get((resource_url, format))
            if resource_val is not None:
                if verbose:
                    print(f"<<Using cached copy of {resource_url}>>")
                return resource_val

        # Let the user know what's going on.
        if verbose:
            print(f"<<Loading {resource_url}>>")

        # Load the resource.
>       opened_resource = _open(resource_url)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\nltk\data.py:750:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle'

    def _open(resource_url):
        """
        Helper function that returns an open file object for a resource,
        given its resource URL.  If the given resource URL uses the "nltk:"
        protocol, or uses no protocol, then use ``nltk.data.find`` to find
        its path, and open it with the given mode; if the resource URL
        uses the 'file' protocol, then open the file with the given mode;
        otherwise, delegate to ``urllib2.urlopen``.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        """
        resource_url = normalize_resource_url(resource_url)
        protocol, path_ = split_resource_url(resource_url)

        if protocol is None or protocol.lower() == "nltk":
>           return find(path_, path + [""]).open()

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\nltk\data.py:876:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_name = 'tokenizers/punkt/english.pickle'
paths = ['C:\\Users\\Krish Patel/nltk_data', 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-...penadapt-NIwuSzHt-py3.10\\lib\\nltk_data', 'C:\\Users\\Krish Patel\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        """
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ``nltk.data.path`` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with *p.zip/p*.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/cities.pl`` to a zip file path pointer to
            ``corpora/chat80.zip/chat80/cities.pl``.

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the
            directory.

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        """
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        # nltk.data.path
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                try:
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile
                    continue

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                        else:
                            return FileSystemPathPointer(p)
                else:
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                        try:
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile
                            continue

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                try:
                    return find(modified_name, paths)
                except LookupError:
                    pass

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
            ">>> nltk.download('{resource}')\n"
            "\033[0m"
        ).format(resource=resource_zipname)
        msg = textwrap_indent(msg)

        msg += "\n  For more information see: https://www.nltk.org/data.html\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(
            resource_name=resource_name
        )

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError: 
E       **********************************************************************
E         Resource punkt not found.
E         Please use the NLTK Downloader to obtain the resource:
E       
E         >>> import nltk
E         >>> nltk.download('punkt')
E
E         For more information see: https://www.nltk.org/data.html
E       
E         Attempted to load tokenizers/punkt/english.pickle
E       
E         Searched in:
E           - 'C:\\Users\\Krish Patel/nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\share\\nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\lib\\nltk_data'
E           - 'C:\\Users\\Krish Patel\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E           - ''
E       **********************************************************************

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\nltk\data.py:583: LookupError

During handling of the above exception, another exception occurred:

    def test_summary_sentence():
        story = "However, this bottle was not marked “poison,” so Alice ventured to taste it, \
            and finding it very nice, (it had, in fact, a sort of mixed flavour of cherry-tart, \
            custard, pine-apple, roast turkey, toffee, and hot buttered toast,) \
            she very soon finished it off."
>       actual = REPLAY.get_summary(story, 1)

tests\openadapt\test_summary.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
openadapt\strategies\mixins\summary.py:48: in get_summary
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\sumy\nlp\tokenizers.py:160: in __init__
    self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x0000019241BCBE80>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
            return nltk.data.load(path)
        except (LookupError, zipfile.BadZipfile) as e:
>           raise LookupError(
                "NLTK tokenizers are missing or the language is not supported.\n"
                """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
                "Original error was:\n" + str(e)
            )
E           LookupError: NLTK tokenizers are missing or the language is not supported.
E           Download them by following command: python -c "import nltk; nltk.download('punkt')"
E           Original error was:
E
E           **********************************************************************
E             Resource punkt not found.
E             Please use the NLTK Downloader to obtain the resource:
E
E             >>> import nltk
E             >>> nltk.download('punkt')
E
E             For more information see: https://www.nltk.org/data.html
E
E             Attempted to load tokenizers/punkt/english.pickle
E
E             Searched in:
E               - 'C:\\Users\\Krish Patel/nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\share\\nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\openadapt-NIwuSzHt-py3.10\\lib\\nltk_data'
E               - 'C:\\Users\\Krish Patel\\AppData\\Roaming\\nltk_data'
E               - 'C:\\nltk_data'
E               - 'D:\\nltk_data'
E               - 'E:\\nltk_data'
E               - ''
E           **********************************************************************

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\sumy\nlp\tokenizers.py:174: LookupError
=============================================================================================== warnings summary ================================================================================================ 
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\fuzzywuzzy\fuzz.py:11
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\fuzzywuzzy\fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
    warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:121
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
    warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-NIwuSzHt-py3.10\lib\site-packages\pkg_resources\__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info ============================================================================================ 
FAILED tests/openadapt/test_summary.py::test_summary_empty - LookupError: NLTK tokenizers are missing or the language is not supported.
FAILED tests/openadapt/test_summary.py::test_summary_sentence - LookupError: NLTK tokenizers are missing or the language is not supported.
=================================================================================== 2 failed, 23 passed, 7 warnings in 12.64s =================================================================================== 
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\OpenAdapt> 
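The traceback itself names the remedy: download the missing `punkt` tokenizer data (`python -c "import nltk; nltk.download('punkt')"`) so that sumy's `Tokenizer("english")` can find it. A minimal diagnostic sketch for confirming whether the data is present before re-running the tests — the `nltk_installed` / `punkt_ready` names are illustrative, not part of OpenAdapt:

```python
# Check whether the punkt data sumy needs is resolvable from nltk's
# search path; if not, the one-time fix from the traceback applies:
#   python -c "import nltk; nltk.download('punkt')"
import importlib.util

nltk_installed = importlib.util.find_spec("nltk") is not None
punkt_ready = False
if nltk_installed:
    import nltk
    try:
        nltk.data.find("tokenizers/punkt")  # raises LookupError when absent
        punkt_ready = True
    except LookupError:
        punkt_ready = False  # run nltk.download('punkt') once to fix
print("punkt ready:", punkt_ready)
```

Since the data lands under a per-user directory (e.g. `~/nltk_data` or `%APPDATA%\nltk_data`), it persists across Poetry virtualenvs, so the download only needs to happen once per machine.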
jesicasusanto commented 1 year ago

When running pytest, I got this error:

`(.venv) C:\Users\jesic\PycharmProjects\PAT>pytest
=========================================== test session starts ===========================================
platform win32 -- Python 3.10.10, pytest-7.1.3, pluggy-1.0.0
rootdir: C:\Users\jesic\PycharmProjects\PAT
plugins: anyio-3.7.0
collected 25 items

tests\openadapt\test_crop.py .                                                                       [  4%]
tests\openadapt\test_events.py .......                                                               [ 32%]
tests\openadapt\test_scrub.py F..............                                                        [ 92%]
tests\openadapt\test_summary.py FF                                                                   [100%]

================================================ FAILURES ================================================= 
____________________________________________ test_scrub_image _____________________________________________ 

    @run_once
    def get_tesseract_version():
        """
        Returns LooseVersion object of the Tesseract version
        """
        try:
            return LooseVersion(
>               subprocess.check_output(
                    [tesseract_cmd, '--version'],
                    stderr=subprocess.STDOUT,
                    env=environ,
                )
                .decode(DEFAULT_ENCODING)
                .split()[1]
                .lstrip(string.printable[10:]),
            )

openadapt\.venv\lib\site-packages\pytesseract\pytesseract.py:383:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

timeout = None, popenargs = (['tesseract', '--version'],)
kwargs = {'env': environ({'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\jesic\\AppData\\Roaming', 'CHOCOLATEYINS...NIT_AT_FORK': 'FALSE', 'PYTEST_CURRENT_TEST': 'tests/openadapt/test_scrub.py::test_scrub_image (call)'}), 'stderr': -2}

    def check_output(*popenargs, timeout=None, **kwargs):
        r"""Run command with arguments and return its output.

        If the exit code was non-zero it raises a CalledProcessError.  The
        CalledProcessError object will have the return code in the returncode
        attribute and output in the output attribute.

        The arguments are the same as for the Popen constructor.  Example:

        >>> check_output(["ls", "-l", "/dev/null"])
        b'crw-rw-rw- 1 root root 1, 3 Oct 18  2007 /dev/null\n'

        The stdout argument is not allowed as it is used internally.
        To capture standard error in the result, use stderr=STDOUT.

        >>> check_output(["/bin/sh", "-c",
        ...               "ls -l non_existent_file ; exit 0"],
        ...              stderr=STDOUT)
        b'ls: non_existent_file: No such file or directory\n'

        There is an additional optional argument, "input", allowing you to
        pass a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it too will be used internally.  Example:

        >>> check_output(["sed", "-e", "s/foo/bar/"],
        ...              input=b"when in the course of fooman events\n")
        b'when in the course of barman events\n'

        By default, all communication is in bytes, and therefore any "input"
        should be bytes, and the return value will be bytes.  If in text mode,
        any "input" should be a string, and the return value will be a string
        decoded according to locale encoding, or by "encoding" if set. Text mode
        is triggered by setting any of text, encoding, errors or universal_newlines.
        """
        if 'stdout' in kwargs:
            raise ValueError('stdout argument not allowed, it will be overridden.')

        if 'input' in kwargs and kwargs['input'] is None:
            # Explicitly passing input=None was previously equivalent to passing an
            # empty string. That is maintained here for backwards compatibility.
            if kwargs.get('universal_newlines') or kwargs.get('text') or kwargs.get('encoding') \
                    or kwargs.get('errors'):
                empty = ''
            else:
                empty = b''
            kwargs['input'] = empty

>       return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
                   **kwargs).stdout

C:\Python310\lib\subprocess.py:421:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['tesseract', '--version'],)
kwargs = {'env': environ({'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\jesic\\AppData\\Roaming', 'CHOCOLATEYINS...'FALSE', 'PYTEST_CURRENT_TEST': 'tests/openadapt/test_scrub.py::test_scrub_image (call)'}), 'stderr': -2, 'stdout': -1}

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

>       with Popen(*popenargs, **kwargs) as process:

C:\Python310\lib\subprocess.py:503:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Popen: returncode: None args: ['tesseract', '--version']>, args = ['tesseract', '--version']        
bufsize = -1, executable = None, stdin = None, stdout = -1, stderr = -2, preexec_fn = None, close_fds = True
shell = False, cwd = None
env = environ({'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\jesic\\AppData\\Roaming', 'CHOCOLATEYINSTALL': '... 'True', 'KMP_INIT_AT_FORK': 'FALSE', 'PYTEST_CURRENT_TEST': 'tests/openadapt/test_scrub.py::test_scrub_image (call)'})
universal_newlines = None, startupinfo = None, creationflags = 0, restore_signals = True
start_new_session = False, pass_fds = ()

    def __init__(self, args, bufsize=-1, executable=None,
                 stdin=None, stdout=None, stderr=None,
                 preexec_fn=None, close_fds=True,
                 shell=False, cwd=None, env=None, universal_newlines=None,
                 startupinfo=None, creationflags=0,
                 restore_signals=True, start_new_session=False,
                 pass_fds=(), *, user=None, group=None, extra_groups=None,
                 encoding=None, errors=None, text=None, umask=-1, pipesize=-1):
        """Create new Popen instance."""
        _cleanup()
        # Held while anything is calling waitpid before returncode has been
        # updated to prevent clobbering returncode if wait() or poll() are
        # called from multiple threads at once.  After acquiring the lock,
        # code must re-check self.returncode to see if another thread just
        # finished a waitpid() call.
        self._waitpid_lock = threading.Lock()

        self._input = None
        self._communication_started = False
        if bufsize is None:
            bufsize = -1  # Restore default
        if not isinstance(bufsize, int):
            raise TypeError("bufsize must be an integer")

        if pipesize is None:
            pipesize = -1  # Restore default
        if not isinstance(pipesize, int):
            raise TypeError("pipesize must be an integer")

        if _mswindows:
            if preexec_fn is not None:
                raise ValueError("preexec_fn is not supported on Windows "
                                 "platforms")
        else:
            # POSIX
            if pass_fds and not close_fds:
                warnings.warn("pass_fds overriding close_fds.", RuntimeWarning)
                close_fds = True
            if startupinfo is not None:
                raise ValueError("startupinfo is only supported on Windows "
                                 "platforms")
            if creationflags != 0:
                raise ValueError("creationflags is only supported on Windows "
                                 "platforms")

        self.args = args
        self.stdin = None
        self.stdout = None
        self.stderr = None
        self.pid = None
        self.returncode = None
        self.encoding = encoding
        self.errors = errors
        self.pipesize = pipesize

        # Validate the combinations of text and universal_newlines
        if (text is not None and universal_newlines is not None
            and bool(universal_newlines) != bool(text)):
            raise SubprocessError('Cannot disambiguate when both text '
                                  'and universal_newlines are supplied but '
                                  'different. Pass one or the other.')

        # Input and output objects. The general principle is like
        # this:
        #
        # Parent                   Child
        # ------                   -----
        # p2cwrite   ---stdin--->  p2cread
        # c2pread    <--stdout---  c2pwrite
        # errread    <--stderr---  errwrite
        #
        # On POSIX, the child objects are file descriptors.  On
        # Windows, these are Windows file handles.  The parent objects
        # are file descriptors on both platforms.  The parent objects
        # are -1 when not using PIPEs. The child objects are -1
        # when not redirecting.

        (p2cread, p2cwrite,
         c2pread, c2pwrite,
         errread, errwrite) = self._get_handles(stdin, stdout, stderr)

        # We wrap OS handles *before* launching the child, otherwise a
        # quickly terminating child could make our fds unwrappable
        # (see #8458).

        if _mswindows:
            if p2cwrite != -1:
                p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
            if c2pread != -1:
                c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
            if errread != -1:
                errread = msvcrt.open_osfhandle(errread.Detach(), 0)

        self.text_mode = encoding or errors or text or universal_newlines

        # PEP 597: We suppress the EncodingWarning in subprocess module
        # for now (at Python 3.10), because we focus on files for now.
        # This will be changed to encoding = io.text_encoding(encoding)
        # in the future.
        if self.text_mode and encoding is None:
            self.encoding = encoding = "locale"

        # How long to resume waiting on a child after the first ^C.
        # There is no right value for this.  The purpose is to be polite
        # yet remain good for interactive users trying to exit a tool.
        self._sigint_wait_secs = 0.25  # 1/xkcd221.getRandomNumber()

        self._closed_child_pipe_fds = False

        if self.text_mode:
            if bufsize == 1:
                line_buffering = True
                # Use the default buffer size for the underlying binary streams
                # since they don't support line buffering.
                bufsize = -1
            else:
                line_buffering = False

        gid = None
        if group is not None:
            if not hasattr(os, 'setregid'):
                raise ValueError("The 'group' parameter is not supported on the "
                                 "current platform")

            elif isinstance(group, str):
                try:
                    import grp
                except ImportError:
                    raise ValueError("The group parameter cannot be a string "
                                     "on systems without the grp module")

                gid = grp.getgrnam(group).gr_gid
            elif isinstance(group, int):
                gid = group
            else:
                raise TypeError("Group must be a string or an integer, not {}"
                                .format(type(group)))

            if gid < 0:
                raise ValueError(f"Group ID cannot be negative, got {gid}")

        gids = None
        if extra_groups is not None:
            if not hasattr(os, 'setgroups'):
                raise ValueError("The 'extra_groups' parameter is not "
                                 "supported on the current platform")

            elif isinstance(extra_groups, str):
                raise ValueError("Groups must be a list, not a string")

            gids = []
            for extra_group in extra_groups:
                if isinstance(extra_group, str):
                    try:
                        import grp
                    except ImportError:
                        raise ValueError("Items in extra_groups cannot be "
                                         "strings on systems without the "
                                         "grp module")

                    gids.append(grp.getgrnam(extra_group).gr_gid)
                elif isinstance(extra_group, int):
                    gids.append(extra_group)
                else:
                    raise TypeError("Items in extra_groups must be a string "
                                    "or integer, not {}"
                                    .format(type(extra_group)))

            # make sure that the gids are all positive here so we can do less
            # checking in the C code
            for gid_check in gids:
                if gid_check < 0:
                    raise ValueError(f"Group ID cannot be negative, got {gid_check}")

        uid = None
        if user is not None:
            if not hasattr(os, 'setreuid'):
                raise ValueError("The 'user' parameter is not supported on "
                                 "the current platform")

            elif isinstance(user, str):
                try:
                    import pwd
                except ImportError:
                    raise ValueError("The user parameter cannot be a string "
                                     "on systems without the pwd module")
                uid = pwd.getpwnam(user).pw_uid
            elif isinstance(user, int):
                uid = user
            else:
                raise TypeError("User must be a string or an integer")

            if uid < 0:
                raise ValueError(f"User ID cannot be negative, got {uid}")

        try:
            if p2cwrite != -1:
                self.stdin = io.open(p2cwrite, 'wb', bufsize)
                if self.text_mode:
                    self.stdin = io.TextIOWrapper(self.stdin, write_through=True,
                            line_buffering=line_buffering,
                            encoding=encoding, errors=errors)
            if c2pread != -1:
                self.stdout = io.open(c2pread, 'rb', bufsize)
                if self.text_mode:
                    self.stdout = io.TextIOWrapper(self.stdout,
                            encoding=encoding, errors=errors)
            if errread != -1:
                self.stderr = io.open(errread, 'rb', bufsize)
                if self.text_mode:
                    self.stderr = io.TextIOWrapper(self.stderr,
                            encoding=encoding, errors=errors)

>           self._execute_child(args, executable, preexec_fn, close_fds,
                                pass_fds, cwd, env,
                                startupinfo, creationflags, shell,
                                p2cread, p2cwrite,
                                c2pread, c2pwrite,
                                errread, errwrite,
                                restore_signals,
                                gid, gids, uid, umask,
                                start_new_session)

C:\Python310\lib\subprocess.py:971:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Popen: returncode: None args: ['tesseract', '--version']>, args = 'tesseract --version'
executable = None, preexec_fn = None, close_fds = False, pass_fds = (), cwd = None
env = environ({'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\jesic\\AppData\\Roaming', 'CHOCOLATEYINSTALL': '... 'True', 'KMP_INIT_AT_FORK': 'FALSE', 'PYTEST_CURRENT_TEST': 'tests/openadapt/test_scrub.py::test_scrub_image (call)'})
startupinfo = <subprocess.STARTUPINFO object at 0x00000172C2B17E50>, creationflags = 0, shell = False       
p2cread = Handle(7016), p2cwrite = -1, c2pread = 15, c2pwrite = Handle(6992), errread = -1
errwrite = Handle(2336), unused_restore_signals = True, unused_gid = None, unused_gids = None
unused_uid = None, unused_umask = -1, unused_start_new_session = False

    def _execute_child(self, args, executable, preexec_fn, close_fds,
                       pass_fds, cwd, env,
                       startupinfo, creationflags, shell,
                       p2cread, p2cwrite,
                       c2pread, c2pwrite,
                       errread, errwrite,
                       unused_restore_signals,
                       unused_gid, unused_gids, unused_uid,
                       unused_umask,
                       unused_start_new_session):
        """Execute program (MS Windows version)"""

        assert not pass_fds, "pass_fds not supported on Windows."

        if isinstance(args, str):
            pass
        elif isinstance(args, bytes):
            if shell:
                raise TypeError('bytes args is not allowed on Windows')
            args = list2cmdline([args])
        elif isinstance(args, os.PathLike):
            if shell:
                raise TypeError('path-like args is not allowed when '
                                'shell is true')
            args = list2cmdline([args])
        else:
            args = list2cmdline(args)

        if executable is not None:
            executable = os.fsdecode(executable)

        # Process startup details
        if startupinfo is None:
            startupinfo = STARTUPINFO()
        else:
            # bpo-34044: Copy STARTUPINFO since it is modified above,
            # so the caller can reuse it multiple times.
            startupinfo = startupinfo.copy()

        use_std_handles = -1 not in (p2cread, c2pwrite, errwrite)
        if use_std_handles:
            startupinfo.dwFlags |= _winapi.STARTF_USESTDHANDLES
            startupinfo.hStdInput = p2cread
            startupinfo.hStdOutput = c2pwrite
            startupinfo.hStdError = errwrite

        attribute_list = startupinfo.lpAttributeList
        have_handle_list = bool(attribute_list and
                                "handle_list" in attribute_list and
                                attribute_list["handle_list"])

        # If we were given an handle_list or need to create one
        if have_handle_list or (use_std_handles and close_fds):
            if attribute_list is None:
                attribute_list = startupinfo.lpAttributeList = {}
            handle_list = attribute_list["handle_list"] = \
                list(attribute_list.get("handle_list", []))

            if use_std_handles:
                handle_list += [int(p2cread), int(c2pwrite), int(errwrite)]

            handle_list[:] = self._filter_handle_list(handle_list)

            if handle_list:
                if not close_fds:
                    warnings.warn("startupinfo.lpAttributeList['handle_list'] "
                                  "overriding close_fds", RuntimeWarning)

                # When using the handle_list we always request to inherit
                # handles but the only handles that will be inherited are
                # the ones in the handle_list
                close_fds = False

        if shell:
            startupinfo.dwFlags |= _winapi.STARTF_USESHOWWINDOW
            startupinfo.wShowWindow = _winapi.SW_HIDE
            comspec = os.environ.get("COMSPEC", "cmd.exe")
            args = '{} /c "{}"'.format (comspec, args)

        if cwd is not None:
            cwd = os.fsdecode(cwd)

        sys.audit("subprocess.Popen", executable, args, cwd, env)

        # Start the process
        try:
>           hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                                     # no special security
                                     None, None,
                                     int(not close_fds),
                                     creationflags,
                                     env,
                                     cwd,
                                     startupinfo)
E                                    FileNotFoundError: [WinError 2] The system cannot find the file specified

C:\Python310\lib\subprocess.py:1440: FileNotFoundError

During handling of the above exception, another exception occurred:

    def test_scrub_image() -> None:
        """
        Test that the scrubbed image data is different
        """

        warnings.filterwarnings("ignore", category=DeprecationWarning)

        # Read test image data from file
        test_image_path = "assets/test_scrub_image.png"
        with open(test_image_path, "rb") as file:
            test_image_data = file.read()

        # Convert image data to PIL Image object
        test_image = Image.open(BytesIO(test_image_data))

        # Scrub the image
>       scrubbed_image = scrub.scrub_image(test_image)

tests\openadapt\test_scrub.py:40:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
openadapt\scrub.py:103: in scrub_image
    redacted_image = IMAGE_REDACTOR.redact(
openadapt\.venv\lib\site-packages\presidio_image_redactor\image_redactor_engine.py:45: in redact
    bboxes = self.image_analyzer_engine.analyze(
openadapt\.venv\lib\site-packages\presidio_image_redactor\image_analyzer_engine.py:44: in analyze
    ocr_result = self.ocr.perform_ocr(image, **perform_ocr_kwargs)
openadapt\.venv\lib\site-packages\presidio_image_redactor\tesseract_ocr.py:18: in perform_ocr
    return pytesseract.image_to_data(image, output_type=output_type, **kwargs)
openadapt\.venv\lib\site-packages\pytesseract\pytesseract.py:507: in image_to_data
    if get_tesseract_version() < '3.05':
openadapt\.venv\lib\site-packages\pytesseract\pytesseract.py:148: in wrapper
    wrapper._result = func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    @run_once
    def get_tesseract_version():
        """
        Returns LooseVersion object of the Tesseract version
        """
        try:
            return LooseVersion(
                subprocess.check_output(
                    [tesseract_cmd, '--version'],
                    stderr=subprocess.STDOUT,
                    env=environ,
                )
                .decode(DEFAULT_ENCODING)
                .split()[1]
                .lstrip(string.printable[10:]),
            )
        except OSError:
>           raise TesseractNotFoundError()
E           pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

openadapt\.venv\lib\site-packages\pytesseract\pytesseract.py:393: TesseractNotFoundError
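The `FileNotFoundError` / `TesseractNotFoundError` above comes from pytesseract shelling out to `tesseract --version`, which fails because the Tesseract binary is not on `PATH`. A minimal pre-flight check (a sketch, not part of OpenAdapt — the function name is mine) would be:

```python
import shutil

def tesseract_available() -> bool:
    """Return True if the tesseract binary is discoverable on PATH."""
    # pytesseract's get_tesseract_version() runs `tesseract --version`,
    # which raises TesseractNotFoundError when the binary is missing.
    return shutil.which("tesseract") is not None

if not tesseract_available():
    print("tesseract not on PATH; install it, or point "
          "pytesseract.pytesseract.tesseract_cmd at the binary's full path")
```

Setting `pytesseract.pytesseract.tesseract_cmd` is the documented pytesseract escape hatch when the binary is installed somewhere non-standard.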
___________________________________________ test_summary_empty ____________________________________________ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x00000172C4C39420>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
>           return nltk.data.load(path)

openadapt\.venv\lib\site-packages\sumy\nlp\tokenizers.py:172:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

resource_url = 'nltk:tokenizers/punkt/english.pickle', format = 'pickle', cache = True, verbose = False     
logic_parser = None, fstruct_reader = None, encoding = None

    def load(
        resource_url,
        format="auto",
        cache=True,
        verbose=False,
        logic_parser=None,
        fstruct_reader=None,
        encoding=None,
    ):
        """
        Load a given resource from the NLTK data package.  The following
        resource formats are currently supported:

          - ``pickle``
          - ``json``
          - ``yaml``
          - ``cfg`` (context free grammars)
          - ``pcfg`` (probabilistic CFGs)
          - ``fcfg`` (feature-based CFGs)
          - ``fol`` (formulas of First Order Logic)
          - ``logic`` (Logical formulas to be parsed by the given logic_parser)
          - ``val`` (valuation of First Order Logic model)
          - ``text`` (the file contents as a unicode string)
          - ``raw`` (the raw file contents as a byte string)

        If no format is specified, ``load()`` will attempt to determine a
        format based on the resource name's file extension.  If that
        fails, ``load()`` will raise a ``ValueError`` exception.

        For all text formats (everything except ``pickle``, ``json``, ``yaml`` and ``raw``),
        it tries to decode the raw contents using UTF-8, and if that doesn't
        work, it tries with ISO-8859-1 (Latin-1), unless the ``encoding``
        is specified.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        :type cache: bool
        :param cache: If true, add this resource to a cache.  If load()
            finds a resource in its cache, then it will return it from the
            cache rather than loading it.
        :type verbose: bool
        :param verbose: If true, print a message when loading a resource.
            Messages are not displayed when a resource is retrieved from
            the cache.
        :type logic_parser: LogicParser
        :param logic_parser: The parser that will be used to parse logical
            expressions.
        :type fstruct_reader: FeatStructReader
        :param fstruct_reader: The parser that will be used to parse the
            feature structure of an fcfg.
        :type encoding: str
        :param encoding: the encoding of the input; only used for text formats.
        """
        resource_url = normalize_resource_url(resource_url)
        resource_url = add_py3_data(resource_url)

        # Determine the format of the resource.
        if format == "auto":
            resource_url_parts = resource_url.split(".")
            ext = resource_url_parts[-1]
            if ext == "gz":
                ext = resource_url_parts[-2]
            format = AUTO_FORMATS.get(ext)
            if format is None:
                raise ValueError(
                    "Could not determine format for %s based "
                    'on its file\nextension; use the "format" '
                    "argument to specify the format explicitly." % resource_url
                )

        if format not in FORMATS:
            raise ValueError(f"Unknown format type: {format}!")

        # If we've cached the resource, then just return it.
        if cache:
            resource_val = _resource_cache.get((resource_url, format))
            if resource_val is not None:
                if verbose:
                    print(f"<<Using cached copy of {resource_url}>>")
                return resource_val

        # Let the user know what's going on.
        if verbose:
            print(f"<<Loading {resource_url}>>")

        # Load the resource.
>       opened_resource = _open(resource_url)

openadapt\.venv\lib\site-packages\nltk\data.py:750:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle'

    def _open(resource_url):
        """
        Helper function that returns an open file object for a resource,
        given its resource URL.  If the given resource URL uses the "nltk:"
        protocol, or uses no protocol, then use ``nltk.data.find`` to find
        its path, and open it with the given mode; if the resource URL
        uses the 'file' protocol, then open the file with the given mode;
        otherwise, delegate to ``urllib2.urlopen``.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        """
        resource_url = normalize_resource_url(resource_url)
        protocol, path_ = split_resource_url(resource_url)

        if protocol is None or protocol.lower() == "nltk":
>           return find(path_, path + [""]).open()

openadapt\.venv\lib\site-packages\nltk\data.py:876:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_name = 'tokenizers/punkt/english.pickle'
paths = ['C:\\Users\\jesic/nltk_data', 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\nltk_data', 'C:\\Users\\jesi...rojects\\PAT\\openadapt\\.venv\\lib\\nltk_data', 'C:\\Users\\jesic\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        """
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ``nltk.data.path`` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with *p.zip/p*.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/cities.pl`` to a zip file path pointer to
            ``corpora/chat80.zip/chat80/cities.pl``.

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the
            directory.

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        """
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        # nltk.data.path
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                try:
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile
                    continue

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                        else:
                            return FileSystemPathPointer(p)
                else:
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                        try:
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile
                            continue

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                try:
                    return find(modified_name, paths)
                except LookupError:
                    pass

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
            ">>> nltk.download('{resource}')\n"
            "\033[0m"
        ).format(resource=resource_zipname)
        msg = textwrap_indent(msg)

        msg += "\n  For more information see: https://www.nltk.org/data.html\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(
            resource_name=resource_name
        )

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError: 
E       **********************************************************************
E         Resource punkt not found.
E         Please use the NLTK Downloader to obtain the resource:
E       
E         >>> import nltk
E         >>> nltk.download('punkt')
E
E         For more information see: https://www.nltk.org/data.html
E       
E         Attempted to load tokenizers/punkt/english.pickle
E       
E         Searched in:
E           - 'C:\\Users\\jesic/nltk_data'
E           - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\nltk_data'
E           - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\share\\nltk_data'
E           - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\lib\\nltk_data'
E           - 'C:\\Users\\jesic\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E           - ''
E       **********************************************************************

openadapt\.venv\lib\site-packages\nltk\data.py:583: LookupError

During handling of the above exception, another exception occurred:

    def test_summary_empty():
        empty_text = ""
>       actual = REPLAY.get_summary(empty_text, 1)

tests\openadapt\test_summary.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
openadapt\strategies\mixins\summary.py:48: in get_summary
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
openadapt\.venv\lib\site-packages\sumy\nlp\tokenizers.py:160: in __init__
    self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x00000172C4C39420>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
            return nltk.data.load(path)
        except (LookupError, zipfile.BadZipfile) as e:
>           raise LookupError(
                "NLTK tokenizers are missing or the language is not supported.\n"
                """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n""" 
                "Original error was:\n" + str(e)
            )
E           LookupError: NLTK tokenizers are missing or the language is not supported.
E           Download them by following command: python -c "import nltk; nltk.download('punkt')"
E           Original error was:
E
E           **********************************************************************
E             Resource punkt not found.
E             Please use the NLTK Downloader to obtain the resource:
E
E             >>> import nltk
E             >>> nltk.download('punkt')
E
E             For more information see: https://www.nltk.org/data.html
E
E             Attempted to load tokenizers/punkt/english.pickle
E
E             Searched in:
E               - 'C:\\Users\\jesic/nltk_data'
E               - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\nltk_data'
E               - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\share\\nltk_data'
E               - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\lib\\nltk_data'
E               - 'C:\\Users\\jesic\\AppData\\Roaming\\nltk_data'
E               - 'C:\\nltk_data'
E               - 'D:\\nltk_data'
E               - 'E:\\nltk_data'
E               - ''
E           **********************************************************************

openadapt\.venv\lib\site-packages\sumy\nlp\tokenizers.py:174: LookupError
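The `LookupError` above is sumy's `Tokenizer("english")` failing because the NLTK `punkt` data package was never downloaded. A small guard that reports the missing resource without attempting any network access (a sketch — the helper name is mine, and it assumes nltk may not even be importable) could look like:

```python
def punkt_missing() -> bool:
    """Return True if the NLTK 'punkt' tokenizer data cannot be found."""
    try:
        import nltk  # may not be installed outside the project venv
    except ImportError:
        return True
    try:
        # nltk.data.find() raises LookupError when the resource is absent
        # from every directory on nltk.data.path.
        nltk.data.find("tokenizers/punkt")
        return False
    except LookupError:
        return True

if punkt_missing():
    # The fix the traceback itself suggests:
    print('run: python -c "import nltk; nltk.download(\'punkt\')"')
```

Running the one-liner from the error message (`python -c "import nltk; nltk.download('punkt')"`) inside the same virtualenv resolves both `test_summary_empty` and `test_summary_sentence`; a `conftest.py` fixture could plausibly do this automatically on first run.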
__________________________________________ test_summary_sentence __________________________________________ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x00000172C712FCA0>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
>           return nltk.data.load(path)

openadapt\.venv\lib\site-packages\sumy\nlp\tokenizers.py:172:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle', format = 'pickle', cache = True, verbose = False     
logic_parser = None, fstruct_reader = None, encoding = None

    def load(
        resource_url,
        format="auto",
        cache=True,
        verbose=False,
        logic_parser=None,
        fstruct_reader=None,
        encoding=None,
    ):
        """
        Load a given resource from the NLTK data package.  The following
        resource formats are currently supported:

          - ``pickle``
          - ``json``
          - ``yaml``
          - ``cfg`` (context free grammars)
          - ``pcfg`` (probabilistic CFGs)
          - ``fcfg`` (feature-based CFGs)
          - ``fol`` (formulas of First Order Logic)
          - ``logic`` (Logical formulas to be parsed by the given logic_parser)
          - ``val`` (valuation of First Order Logic model)
          - ``text`` (the file contents as a unicode string)
          - ``raw`` (the raw file contents as a byte string)

        If no format is specified, ``load()`` will attempt to determine a
        format based on the resource name's file extension.  If that
        fails, ``load()`` will raise a ``ValueError`` exception.

        For all text formats (everything except ``pickle``, ``json``, ``yaml`` and ``raw``),
        it tries to decode the raw contents using UTF-8, and if that doesn't
        work, it tries with ISO-8859-1 (Latin-1), unless the ``encoding``
        is specified.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        :type cache: bool
        :param cache: If true, add this resource to a cache.  If load()
            finds a resource in its cache, then it will return it from the
            cache rather than loading it.
        :type verbose: bool
        :param verbose: If true, print a message when loading a resource.
            Messages are not displayed when a resource is retrieved from
            the cache.
        :type logic_parser: LogicParser
        :param logic_parser: The parser that will be used to parse logical
            expressions.
        :type fstruct_reader: FeatStructReader
        :param fstruct_reader: The parser that will be used to parse the
            feature structure of an fcfg.
        :type encoding: str
        :param encoding: the encoding of the input; only used for text formats.
        """
        resource_url = normalize_resource_url(resource_url)
        resource_url = add_py3_data(resource_url)

        # Determine the format of the resource.
        if format == "auto":
            resource_url_parts = resource_url.split(".")
            ext = resource_url_parts[-1]
            if ext == "gz":
                ext = resource_url_parts[-2]
            format = AUTO_FORMATS.get(ext)
            if format is None:
                raise ValueError(
                    "Could not determine format for %s based "
                    'on its file\nextension; use the "format" '
                    "argument to specify the format explicitly." % resource_url
                )

        if format not in FORMATS:
            raise ValueError(f"Unknown format type: {format}!")

        # If we've cached the resource, then just return it.
        if cache:
            resource_val = _resource_cache.get((resource_url, format))
            if resource_val is not None:
                if verbose:
                    print(f"<<Using cached copy of {resource_url}>>")
                return resource_val

        # Let the user know what's going on.
        if verbose:
            print(f"<<Loading {resource_url}>>")

        # Load the resource.
>       opened_resource = _open(resource_url)

openadapt\.venv\lib\site-packages\nltk\data.py:750:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_url = 'nltk:tokenizers/punkt/english.pickle'

    def _open(resource_url):
        """
        Helper function that returns an open file object for a resource,
        given its resource URL.  If the given resource URL uses the "nltk:"
        protocol, or uses no protocol, then use ``nltk.data.find`` to find
        its path, and open it with the given mode; if the resource URL
        uses the 'file' protocol, then open the file with the given mode;
        otherwise, delegate to ``urllib2.urlopen``.

        :type resource_url: str
        :param resource_url: A URL specifying where the resource should be
            loaded from.  The default protocol is "nltk:", which searches
            for the file in the the NLTK data package.
        """
        resource_url = normalize_resource_url(resource_url)
        protocol, path_ = split_resource_url(resource_url)

        if protocol is None or protocol.lower() == "nltk":
>           return find(path_, path + [""]).open()

openadapt\.venv\lib\site-packages\nltk\data.py:876:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resource_name = 'tokenizers/punkt/english.pickle'
paths = ['C:\\Users\\jesic/nltk_data', 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\nltk_data', 'C:\\Users\\jesi...rojects\\PAT\\openadapt\\.venv\\lib\\nltk_data', 'C:\\Users\\jesic\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        """
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ``nltk.data.path`` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with *p.zip/p*.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/cities.pl`` to a zip file path pointer to
            ``corpora/chat80.zip/chat80/cities.pl``.

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the
            directory.

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        """
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        # nltk.data.path
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                try:
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile
                    continue

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                        else:
                            return FileSystemPathPointer(p)
                else:
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                        try:
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile
                            continue

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                try:
                    return find(modified_name, paths)
                except LookupError:
                    pass

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
            ">>> nltk.download('{resource}')\n"
            "\033[0m"
        ).format(resource=resource_zipname)
        msg = textwrap_indent(msg)

        msg += "\n  For more information see: https://www.nltk.org/data.html\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(
            resource_name=resource_name
        )

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError: 
E       **********************************************************************
E         Resource punkt not found.
E         Please use the NLTK Downloader to obtain the resource:
E       
E         >>> import nltk
E         >>> nltk.download('punkt')
E
E         For more information see: https://www.nltk.org/data.html
E       
E         Attempted to load tokenizers/punkt/english.pickle
E       
E         Searched in:
E           - 'C:\\Users\\jesic/nltk_data'
E           - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\nltk_data'
E           - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\share\\nltk_data'
E           - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\lib\\nltk_data'
E           - 'C:\\Users\\jesic\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E           - ''
E       **********************************************************************

openadapt\.venv\lib\site-packages\nltk\data.py:583: LookupError

During handling of the above exception, another exception occurred:

    def test_summary_sentence():
        story = "However, this bottle was not marked “poison,” so Alice ventured to taste it, \
            and finding it very nice, (it had, in fact, a sort of mixed flavour of cherry-tart, \
            custard, pine-apple, roast turkey, toffee, and hot buttered toast,) \
            she very soon finished it off."
>       actual = REPLAY.get_summary(story, 1)

tests\openadapt\test_summary.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
openadapt\strategies\mixins\summary.py:48: in get_summary
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
openadapt\.venv\lib\site-packages\sumy\nlp\tokenizers.py:160: in __init__
    self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sumy.nlp.tokenizers.Tokenizer object at 0x00000172C712FCA0>, language = 'english'

    def _get_sentence_tokenizer(self, language):
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
            return nltk.data.load(path)
        except (LookupError, zipfile.BadZipfile) as e:
>           raise LookupError(
                "NLTK tokenizers are missing or the language is not supported.\n"
                """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n""" 
                "Original error was:\n" + str(e)
            )
E           LookupError: NLTK tokenizers are missing or the language is not supported.
E           Download them by following command: python -c "import nltk; nltk.download('punkt')"
E           Original error was:
E
E           **********************************************************************
E             Resource punkt not found.
E             Please use the NLTK Downloader to obtain the resource:
E
E             >>> import nltk
E             >>> nltk.download('punkt')
E
E             For more information see: https://www.nltk.org/data.html
E
E             Attempted to load tokenizers/punkt/english.pickle
E
E             Searched in:
E               - 'C:\\Users\\jesic/nltk_data'
E               - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\nltk_data'
E               - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\share\\nltk_data'
E               - 'C:\\Users\\jesic\\PycharmProjects\\PAT\\openadapt\\.venv\\lib\\nltk_data'
E               - 'C:\\Users\\jesic\\AppData\\Roaming\\nltk_data'
E               - 'C:\\nltk_data'
E               - 'D:\\nltk_data'
E               - 'E:\\nltk_data'
E               - ''
E           **********************************************************************

openadapt\.venv\lib\site-packages\sumy\nlp\tokenizers.py:174: LookupError
============================================ warnings summary ============================================= 
openadapt\.venv\lib\site-packages\fuzzywuzzy\fuzz.py:11
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\fuzzywuzzy\fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
    warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

openadapt\.venv\lib\site-packages\onnxruntime\capi\_pybind_state.py:28
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\onnxruntime\capi\_pybind_state.py:28: DeprecationWarning: invalid escape sequence '\S'
    "(other than %SystemRoot%\System32), "

openadapt\.venv\lib\site-packages\pycountry\__init__.py:10
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pycountry\__init__.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html 
    import pkg_resources

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: 10 warnings
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.cloud')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2350
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2350
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2350
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2350: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(parent)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.logging')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.iam')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel.yaml')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2350
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2350: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(parent)

openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\jesic\PycharmProjects\PAT\openadapt\.venv\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages    
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================= short test summary info ========================================= 
FAILED tests/openadapt/test_scrub.py::test_scrub_image - pytesseract.pytesseract.TesseractNotFoundError: ...
FAILED tests/openadapt/test_summary.py::test_summary_empty - LookupError: NLTK tokenizers are missing or ...
FAILED tests/openadapt/test_summary.py::test_summary_sentence - LookupError: NLTK tokenizers are missing ...
========================== 3 failed, 22 passed, 31 warnings in 110.08s (0:01:50) ==========================
KrishPatel13 commented 1 year ago

@jesicasusanto

I think you do not have Tesseract OCR installed, which is why the first test (`test_scrub`) fails.

For Windows:

  1. Download https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-5.3.1.20230401.exe and run the installer.
  2. Add Tesseract OCR to your PATH: https://linuxhint.com/install-tesseract-windows/

For macOS/Ubuntu/other OSes:

  1. Open https://tesseract-ocr.github.io/tessdoc/Installation.html.
  2. Search for macOS (or your platform) on that page and follow the steps.