ArgLab / AWE_Workbench

This repository contains the web service API for calling AWE Workbench to support automated writing evaluation, plus examples of use.

Test Cleanup #1

Closed: DrLynch closed this issue 2 months ago

DrLynch commented 9 months ago

This is primarily a placeholder issue for discussion. At present the testing code is wholly unorganized. To the extent that it exists, most of it has been placed inside the AWE_Workbench repository. The code will need to be sorted, assigned to specific subpackages, and extended to cover areas not otherwise evaluated. This will also require developing some bespoke, wholly new tests for the AWE_LanguageTool.

This is a local placeholder for the ETS level issue: https://github.com/ETS-Next-Gen/AWE_Workbench/issues/2

In addressing this you will need to:

  1. Survey the tests to see where they should be located and make a catalog of the items to share.
  2. Move tests to the appropriate location.
  3. Develop novel tests for the uncovered items and set up a single suite.
duckduckdoof commented 9 months ago

The first goal is to get the tests in Workbench/tests working. As of right now, pytest does not properly load the extensions registered by components in the custom spaCy pipeline.

I have verified that pytest (python=3.10.13, pytest-8.0.0, pluggy-1.4.0) does not run these tests properly. I have also been asked to verify whether the tests run in a non-pytest setting (i.e., running the holmes wrapper spaCy pipeline interactively); this does not appear to work either:

import holmes_extractor.manager as holmes
import unittest
from awe_components.components.utility_functions import print_parse_tree
from awe_workbench.pipeline import pipeline_def

holmes_manager = holmes.Manager(
    'en_core_web_lg', perform_coreference_resolution=False, number_of_workers=2, extra_components=pipeline_def)

holmes_manager.parse_and_register_document(...)

extended_test = doc._.extended_core_sentences
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/duck/anaconda3/envs/ets_test/lib/python3.10/site-packages/spacy/tokens/underscore.py", line 47, in __getattr__
    raise AttributeError(Errors.E046.format(name=name))
AttributeError: [E046] Can't retrieve unregistered extension attribute 'extended_core_sentences'. Did you forget to call the `set_extension` method?
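A minimal way to narrow this down, using spaCy's standard Doc.has_extension API (the attribute name is taken from the traceback above; this is a sketch, not code from the repository):

from spacy.tokens import Doc

# False here means no component ever called
# Doc.set_extension("extended_core_sentences", ...) in this process,
# i.e. the custom pipeline components were never actually loaded.
print(Doc.has_extension("extended_core_sentences"))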

I also note that, for tests which do not hit this error, there is a mismatch between the expected dictionary and the dictionary returned by the holmes manager:

>>> len(doc._.vwp_perspective_spans)
705
>>> len(persp_actual)
4
duckduckdoof commented 9 months ago

I looked deeper into the components called in the holmes spaCy pipeline to see whether I could find the extensions referred to by the tests in Workbench/tests. As of now, there are three components that add custom extensions:

Going through the extensions name by name, I compared them to the extensions referenced in the workbench tests. Some of the custom extensions the tests expect are missing from these components (e.g., extended_core_sentences, as mentioned in https://github.com/ArgLab/AWE_Workbench/issues/1#issuecomment-1925908865).

However, some of the extensions called in the tests are in fact created here; I'm not sure at this point where the other extensions could have disappeared to.

PDDeane commented 9 months ago

It looks like I renamed extended_core_sentences ... see here: https://github.com/ETS-Next-Gen/AWE_Components/blob/main/awe_components/components/contentSegmentation.py, line 26 ... I also remember doing a restructuring of vwp_perspective_spans that I may not have updated the test for.

For the other attribute issues, there's a check you need to run.

This script is designed to exercise pretty much ALL of the extended attributes I create in order to produce a document summary:

https://github.com/ETS-Next-Gen/AWE_Workbench/blob/main/examples/batch_summary.py

It gives you an example of how to run all of the code OUTSIDE the testing environment. If you can run that without error, the attributes exist under normal circumstances, and something weird is happening in the testing environment.

duckduckdoof commented 9 months ago

@PDDeane I ran batch_summary.py but got an error; it appears the DataFrame in question is empty:

processing files in ./
Running LanguageTool
Running spellcorrect
Running parser
Empty DataFrame
Columns: []
Index: []
Traceback (most recent call last):
  File "/mnt/c/Users/duckd/Documents/dev/ets_dev/AWE_Workbench/examples/batch_summary.py", line 117, in <module>
    syntactic_profile.set_index('ID', inplace=True)
  File "/home/duck/anaconda3/envs/ets_test_2/lib/python3.10/site-packages/pandas/core/frame.py", line 6106, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['ID'] are in the columns"

Both servers were running before I ran this script. I also went ahead and printed the contents of "syntactic_profile"; that is the "Empty DataFrame" output shown above.

PDDeane commented 9 months ago

Clearly something is broken here. But note that I ran this batch script last July and did not encounter this problem.

I also ran this script without problems:

https://github.com/ETS-Next-Gen/AWE_Workbench/blob/main/examples/standalone_parse.py

If this doesn't work, something is broken that was not broken in the main repository in July 2023 ...

PDDeane commented 9 months ago

I will try this weekend to pull down the main branch from ETS Next Gen and see if the standalone script runs. Clearly something is broken. The question is where the break happened and what exactly broke ... :(

duckduckdoof commented 9 months ago

Is there any way I can see the environment you're running? That would help me tons; I could compare it with mine.

PDDeane commented 9 months ago

Remind me what the command is to display the environment? The machine I normally use is at home, so I'll need to run that command tonight to see what happens.


PDDeane commented 9 months ago

(and of course I need to verify that the bare parse script runs before that would do you much good)


duckduckdoof commented 9 months ago

With pip, you can run "pip freeze" and pipe the output into a file; I'd also need the Python version.

Alternatively, you can install and run "pipdeptree" to see a tree of dependencies.
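If piping shell output around is awkward, a minimal Python-only alternative (standard library only, Python 3.8+; a sketch rather than a required step) captures the interpreter version and the installed packages in one go:

import sys
from importlib import metadata

# Print the Python version, then pip-style name==version lines for every
# installed distribution, sorted by name.
print(sys.version)
for dist in sorted(metadata.distributions(),
                   key=lambda d: (d.metadata["Name"] or "").lower()):
    print(f"{dist.metadata['Name']}=={dist.version}")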

duckduckdoof commented 9 months ago

After looking at all the "missing" parts of the tests, here are the attributes claimed to be missing:

  • core_sentences
  • extended_core_sentences
  • propn_allocentric
  • propn_egocentric
  • vwp_direct_speech_spans

I found this code blurb from contentSegmentation.py (starting at line 28):

doc._.main_ideas_ = core_sentences
doc._.supporting_ideas_ = extended_core_sentences
doc._.supporting_details_ = elaboration_sentences
doc._.prompt_related_ = pclusters
doc._.prompt_language_ = plemmas

For the first two, maybe the tests should be calling "main_ideas" and "supporting_ideas", respectively? Not sure about the last three, though. These errors suggest that this may not be a spaCy version issue.
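If those renames are confirmed, one low-churn way to update the tests is a small helper that accepts either name. A minimal sketch; the old/new pairs below follow the guess above and are assumptions, not names verified against the code:

from spacy.tokens import Doc

# Hypothetical mapping from the attribute names used in the old tests to the
# names the current pipeline is believed to register.
RENAMED = {
    "core_sentences": "main_ideas",
    "extended_core_sentences": "supporting_ideas",
}

def get_doc_attribute(doc, name):
    # Prefer the original name; fall back to its suspected rename.
    for candidate in (name, RENAMED.get(name)):
        if candidate and Doc.has_extension(candidate):
            return getattr(doc._, candidate)
    raise AttributeError(f"No registered extension for {name!r}")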

PDDeane commented 9 months ago

The first four reflect changes to viewpointFeatures.py.

Looks like I renamed core_sentences and extended_core_sentences.

The propn_ attributes were probably removed because they became redundant once I added a new function that allows the same indicator to be summarized in multiple ways.

vwp_direct_speech_spans is set in viewpointFeatures.py. It looks like I set up the function that defines it so that it only gets called if you first ask for vwp_attribution, vwp_cite, vwp_source, nominal_references, tense_changes, or concrete_details. I recall wanting to call this function only when absolutely necessary, because it's very expensive, so I run it when one of the attributes that depends on it is called, and not otherwise.

So you should rename the attributes in the core_sentences and extended_core_sentences tests, delete the two propn tests, and (probably the best choice here) call nominal_references before you test vwp_direct_speech_spans.

Paul
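A minimal sketch of that ordering, assuming a parsed doc is already in hand (the attribute names follow the list above and should be checked against viewpointFeatures.py):

# Touch one of the attributes that triggers the expensive direct-speech
# analysis before reading vwp_direct_speech_spans.
_ = doc._.nominal_references             # forces the underlying computation
direct_speech = doc._.vwp_direct_speech_spans
assert direct_speech is not None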


duckduckdoof commented 9 months ago

Did you manage to get a description of your Python environment's dependencies?

duckduckdoof commented 9 months ago

I also managed to find a "vwp_direct_speech" attribute in the viewpointFeatures.py file; should I use that instead of vwp_direct_speech_spans?

And, if this is the case, do I still need to call nominal_references? I didn't find that specific attribute, but rather one called "nominalReferences".

PDDeane commented 9 months ago

Attached is the environment in which I was able to get it running successfully.

However, in doing that I discovered that I had accidentally not committed a set of bug fixes I did last summer, and that my public key had in the meantime timed out. Since getting my GitHub setup fixed may take a while, I'm attaching what I consider the "bug-fixed" version of each package in the attached zip file. What I suggest you do is pull the packages out into a separate directory, get rid of accidental inclusions like the .git files that got swept up when I zipped them, uninstall your versions of the packages, and install mine. If they work for you after replicating my environment, you can then start moving my changes into master after we do some testing, maybe including a comparison to Collin's fork ...

Some specific things to note:

direct_speech_spans should be listed in the list of valid extensions in utility_functions.py ... that's one of the problems I encountered.

Version problems: with sklearn later than 1.3, you get errors from the affinity keyword used in lexicalClusters.py when AgglomerativeClustering is called. That version constraint is not currently in requirements, and is technical debt that needs to be fixed.
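For reference, the keyword in question was renamed from affinity to metric in newer scikit-learn releases. A minimal sketch of the updated call; the concrete n_clusters and linkage values here are illustrative, not necessarily what lexicalClusters.py actually uses:

from sklearn.cluster import AgglomerativeClustering

# Old call (rejected by recent scikit-learn):
#   AgglomerativeClustering(n_clusters=5, affinity="cosine", linkage="average")
# Updated call using the renamed keyword:
clusterer = AgglomerativeClustering(n_clusters=5, metric="cosine", linkage="average")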

I also had to downgrade protobuf for the code to work.

I also ran across a couple of minor bugs that I fixed, but those are probably best lumped in with the other bugs I found and fixed this summer. At this point, post-install, you should be able to run AWE_Workbench/examples/standalone_parse.py without errors. We should run some of the other example code, starting with running the servers and batch processing, and the (updated) tests, and see where we stand.

commits.tgz: https://drive.google.com/file/d/1dAffmxhZXb3BmCMeqsZo1_Ad335YkKv0/view?usp=drive_web


# packages in environment at /home/paul/miniconda3/envs/awe_workbench:
#
# Name  Version  Build  Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.12.12 h06a4308_0
expat 2.5.0 h6a678d5_0
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
ncurses 6.4 h6a678d5_0
openssl 3.0.13 h7f8727e_0
pip 23.3.1 py312h06a4308_0
python 3.12.1 h996f2a0_0
readline 8.2 h5eee18b_0
setuptools 68.2.2 py312h06a4308_0
sqlite 3.41.2 h5eee18b_0
tk 8.6.12 h1ccaba5_0
tzdata 2023d h04d1e81_0
wheel 0.41.2 py312h06a4308_0
xz 5.4.5 h5eee18b_0
zlib 1.2.13 h5eee18b_0

PDDeane commented 9 months ago

Also, let me know when you've downloaded it so I can clear my google drive ...


duckduckdoof commented 5 months ago

Hey @PDDeane, I've been working on rewriting the tests again, and I've tried to map all document features in all tests to proper AWE_Info objects. Listed are some features with hard-to-understand naming conventions; I'd like some correction/clarification on these.

The features below are drawn from all the tests in the tests directory of AWE_Workbench.

duckduckdoof commented 5 months ago

Additionally, after searching through awe_nlp.py, parserServer.py, and the tests in AWE_Workbench, I could not find appropriate (or even close) AWE_Info objects for the following features:

duckduckdoof commented 5 months ago

Also, what is the importance of the (devword, True) tuples? I see them for almost every other AWE_Info object.

Finally, with regard to the tests, are there certain features you think we can test while ignoring the rest, or do all features need to be tested individually? Dr. Lynch mentioned that some features may rely on others, so testing some could cover the dependent features.

I did notice that awe_nlp.py (for writingobserver) references features that do not exist in the current tests. How should we approach writing new ground-truth data for those? I currently have a new test file in AWE_Workbench for the features used in awe_nlp.py.

PDDeane commented 5 months ago

Unfortunately I am on vacation in Portugal and Spain until June 8th; I won't be able to handle the questions until the week of the 10th.


duckduckdoof commented 2 months ago

As of now, our tests properly capture all needed document/token features as described in writingobserver's awe_nlp.py.