clamsproject / aapb-evaluations

Collection of evaluation codebases

goldretriever.download_golds() fails #57

Open marcverhagen opened 4 days ago

marcverhagen commented 4 days ago

Bug Description

When running timeframe-eval/evaluate.py for chyrons the following snippets of code are executed:

GOLD_CHYRON_URL = "https://github.com/clamsproject/aapb-annotations/tree/cc0d58e16a06a8f10de5fc0e5333081c107d5937/newshour-chyron/golds"
goldretriever.download_golds(GOLD_CHYRON_URL)

The gold standard URL exists and leads to an HTML page. The problem is that the code in the goldretriever module seems to assume that the request returns a JSON object:

https://github.com/clamsproject/aapb-evaluations/blob/bd88a3755d784d1fb527957dcb63810071a76f40/goldretriever.py#L26-L30

I may be making an obvious mistake, since it seems that this should have worked for others.
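
A minimal sketch of how this failure could be made easier to read, assuming the body and headers come from a `requests`-style response object (the traceback's `response.text` suggests this, but it is an assumption). The helper name `parse_payload` and its parameters are hypothetical and not part of goldretriever; it only checks the content type before decoding, so that an HTML page produces a clear message rather than a JSONDecodeError.

```python
import json


def parse_payload(text: str, content_type: str) -> dict:
    """Return the 'payload' entry of a JSON body, or fail with a clear message.

    Hypothetical helper for illustration; not part of goldretriever.
    """
    # An HTML page is served as text/html, so refuse to decode it as JSON.
    if "json" not in content_type.lower():
        raise ValueError(
            f"expected a JSON response but got {content_type!r}; "
            "the URL probably returned an HTML page"
        )
    return json.loads(text)["payload"]
```

With a `requests` response this might be called as `parse_payload(response.text, response.headers.get("Content-Type", ""))`.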

Reproduction steps

Install requirements for timeframe evaluation:

cd timeframe-eval
pip install -r requirements.txt

Run the script:

python evaluate.py --mmif-dir preds\@swt\@3.1\@aapb-collaboration-7/ --chyron

You will be treated to this error:

Traceback (most recent call last):
  File "/Users/marc/Desktop/projects/clams/code/clamsproject/aapb-evaluations/timeframe-eval/evaluate.py", line 196, in <module>
    ref_dir = goldretriever.download_golds(GOLD_CHYRON_URL)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marc/Desktop/projects/clams/code/clamsproject/aapb-evaluations/timeframe-eval/goldretriever.py", line 35, in download_golds
    payload = json.loads(response.text)['payload']
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)

Expected behavior

download_golds() should retrieve the gold files without raising an error.

marcverhagen commented 4 days ago

It may be safer to use the API:

curl https://api.github.com/repos/clamsproject/aapb-annotations/git/trees/cc0d58e16a06a8f10de5fc0e5333081c107d5937
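
A rough sketch of that idea, assuming the gold URL always has the form `https://github.com/<owner>/<repo>/tree/<ref>/<path>` with a commit SHA as `<ref>` (a branch name containing slashes would not parse correctly here). The function name `contents_api_url` is hypothetical. It builds a URL for GitHub's repository contents endpoint, which returns a JSON listing for a directory, rather than the git/trees endpoint used in the curl command.

```python
from urllib.parse import urlparse


def contents_api_url(tree_url: str) -> str:
    """Map a github.com '/tree/' page URL to the REST contents endpoint.

    Hypothetical helper; assumes the ref is a single path segment (e.g. a SHA).
    """
    parts = urlparse(tree_url).path.strip("/").split("/")
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a GitHub tree URL: {tree_url}")
    owner, repo, _, ref = parts[:4]
    # Everything after the ref is the directory path inside the repository.
    path = "/".join(parts[4:])
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"
```

Each file entry in the JSON list returned by that endpoint carries a `download_url` field that can be fetched directly. Unauthenticated API requests are rate limited, so this may not suit heavy use.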
keighrim commented 4 days ago

This has been fixed in the PyPI version of the retriever (https://github.com/clamsproject/clams-utils/commit/af58aaa4ac12caab41e249badb82c85b730a0fe9), released in https://pypi.org/project/clams-utils/240626/

Could you try importing the PyPI version?

marcverhagen commented 4 days ago

The error remains:

$ pip install clams-utils==240626
$ python evaluate.py --mmif-dir preds\@swt\@3.1\@aapb-collaboration-7/ --chyron
Traceback (most recent call last):
  File "/Users/marc/Desktop/projects/clams/code/clamsproject/aapb-evaluations/timeframe-eval/evaluate.py", line 196, in <module>
    ref_dir = goldretriever.download_golds(GOLD_CHYRON_URL)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marc/Desktop/projects/clams/code/clamsproject/aapb-evaluations/timeframe-eval/goldretriever.py", line 46, in download_golds
    payload = json.loads(response.text)['payload']
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)

While doing this I realized that it was silly to use the --chyron option with the swt predictions, so I changed the command to:

python evaluate.py --mmif-dir preds\@chyron-detection\@batch2/ --chyron

But the error remains.

I also reproduced this on the main branch (I failed to mention that everything above was on the 43-swt-eval branch) and got the same error.

keighrim commented 4 days ago

Looks like it's still using the local retriever:

  File "/Users/marc/Desktop/projects/clams/code/clamsproject/aapb-evaluations/timeframe-eval/goldretriever.py", line 46, in download_golds
    payload = json.loads(response.text)['payload']
              ^^^^^^^^^^^^^^^^^^^^^^^^^

Changing the import to

from clams_utils.aapb import goldretriever

will probably make it use the PyPI library instead?
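
One way to confirm which copy of a module Python will pick up is to ask the import system where it resolves the name. This is a small sketch using only the standard library; the helper name `module_origin` is made up for illustration. Run from the timeframe-eval directory with the name `goldretriever`, a result ending in timeframe-eval/goldretriever.py would mean the local file is still the one being imported.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Example with a standard-library module; substitute "goldretriever" locally.
print(module_origin("json"))
```

For a dotted name such as `clams_utils.aapb.goldretriever`, `find_spec` imports the parent packages first and raises ModuleNotFoundError if they are not installed.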

marcverhagen commented 4 days ago

Duh! Yes, that got rid of the error.

What I really do not get is why the current code ever worked.