OSIRIS-Solutions / osiris

Open, Smart and Intuitive Research Information System
https://osiris-app.de

404 Client Error in import_data.py and f-string syntax issue #33

Closed paschulke closed 2 days ago

paschulke commented 1 week ago

Issue Summary

When attempting to import data from OpenAlex using the import_data.py script in OSIRIS v1.3.6, the following errors occur:

  1. HTTPError 404:
    With a custom config.ini, openalex_parser.py fails with 404 Client Error: NOT FOUND for certain issn values. The same error occurs with the default config.default.ini, but for different issn values.

  2. Syntax error in openalex_parser.py on line 156:
    The f-string in print(f'Activity type {work['type']} is unknown (DOI: {doi}).') reuses single quotes inside a single-quoted string, which raises SyntaxError: f-string: unmatched '[' under Python 3.11.2. Replacing {work['type']} with {work["type"]} resolves the error.

Error with custom config.ini:

(openAlex) $ python import_data.py
Traceback (most recent call last):
  File "/var/www/html/osiris/jobs/import_data.py", line 4, in <module>
    parser.importJob()
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 291, in importJob
    for element in self.get_works():
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 279, in get_works
    element = self.parseWork(work)
              ^^^^^^^^^^^^^^^^^^^^
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 229, in parseWork
    journal = self.getJournal(loc['issn'])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 106, in getJournal
    source = self.openalex.get_single_venue(issn[-1], "issn")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/diophila/openalex.py", line 115, in get_single_venue
    return Venues(self._api_caller).get_single(id_value, id_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/diophila/endpoints.py", line 77, in get_single
    return self.api_caller.get(single_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/diophila/api_caller.py", line 38, in get
    response.raise_for_status()
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: https://api.openalex.org/venues/issn:1546-1718

Error with default config.default.ini:

(openAlex) $ python import_data.py
Traceback (most recent call last):
  File "/var/www/html/osiris/jobs/import_data.py", line 4, in <module>
    parser.importJob()
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 291, in importJob
    for element in self.get_works():
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 279, in get_works
    element = self.parseWork(work)
              ^^^^^^^^^^^^^^^^^^^^
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 229, in parseWork
    journal = self.getJournal(loc['issn'])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 106, in getJournal
    source = self.openalex.get_single_venue(issn[-1], "issn")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/diophila/openalex.py", line 115, in get_single_venue
    return Venues(self._api_caller).get_single(id_value, id_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/diophila/endpoints.py", line 77, in get_single
    return self.api_caller.get(single_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/diophila/api_caller.py", line 38, in get
    response.raise_for_status()
  File "/home/user/venv/openAlex/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: https://api.openalex.org/venues/issn:1432-8798
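Both tracebacks show getJournal passing only issn[-1] to get_single_venue and letting the resulting HTTPError propagate, which aborts the whole import. The following is a minimal sketch of a more tolerant lookup, not the fix that was applied in the dev branch. The names first_resolvable and fetch are hypothetical, and the status code is read duck-typed from exc.response.status_code (the attribute that requests.exceptions.HTTPError carries), so the sketch runs without requests installed.

```python
from typing import Any, Callable, Iterable, Optional


def http_status(exc: Exception) -> Optional[int]:
    """Return the HTTP status code attached to an exception, if any."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None)


def first_resolvable(issns: Iterable[str],
                     fetch: Callable[[str], Any]) -> Optional[Any]:
    """Try each ISSN in turn and return the first successful lookup.

    `fetch` is a hypothetical callable standing in for something like
    lambda issn: openalex.get_single_venue(issn, "issn").
    A 404 means "not found for this ISSN", so it is skipped;
    any other error is re-raised unchanged.
    """
    for issn in issns:
        try:
            return fetch(issn)
        except Exception as exc:
            if http_status(exc) != 404:
                raise
    return None  # no ISSN could be resolved; the caller decides what to do
```

With this shape, a work whose journal cannot be resolved yields None instead of stopping the import, and the caller can log the DOI and continue.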

Syntax error in openalex_parser.py (line 156):

Traceback (most recent call last):
  File "/var/www/html/osiris/jobs/import_data.py", line 1, in <module>
    from openalex_parser import OpenAlexParser
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/www/html/osiris/jobs/openalex_parser.py", line 156
    print(f'Activity type {work['type']} is unknown (DOI: {doi}).')
                                 ^^^^
SyntaxError: f-string: unmatched '['

Steps to Reproduce

  1. Use OSIRIS v1.3.6 and a pip environment with the dependencies listed below.
  2. Attempt to import data from OpenAlex with import_data.py using either custom config.ini or config.default.ini.
  3. Observe the 404 errors returned by the API requests for certain issn values.
  4. Additionally, observe the syntax error on line 156 of openalex_parser.py.

Affected Version

OSIRIS v1.3.6

Pip Environment

(openAlex) $ python --version
Python 3.11.2

(openAlex) $ pip list
Package            Version
------------------ ---------
certifi            2024.8.30
charset-normalizer 3.4.0
diophila           0.4.0
dnspython          2.7.0
idna               3.10
Levenshtein        0.26.0
nameparser         1.1.3
pip                23.0.1
pymongo            4.10.1
RapidFuzz          3.10.0
requests           2.32.3
setuptools         66.1.1
urllib3            2.2.3

Workaround

For the syntax error on line 156 of openalex_parser.py, I replaced:

print(f'Activity type {work['type']} is unknown (DOI: {doi}).')

with:

print(f'Activity type {work["type"]} is unknown (DOI: {doi}).')
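The difference between the two lines depends on the interpreter version: Python 3.12 and later accept the same quote character inside an f-string replacement field, while earlier versions such as the 3.11.2 used here reject it. The sketch below checks this by compiling the original line from a string, so the file itself parses on any version. The values of work and doi and the helper name parses are made up for illustration.

```python
import sys

# Sample values, made up for illustration only.
work = {"type": "dataset"}
doi = "10.1234/example"

# Portable form: double quotes inside a single-quoted f-string.
msg = f'Activity type {work["type"]} is unknown (DOI: {doi}).'

# The original line 156 expression, held as source text so that this
# file still parses on interpreters older than 3.12.
original = "f'Activity type {work['type']} is unknown (DOI: {doi}).'"
portable = "f'Activity type {work[\"type\"]} is unknown (DOI: {doi}).'"


def parses(source: str) -> bool:
    """Return True if `source` compiles as an expression on this interpreter."""
    try:
        compile(source, "<f-string check>", "eval")
        return True
    except SyntaxError:
        return False


print(sys.version_info[:2], "original parses:", parses(original))
print(sys.version_info[:2], "portable parses:", parses(portable))
```

The portable form compiles on every version that supports f-strings, so it is the safer choice when the code has to run on more than one Python release.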

JKoblitz commented 1 week ago

Dear @paschulke, sorry for this.

The 404 error (and another problem regarding user matching) should already be resolved in the latest dev branch. The f-string does not cause any problems in my Python version (3.12.1), which is why I never encountered this error. I have fixed it for you as well (also in the dev branch).

I am currently working on a larger update, so it will take me a while to merge it into master. Thank you for your patience.

Best, Julia