CentreForDigitalHumanities / edpop-explorer

Common interface to multiple library catalogues and bibliographical databases
BSD 3-Clause "New" or "Revised" License

Automatically download databases of new readers #44

Closed: tijmenbaarda closed this 1 month ago

tijmenbaarda commented 1 month ago

This PR generalizes the way the USTC and FBTEE readers download and use their required database files by factoring out this functionality and applying it to Dutch Almanacs, KVCS and Pierre Belle too. The databases are downloaded from the dhstatic server.
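The general idea of "download the database file if it is not there yet" can be sketched as follows. This is only an illustration under assumptions, not the code in this PR: the function name `ensure_database`, its parameters and the `.part` suffix are hypothetical, and the real readers may organize this differently.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_database(
    target: Path,
    url: str,
    download: Callable[[str, Path], object] = urlretrieve,
) -> Path:
    """Return the local database path, downloading the file first if missing.

    Hypothetical helper: `target`, `url` and `download` are illustrative
    names, not part of the EDPOP Explorer API.
    """
    if target.exists():
        # Already downloaded on an earlier run; nothing to do.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name so that an interrupted transfer never
    # leaves a half-written file under the final name.
    partial = target.with_name(target.name + ".part")
    download(url, partial)
    partial.replace(target)
    return target
```

The `download` parameter defaults to `urllib.request.urlretrieve` but can be swapped for a stub, which makes the helper testable without network access.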

This PR also includes an update to the README file concerning the installation of database files, as well as a note about the "performing query" message in the Explorer CLI.

Update: also refer to a specific version of the EDPOP record ontology, close #25

linguistcrg commented 1 month ago

I have been testing the readers after this PR, and there are some issues:

tijmenbaarda commented 1 month ago

Thanks a lot for this @linguistcrg !

> I have been testing the readers after this PR, and there are some issues:
>
> * Neither the FBTEE nor the USTC readers work, which was expected. The README says that you should give me access to the USTC database. Could you do that, and for the FBTEE as well?

Concerning USTC, that is expected indeed, but FBTEE should be downloaded automatically. That is working on my installation. Could you give me the traceback (use `set debug true` in the shell)?

I have sent you the other database through SURFfilesender.

> * Searches in the BNF are quite slow now.

I'm afraid there is nothing we can do about this :(

> * STCN still gives results, but I get the following error when trying to access the info of an entry:
>   `EXCEPTION of type 'AttributeError' occurred with message: type object 'STCNReader' has no attribute '_convert_record'`

Hmm, this is working in my installation. It is true that `STCNReader` has no method `_convert_record` anymore (only one without the underscore), but it should not be called anyway. Could you give the full traceback?

linguistcrg commented 4 weeks ago

This is the full traceback for FBTEE:

[edpop-explorer] # fbtee almanac
Performing query: SQLPreparedQuery(where_statement='WHERE full_book_title LIKE ?', arguments=['%almanac%'])
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cmd2/cmd2.py", line 2399, in onecmd_plus_hooks
    stop = self.onecmd(statement, add_to_history=add_to_history)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cmd2/cmd2.py", line 2852, in onecmd
    stop = func(statement)
           ^^^^^^^^^^^^^^^
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/edpopxshell.py", line 191, in do_fbtee
    self._query(FBTEEReader, args)
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/edpopxshell.py", line 255, in _query
    self.reader.fetch()
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/reader.py", line 125, in fetch
    resulting_range = self.fetch_range(range(self._fetch_position,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/readers/fbtee.py", line 143, in fetch_range
    self._add_fields(record)
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/readers/fbtee.py", line 77, in _add_fields
    assert isinstance(record.data, dict)
                      ^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'data'
EXCEPTION of type 'AttributeError' occurred with message: 'int' object has no attribute 'data'

And this is the one for STCN:

[edpop-explorer] # show 1
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cmd2/cmd2.py", line 2399, in onecmd_plus_hooks
    stop = self.onecmd(statement, add_to_history=add_to_history)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cmd2/cmd2.py", line 2852, in onecmd
    stop = func(statement)
           ^^^^^^^^^^^^^^^
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/edpopxshell.py", line 89, in do_show
    self.show_record(record)
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/edpopxshell.py", line 92, in show_record
    record.fetch()  # Necessary in case this is a lazy record
    ^^^^^^^^^^^^^^
  File "/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/edpop_explorer/sparqlreader.py", line 94, in fetch
    self.from_reader._convert_record(self.original_graph, self)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'STCNReader' has no attribute '_convert_record'. Did you mean: 'convert_record'?
EXCEPTION of type 'AttributeError' occurred with message: type object 'STCNReader' has no attribute '_convert_record'

tijmenbaarda commented 4 weeks ago

Hmm, there seems to be an issue with your installation. The code fragments in the stack trace come from an older version of EDPOP Explorer, which is then interacting with a newer version. I don't really understand what is going on, but can you verify that the checkout in `/Users/cristinaregueragomez/Documents/Code/CDH/edpop-explorer/` is on the correct branch (i.e., the latest `develop`, because this branch has already been merged)?
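One way to check which copy of a package Python actually imports is to ask the import system for its location. The sketch below uses only the standard library; the helper name `package_location` is made up for illustration, and `edpop_explorer` is the package name taken from the paths in the traceback above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the file that `import name` would load, or None if not found.

    Hypothetical helper for illustration; not part of EDPOP Explorer.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        # The package is not importable from this environment.
        return None
    return Path(spec.origin)
```

Calling `package_location("edpop_explorer")` in the same environment that runs the shell should point into the expected checkout. If it points somewhere else (for example a fork installed earlier with pip), that installation is the one being executed.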

linguistcrg commented 3 weeks ago

It's solved now. I was running the incorrect version of edpopx because the pip package was pointing to my own forked repo, instead of this one.

I just tested FBTEE and STCN, and they are working well. What should I focus on now?

tijmenbaarda commented 2 weeks ago

Great! You can close the respective issues on GitHub. There are some new additions on the development branch that you can test. The search screen is now rendered as a table. The language field is now normalized: all languages should be shown uniformly in English, whereas before they were shown as (different sets of) language codes. Also, with the new version of the Explorer, all databases should be working now, including the databases you added. You could test these and also close the issues you had opened that are solved now. Please make sure you run `pip install -r requirements.txt` and `npm install` to update the requirements.
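To illustrate what normalizing the language field can look like, here is a minimal sketch that maps a few ISO 639 codes to English names. It is an assumption-laden example, not the implementation on the development branch: the table is deliberately tiny, and the names `LANGUAGE_NAMES` and `normalize_language` are hypothetical.

```python
# Hypothetical, deliberately small lookup table covering two- and
# three-letter ISO 639 codes for a handful of languages.
LANGUAGE_NAMES = {
    "nl": "Dutch", "dut": "Dutch", "nld": "Dutch",
    "fr": "French", "fre": "French", "fra": "French",
    "de": "German", "ger": "German", "deu": "German",
    "en": "English", "eng": "English",
    "la": "Latin", "lat": "Latin",
}


def normalize_language(value: str) -> str:
    """Return the English name for a known language code.

    Unknown values are returned unchanged (apart from surrounding
    whitespace) so that no information is lost.
    """
    cleaned = value.strip()
    return LANGUAGE_NAMES.get(cleaned.lower(), cleaned)
```

With this table, records that use `nl`, `dut` or `nld` would all display `Dutch`, which is the kind of uniform output to look for when testing the different databases.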

linguistcrg commented 5 days ago

Hello! I have been doing some testing after pulling the new changes and updating the requirements, and all the databases are taking considerably longer to show the results from the query. On a different note, none of the databases are being displayed correctly on the VRE:

jgonggrijp commented 5 days ago

@linguistcrg Are you facing those issues on the server or locally on your PC? In the latter case, does it help to pass the `--build` option to `docker compose up`?

linguistcrg commented 1 day ago

Thank you, that worked! I just finished my final testing tasks, and closed the issues that were solved, and opened new issues. Some last comments: