egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License
215 stars 32 forks source link

Refactor of installation #17

Closed dennlinger closed 1 year ago

dennlinger commented 1 year ago

Another thing that bothered me enough to open a PR is the lack of clear installation instructions (or rather, the lack of automating it).

  1. I've moved the installation instructions in the README up a bit, so it becomes more visible for first-time visitors; feel free to discuss whether this is a necessary/good change, but IMO it helps with clarity.
  2. Running __main__.py with an incorrect argument now gives an appropriate error message, instead of just "quietly finishing."
  3. Instead of having to manually download the KB first, I introduce a new catch in the DatabaseConnection: Errors that are likely caused by the lack of a KB (and thus failing to connect to it) will trigger the download of the database, and then re-try the connection. This way, even without manually running the installation process, the download will be triggered on the first time users run the library. My only concern is that sqlite3.OperationalErrors can be caused by a variety of other issues, which might make this a bad practice to download a KB each time...
  4. Finally, there are some minor nit-picks about general Python-related issues, e.g., mutable default arguments (e.g., lists).

Thanks again for the great library, hope this helps give it a bit more usability!

MartinoMensio commented 1 year ago

Hi @dennlinger, This is a great PR! I am reviewing it!

MartinoMensio commented 1 year ago

I think that we can simply check if the database file exists (DB_DEFAULT_PATH) and download otherwise. In this way we don't trigger another download if something else happens with sqlite3.OperationalError. I'm testing it.

Martino

MartinoMensio commented 1 year ago

Thank you again @dennlinger for the very good contribution! Merging now!

Martino