Generalisation of spheroscope to corpora with other text types

ausgerechnet / spheroscope

web app for argumentation mining

GNU General Public License v3.0


Generalisation of spheroscope to corpora with other text types #60

Open nfdykes opened 2 years ago

nfdykes commented 2 years ago

Currently, spheroscope seems to be hard-coded for tweets: if I run a query on a different corpus, I can't display the results because I get a KeyError for tweet_id. This should probably be changed in the medium term so that the tool can be applied to other corpora in RAND.

ausgerechnet commented 2 years ago

Actually, almost all of spheroscope is written so that queries can be executed on any CWB-indexed corpus. However, there are a couple of issues:

(1) queries.py::run_cmd hardcodes "tweet_id" for diffing
(2) queries.py::add_gold hardcodes "tweet_id" for adding info about TPs / FPs
(3) queries.py::query_command hardcodes "tweet_id" for summarization
(4) remote_db.py::update_gold hardcodes "tweet" to post-process database entries

The only real problem here is (1), which can probably be fixed by using s_show[0] from the corpus settings.
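A minimal sketch of that idea, assuming (hypothetically) that the corpus settings are a dict whose "s_show" entry is a list of s-attribute names, and that query results are rows keyed by those names. The functions id_attribute and diff_results are illustrative only, not spheroscope's actual API; the point is that the diff key comes from the settings rather than the literal "tweet_id".

```python
from __future__ import annotations

from typing import Any


def id_attribute(corpus_settings: dict[str, Any]) -> str:
    """Return the s-attribute used to identify texts (hypothetical: s_show[0])."""
    s_show = corpus_settings.get("s_show") or []
    if not s_show:
        raise ValueError("corpus settings define no s_show attributes")
    return s_show[0]


def diff_results(old: list[dict[str, Any]], new: list[dict[str, Any]],
                 corpus_settings: dict[str, Any]) -> dict[str, set]:
    """Diff two result sets by text ID instead of a hard-coded 'tweet_id'."""
    key = id_attribute(corpus_settings)
    old_ids = {row[key] for row in old}
    new_ids = {row[key] for row in new}
    # IDs only in the new result were added; IDs only in the old one were removed
    return {"added": new_ids - old_ids, "removed": old_ids - new_ids}
```

For a news corpus configured with s_show = ["article_id"], diffing [{"article_id": "a1"}, {"article_id": "a2"}] against [{"article_id": "a2"}, {"article_id": "a3"}] gives {"added": {"a3"}, "removed": {"a1"}}, with no reference to tweets anywhere.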

mgttlinger commented 2 years ago

Can this be fixed simply by renaming that column to id?

ausgerechnet commented 2 years ago

Not really, since this has to be aligned with the structural attribute (s-att) indexed in the CWB (e.g. text id="<id>", tweet id="<id>" or article id="<id>"). We could disallow the use of corpora without e.g. text s-atts, but I would opt for a solution with a mapping, because I've always hated the fact that CQPweb needs a text_id s-att and simply doesn't work if your texts are not named text …
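The mapping approach could look roughly like this. It is a sketch under assumptions: the corpus names, the TEXT_ID_ATTRIBUTE table, the "text_id" default and the function text_id_attribute are all hypothetical, and the list of indexed s-attributes would have to come from the CWB registry in practice. The mapping is checked against what is actually indexed, so a misconfigured corpus fails with a clear message instead of a bare KeyError at display time.

```python
from __future__ import annotations

# Hypothetical per-corpus mapping: corpus name -> s-attribute identifying a text.
TEXT_ID_ATTRIBUTE: dict[str, str] = {
    "TWEETS": "tweet_id",
    "NEWS": "article_id",
}


def text_id_attribute(corpus_name: str, available_s_atts: list[str],
                      mapping: dict[str, str] = TEXT_ID_ATTRIBUTE,
                      default: str = "text_id") -> str:
    """Resolve the text-level ID s-attribute for a corpus.

    Uses the explicit mapping if the corpus is listed, otherwise `default`;
    raises KeyError if the resolved attribute is not indexed in the corpus.
    """
    att = mapping.get(corpus_name, default)
    if att not in available_s_atts:
        raise KeyError(f"{corpus_name!r}: s-attribute {att!r} not indexed "
                       f"(available: {', '.join(available_s_atts)})")
    return att
```

With this, text_id_attribute("NEWS", ["article", "article_id", "s"]) resolves to "article_id", while an unmapped corpus that does have a text_id s-att still works through the default, so texts need not be named text.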