Open nfdykes opened 2 years ago
Actually, almost all of spheroscope is written in a way that queries can be executed on any CWB-indexed corpus. However, there are a couple of issues:

1. `queries.py::run_cmd` hardcodes "tweet_id" for diffing
2. `queries.py::add_gold` hardcodes "tweet_id" for adding info about TPs / FPs
3. `queries.py::query_command` hardcodes "tweet_id" for summarization
4. `remote_db.py::update_gold` hardcodes "tweet" to post-process database entries

The only problem here is (1), which can probably be fixed by using `s_show[0]` from the corpus settings.
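A minimal sketch of what the `s_show[0]` idea could look like for the diffing step. Everything here is an assumption for illustration: the corpus settings are modelled as a plain dict with an `s_show` list, rows are plain dicts, and `id_column` and `diff_ids` are hypothetical helpers, not functions that exist in spheroscope.

```python
def id_column(corpus_settings, default="tweet_id"):
    """Return the identifier column for a corpus.

    Assumption: the settings expose the s-attributes to show as a list
    under "s_show", and the first entry is the identifier (e.g. "text_id").
    Falls back to the currently hardcoded "tweet_id" if nothing is set.
    """
    s_show = corpus_settings.get("s_show") or []
    return s_show[0] if s_show else default


def diff_ids(old_rows, new_rows, corpus_settings):
    """Compare two result sets by the corpus-specific identifier column."""
    col = id_column(corpus_settings)
    old_ids = {row[col] for row in old_rows}
    new_ids = {row[col] for row in new_rows}
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
    }


if __name__ == "__main__":
    settings = {"s_show": ["text_id"]}  # hypothetical settings of a non-tweet corpus
    old = [{"text_id": "a"}, {"text_id": "b"}]
    new = [{"text_id": "b"}, {"text_id": "c"}]
    print(diff_ids(old, new, settings))
```

Keeping "tweet_id" as the fallback would leave existing tweet corpora working without any change to their settings.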
Can this be fixed simply by renaming that column to `id`?
Not really, since this has to be aligned with the structural attribute indexed in the CWB (which is e.g. `text id="<id>"` or `tweet id="<id>"` or `article id="<id>"`)… We could disallow usage of corpora without e.g. `text` s-atts, but I would opt for a solution with a mapping because I've always hated the fact that CQPweb needs a `text_id` s-att and simply doesn't work if your texts are not named `text`…
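One way the mapping approach might look, as a rough sketch only: each corpus declares which s-attribute carries its identifier, and results are normalised to a generic `id` key before any downstream code touches them. The corpus names, the `ID_ATTRIBUTE` table, and `normalise_rows` are all hypothetical and not part of spheroscope or CWB.

```python
# Hypothetical mapping: corpus name -> column holding the identifier s-attribute.
ID_ATTRIBUTE = {
    "TWEETS": "tweet_id",
    "NEWS": "article_id",
}


def normalise_rows(rows, corpus_name, mapping=ID_ATTRIBUTE, fallback="text_id"):
    """Copy each row, exposing the corpus-specific identifier as "id".

    Corpora missing from the mapping are assumed to use `fallback`.
    """
    att = mapping.get(corpus_name, fallback)
    normalised = []
    for row in rows:
        if att not in row:
            # Fail with a message that names the corpus and the expected column.
            raise KeyError(f"corpus {corpus_name!r}: expected column {att!r}")
        new_row = {key: value for key, value in row.items() if key != att}
        new_row["id"] = row[att]
        normalised.append(new_row)
    return normalised


if __name__ == "__main__":
    rows = [{"article_id": "n1", "match": "foo"}]
    print(normalise_rows(rows, "NEWS"))
```

With this design, the code for diffing, gold annotation, and summarization would only ever see `id`, while the s-attribute in the indexed corpus keeps whatever name it already has.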
Currently, spheroscope seems to be hard-coded for tweets, i.e. if I run a query on a different corpus, I can't display the results because I get a key error for tweet_id. This is probably something that should be changed in the medium term so that the tool can be applied to other corpora in RAND.