Open lukavdplas opened 4 months ago
Yes, this is indeed still a to do on which I got stuck: I have a branch somewhere that applies the new mapping style (with language suffix) for all corpora, but realized that we can't deploy this unless we reindex all corpora first. I did not know the best solution for this at the time, and then forgot to flag this problem.
What we could do:
`languages` array will get the new mapping style
The second option will be harder to understand for outside developers, I think, but so will the language suffix for (the majority of) corpora which aren't multilingual.
Ah, I see. I don't think it's high-priority right now, but perhaps we can add a comment in the corpus definitions?
Do you think that choice would have an effect on #992?
No, I don't think so, as the analyzers are defined per corpus. The different language analyzers won't affect the query syntax, as far as I can foresee. Visualizations, however, may be affected by this. Will have to look at this again and will comment on the issue if I spot some problems.
Hm, actually, I would prefer it if this were fixed sooner rather than later. I actually do index them quite regularly on my local machine for testing. They're now in a weird state where the code does not work but is still supposed to be maintained.
What went wrong?
Not sure if I did something wrong here. I tried indexing `parliament-sweden-old` locally and got this error:
I fixed it by changing the definition of the `speech` field:
https://github.com/UUDigitalHumanitieslab/I-analyzer/blob/d040118c4cdc477044011bf649e326a83c017315/backend/corpora/parliament/sweden-old.py#L88-L89
To:
So I think this corpus is just missing its own definition for the mapping (and language) of the speech field? This seems to be true for other parliament corpora too.
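For readers unfamiliar with the mapping style under discussion, here is a rough sketch of what a language-suffixed Elasticsearch mapping for a text field such as `speech` might look like. The helper name `text_mapping_for_language`, the subfield names, the analyzer names, and the language code `'sv'` are all assumptions made for illustration; this is not the I-analyzer API and not the fix that was applied above.

```python
def text_mapping_for_language(language: str) -> dict:
    """Build a multifield text mapping whose analyzer names carry a language suffix.

    Hypothetical helper: the subfields and analyzer names are illustrative only.
    The named analyzers would have to be defined in the index settings.
    """
    return {
        'type': 'text',
        'fields': {
            # subfield analyzed with a language-specific stopword analyzer
            'clean': {'type': 'text', 'analyzer': f'clean_{language}'},
            # subfield analyzed with a language-specific stemming analyzer
            'stemmed': {'type': 'text', 'analyzer': f'stemmed_{language}'},
            # token count subfield, independent of language
            'length': {'type': 'token_count', 'analyzer': 'standard'},
        },
    }


# Assumed language code for a Swedish corpus
speech_mapping = text_mapping_for_language('sv')
print(speech_mapping['fields']['clean']['analyzer'])
```

Because the analyzer names are part of the mapping, an index created with the old, unsuffixed names would presumably not match a definition like this one, which is why switching styles implies reindexing.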
What did you expect to happen?
The index operation should run without exceptions.
Screenshot
No response
Where did you find the bug?
Version
develop (~5.4.0)
Steps to reproduce
`parliament-sweden-old` corpus. Add the corpus definition to `CORPORA` and add any string value for `PP_SWEDEN_OLD_DATA`.
`yarn django index parliament-sweden-old`