Closed wetneb closed 3 years ago
https://phabricator.wikimedia.org/T202725 and https://phabricator.wikimedia.org/T199896 will make the implementation a bit cumbersome. Fixing these limitations properly in WikibaseLexeme is probably going to be as hard as writing a circumvention in WikidataToolkit
Interesting!
For the first one (T202725) this is related to #376: we should not assume that wbeditentity returns the full entity as this is only the case for Items apparently (not even properties).
For the second one I guess this means we need to implement other API actions, which are not going to be atomic indeed. I think we need to rethink the architecture of this module - I wanted to do that for a long time (#403) but haven't got round to it yet. This is related to my second bullet point in that ticket.
Indeed, having support for other API actions would be great. But I believe these limitations should also be removed on the WikibaseLexeme side.
Also related: T249206 - Serialized statements of Forms and Senses are missing data type fields
The datatype field for each Snak was previously missing for statements within Senses and Forms of a Lexeme. Starting on 26 August 2020, the datatype fields will be present on them. [Reference: Wikidata Project Chat]
hi team! sorry for a noob question, but.. how does one create a lexeme (with forms) using WDTK? I see that "datamodel" artifact has quite extensive support for Lexemes, but could not find anything in the WikibaseDataEditor. Is it because of this issue? if so, what would be the suggested workaround? Thanks in advance.
Hi @62mkv, that's correct: editing and creating lexemes is not supported yet in WDTK.
thanks, @wetneb! I am looking into it now, seeing what is possible as a quick hack to be able to just a) create lexemes with forms or b) add forms to an existing lexeme. Currently the stumbling point for me is an apparent inability to create a FormDocument as a new Form (with "null" id) for a given Lexeme.
@Tpt I see you've added those types, what would be your suggestion on how to resolve it?
PS: there seem to be tools that are capable of what I need, in particular LexData (although they seem to be using different action, "wbladdform") and https://github.com/lucaswerkmeister/tool-lexeme-forms/, but I'd really not want to abandon Java for this... TIA!
Hi @62mkv!
To create a new Form with the WDTK datamodel, you can use the LexemeDocument.createForm method. This will properly generate a new form identifier and add the form to the lexeme object.
Then, we need to implement saving of lexemes, forms and senses. The wbeditentity API action we use for forms and senses is a bit limited (cf. the discussion above). If you are familiar with PHP, the easiest way to go is probably to just fix the MediaWiki WikibaseLexeme extension. If not, maybe some hacks with the existing API actions wbaddform... might do the job.
thanks @Tpt! I guess that would cover my use-case №1 (create and add forms) but how do I get a LexemeDocument, if I have an L-id already?
I see that I can use WbGetEntitiesAction to get an EntityDocument but how to obtain a proper LexemeDocument out of that?
> This will properly generate a new form identifier and add the form to the lexeme object.
by the way, javadoc on that method says
/**
 * Creates a new {@link FormDocument} for this lexeme.
 * The form is not added to the {@link LexemeDocument} object,
 * it should be done with {@link LexemeDocument#withForm}.
 */
> I see that I can use WbGetEntitiesAction to get an EntityDocument but how to obtain a proper LexemeDocument out of that?
You could just cast using the usual (LexemeDocument).
> by the way, javadoc on that method says
Indeed, my bad.
Cool! and by the way, if I try to call LexemeDocument.createForm on a not-yet added lexeme, it throws an exception
java.lang.IllegalArgumentException: The string L0-F1 is not a valid form id
at org.wikidata.wdtk.datamodel.implementation.FormIdValueImpl.<init>(FormIdValueImpl.java:65)
so, it seems like there's no easy way to create a lexeme AND add forms in a single wbeditentity hop... I'll try with WbGet now
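The exception above is consistent with form ids being checked against a pattern that rejects the placeholder lexeme id L0. Below is a minimal sketch of such a check, assuming form ids have the shape L<n>-F<m> with positive numbers and no leading zeros. FormIdCheck is a hypothetical helper, not a WDTK class, and the actual pattern used in FormIdValueImpl may differ.

```java
import java.util.regex.Pattern;

public class FormIdCheck {
    // Assumed shape of a form id: "L<n>-F<m>", n and m positive, no leading zeros.
    private static final Pattern FORM_ID = Pattern.compile("^L[1-9]\\d*-F[1-9]\\d*$");

    /** Returns true if the string looks like a form id under the assumption above. */
    public static boolean isValidFormId(String id) {
        return id != null && FORM_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidFormId("L1358-F1")); // id of a form on an existing lexeme
        System.out.println(isValidFormId("L0-F1"));    // id derived from a not-yet-created lexeme
    }
}
```

Under this assumption, a form can only get a valid id once its lexeme has a real L-id, which would explain why createForm fails on a lexeme that has not been saved yet.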
so, with this code:
LexemeDocument existingLexeme = (LexemeDocument) wikibaseDataFetcher.getEntityDocument("L1358");
FormDocument formDocument = existingLexeme.createForm(
        Collections.singletonList(Datamodel.makeMonolingualTextValue("aprils", LANGUAGE_CODE)),
        Collections.singletonList(getItemIdForTestWikidata("Q42"))
);
LexemeDocument withForm = existingLexeme.withForm(formDocument);
LexemeDocument result = wikibaseDataEditor
        .createLexemeDocument(withForm, "Adding form to existing lexeme", null);
i am getting this request string:
summary=Adding form to existing lexeme&new=lexeme&maxlag=5&data={"type":"lexeme","id":"L1358","lexicalCategory":"Q212131","language":"Q208912","lemmas":{"en":{"language":"en","value":"april"}},"claims":{},"forms":[{"id":"L1358-F1","representations":{"en":{"language":"en","value":"aprils"}},"grammaticalFeatures":["Q42"],"claims":{},"lastrevid":533196,"type":"form"}],"senses":[],"lastrevid":533196}&bot=&assert=user&format=json&action=wbeditentity&token
and this MediaWikiException:
org.wikidata.wdtk.wikibaseapi.apierrors.MediaWikiApiErrorException: [param-invalid] Invalid field used in call: "id", must match id parameter
is it a problem with my code, WDTK not being ready, or a Wikidata API problem? I can't tell :( to me, the request content looks legit. it correctly shows lemma, lexeme id, form with features..
UPD: aha, so, looking at the documentation for wbeditentity (https://www.wikidata.org/w/api.php?action=help&modules=wbeditentity), it seems as though the id parameter is missing. will look into why that might happen
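The error message and the wbeditentity help page suggest that an edit to an existing entity needs the id parameter, whereas the request above sent new=lexeme. Below is a minimal sketch of choosing between the two. EditEntityParams is a hypothetical helper, not part of WDTK, and the token, bot, assert and maxlag parameters from the request above are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EditEntityParams {
    /**
     * Builds the core wbeditentity parameters.
     * A null lexemeId means "create a new lexeme"; otherwise the existing lexeme is edited.
     */
    public static Map<String, String> build(String lexemeId, String dataJson, String summary) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "wbeditentity");
        if (lexemeId == null) {
            params.put("new", "lexeme"); // creation: the server assigns the id
        } else {
            params.put("id", lexemeId);  // edit: must match any "id" inside the data
        }
        params.put("data", dataJson);
        params.put("summary", summary);
        params.put("format", "json");
        return params;
    }
}
```

For example, build("L1358", json, "Adding form to existing lexeme") would produce a request with id=L1358 and no new parameter.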
dang, and if I mess with WbDataEditor to edit and not create lexemes, when a new form is given as above, this is what I get from the MediaWiki API:
org.wikidata.wdtk.wikibaseapi.apierrors.MediaWikiApiErrorException: [modification-failed] Lexeme does not have Form with given ID
so apparently you can't add forms with wbeditentity, dammit...
> so apparently you can't add forms with wbeditentity, dammit...
Yes, sadly. The Wikibase API for form and sense editing is currently in an unfinished state.
yep. I've just tried to hack on FormDocument yet again, so that the payload for wbeditentity looked like this:
{"type":"lexeme","id":"L1358","lexicalCategory":"Q212131","language":"Q208912","lemmas":{"en":{"language":"en","value":"april"}},"claims":{},"forms":[{"representations":{"en":{"language":"en","value":"aprils"}},"grammaticalFeatures":["Q42"],"claims":{},"lastrevid":533196,"type":"form"}],"senses":[],"lastrevid":533196}
and MediaWiki even gives "OK"-ish response:
{"entity":{"claims":{},"id":"L1358","type":"lexeme","lastrevid":533196,"nochange":""},"success":1}
but still, nothing seems to be added to WD Lexeme at all. In fact, I can't even find any traces of this request execution on "test.wikidata.org" at all.. is it yet another bug of Wikibase API? .. meh
PS: does "nochange": "" in the response indicate that the wiki engine considered my request a no-op, and might that explain why I am not seeing any logs of it?
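If the nochange key does signal a null edit (an assumption the thread does not confirm), a client could detect it and warn instead of silently treating the call as a success. Below is a rough sketch with a hypothetical NoChangeCheck helper. It uses a plain regular expression only to stay dependency-free, and a real client would more likely inspect the parsed JSON.

```java
import java.util.regex.Pattern;

public class NoChangeCheck {
    // Matches a JSON key named "nochange" anywhere in the response text.
    private static final Pattern NOCHANGE_KEY = Pattern.compile("\"nochange\"\\s*:");

    /** Returns true if the raw response contains a "nochange" key. */
    public static boolean reportsNoChange(String responseJson) {
        return responseJson != null && NOCHANGE_KEY.matcher(responseJson).find();
    }

    public static void main(String[] args) {
        String response = "{\"entity\":{\"claims\":{},\"id\":\"L1358\",\"type\":\"lexeme\","
                + "\"lastrevid\":533196,\"nochange\":\"\"},\"success\":1}";
        if (reportsNoChange(response)) {
            System.out.println("Server reported no change; the edit may have been ignored.");
        }
    }
}
```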
Hooooooy! I've managed to both create lexeme with forms and to add forms to existing lexeme. The key was this nugget: https://github.com/nyurik/lexicator/blob/master/lexicator/lexemer/LexemeParserState.py#L182 (thanks to @nyurik for help)!
the code is super-ugly but at least I should be able to progress with this.
Hello. What's the status of lexeme editing? I have a private lexeme editing library that is in some ways more capable than WDTK and in others less capable. I am at the crossroads choosing between major upgrade to my private library or switching to WDTK and upgrading it with a series of smaller pull requests.
WDTK will mostly work for me. I have only encountered the following issues:
I can send pull requests for the first two issues, but the third one is a deal-breaker. Why is lexeme editing in a branch for so long? Is it seriously broken? When is it going to be merged? Why wasn't it merged already?
The other thing I am thinking about is the editing API. #403 is overkill for my use case. Ideally, I would prefer to just have mutable entities and have an API that computes a diff between the original entity and the modified one and then writes the diff. But at the moment the whole model is immutable. A bare diff API is nevertheless good enough, although the updateStatements method is begging for a builder class. That can be done with a PR too.
Hi @robertvazan, thanks for offering to contribute on this!
Personally, I was not aware of the lexeme-editing branch at all. If this branch has been useful to you and you don't see any big issue with it, then you could open a pull request for it, potentially adding any further changes you have made on your side. I think it would be very welcome and I would be keen to review it.
Let's also ping the author @Tpt.
@wetneb I haven't started using WDTK yet. Can I just ignore the branch then and submit PRs to master?
If you did not use this branch yourself, then yes it's fine to submit PRs based on master. But it could be worth waiting a bit for @Tpt to understand why this branch was left unmerged.
I have not merged this branch because it is still buggy.
Indeed a few features are still missing in WikibaseLexeme to be able to use the wbeditentity API just like we do on items and properties:
Feel free to ignore my branch or take the relevant bits from it and integrate your own code.
@Tpt WDTK can just implement the Wikibase API to the extent it is implemented in Wikibase itself. Known unsupported request features can be detected and rejected with an exception before they hit the network. Incomplete responses can either be mapped to incomplete WDTK objects or additional read requests can be made. This can all be documented. This way WDTK can expose the available APIs to the maximum extent possible.
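The "detect and terminate before the network" idea could look roughly like the sketch below. EditGuard and its Feature enum are hypothetical names invented for illustration, not WDTK classes, and which features count as unsupported is an assumption the caller supplies.

```java
import java.util.EnumSet;
import java.util.Set;

public class EditGuard {
    /** Hypothetical categories of lexeme edits a client might attempt. */
    public enum Feature { LEMMA, LEXEME_STATEMENT, FORM, FORM_STATEMENT, SENSE, SENSE_STATEMENT }

    // Features the server is assumed to reject or ignore (supplied by the caller).
    private final Set<Feature> unsupported;

    public EditGuard(Set<Feature> unsupported) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
        this.unsupported = unsupported.isEmpty()
                ? EnumSet.noneOf(Feature.class)
                : EnumSet.copyOf(unsupported);
    }

    /** Throws before any request is sent if the edit touches an unsupported feature. */
    public void check(Set<Feature> requested) {
        for (Feature feature : requested) {
            if (unsupported.contains(feature)) {
                throw new UnsupportedOperationException(
                        "Editing " + feature + " is not supported by the API; request not sent");
            }
        }
    }
}
```

The editor would call check(...) with the features touched by an update just before serializing the request, so that callers get a clear exception instead of a silently ignored edit.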
@robertvazan That would be great! If you could implement it, it would be amazing!
Just FYI: I have tested wbeditentity on test Wikidata and most of the lexeme can be edited. The only exception is sense statements. Senses themselves (addition/removal) and their glosses are editable though. Editing of forms and senses works both directly via form/sense ID and via lexeme except for the mentioned sense statements. The returned JSON is indeed incomplete. It is only useful to obtain lexeme ID.
There are some inconsistencies in editing various parts of the lexeme. The following procedures were tested to work.
Lemma
Language and lexical category
Lexeme statements
Qualifiers and references: These cannot be edited on their own. They are part of the statement. Modifying the statement without repeating qualifiers and references will delete them.
Forms
Form representations: Like lemmas.
Grammatical features
Form statements: Like lexeme statements, just nested under form in JSON.
Senses: Like forms.
Glosses: Like lemmas.
Sense statements: Not supported. All edits are ignored.
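Because a modified statement replaces the old one wholesale, a client has to carry the existing qualifiers and references over into the replacement. Below is a minimal sketch of that idea using a deliberately simplified, hypothetical Statement class. It is not the WDTK Statement type and it ignores properties, ranks and statement ids.

```java
import java.util.List;

public class StatementUpdate {
    /** Hypothetical, simplified statement: a main value plus qualifiers and references. */
    public static final class Statement {
        public final String mainValue;
        public final List<String> qualifiers;
        public final List<String> references;

        public Statement(String mainValue, List<String> qualifiers, List<String> references) {
            this.mainValue = mainValue;
            this.qualifiers = List.copyOf(qualifiers);
            this.references = List.copyOf(references);
        }
    }

    /** Builds a replacement that changes only the main value and repeats everything else. */
    public static Statement withMainValue(Statement original, String newValue) {
        return new Statement(newValue, original.qualifiers, original.references);
    }
}
```

Serializing the result of withMainValue, rather than a fresh statement holding only the new value, keeps the qualifiers and references in the payload so they are not dropped by the edit.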
Hello there, I just wanted to let you know that we fixed the issue that was preventing Senses and statements from being edited with wbeditentity (T199896), which we hope will help tool maintainers support Lexemes. We would of course love to see Wikidata Toolkit supporting Lexemes, as it would help increase and diversify the base of tools for editing Lexemes :)
If you have questions, issues or requests, feel free to contact me (not on this account as it's my personal one, rather at lea.lacroix@wikimedia.de) Thanks!
We now have support for Lexeme entities in the datamodel. We could also support editing these in wdtk-wikibaseapi.