mkroetzsch opened 9 years ago
Marking this as a good first issue, with the understanding that this would only validate non-empty statement ids: empty statement ids would still be allowed to represent new statements (which are not saved in the target Wikibase yet). Making sure that even new statements have valid ids is what #447 proposes (so that's a different issue, probably a bit more involved).
Hi @wetneb @mkroetzsch, I tried to find out how Wikibase validates statement ids but couldn't find the exact logic/code for it. Could you please help with that? I'm a student and I hope to contribute to this issue. Thank you!
I think this is the file mentioned above: https://github.com/wmde/WikibaseLib/blob/bf563124cbab1ccc28367dcc5886fe23cac73bde/includes/ClaimGuidValidator.php This library has been refactored since, but I do not think the claim id format has changed, so you should still be able to rely on it to understand the format.
Thank you for your help.
From my understanding, we pass the statement id and the subject (an EntityIdValue) to the function below, where the statement id has a format like AFB6473A-362F-429C-B2D8-8 and the subject has an id like Q1294. The issue description asks to validate the whole string, Q1294$AFB6473A-362F-429C-B2D8-8. After going through the PHP code, I understand that the second half of the string can be validated with a regular expression, while the first half is checked for existence. Can you please suggest how I can check whether the EntityIdValue exists in WDTK? I couldn't find a function that does it. Also, please confirm whether we have to validate the whole string and not just the second half, i.e., the statement id.
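For the format part of the question, here is a minimal Java sketch modeled on the linked ClaimGuidValidator: split on '$', require exactly two parts, and match the second part against a GUID pattern. The class name StatementIdFormat and the exact pattern (groups of 8-4-4-4-12 letters or digits, case-insensitive) are assumptions, not Wikidata Toolkit API; check the pattern against the PHP source before relying on it.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of WDTK: format-only check of "<entity id>$<guid>". */
final class StatementIdFormat {

    // Assumed GUID shape: 8-4-4-4-12 letters or digits, case-insensitive.
    private static final Pattern GUID_PATTERN = Pattern.compile(
            "[A-Z\\d]{8}-[A-Z\\d]{4}-[A-Z\\d]{4}-[A-Z\\d]{4}-[A-Z\\d]{12}",
            Pattern.CASE_INSENSITIVE);

    private StatementIdFormat() {
    }

    static boolean hasValidFormat(String id) {
        if (id == null) {
            return false;
        }
        // Limit -1 keeps trailing empty strings, so "Q1$" yields two parts.
        String[] parts = id.split("\\$", -1);
        if (parts.length != 2) {
            return false; // no '$' separator, or more than one
        }
        // Only the GUID half is matched here; the entity id half needs its own check.
        return GUID_PATTERN.matcher(parts[1]).matches();
    }
}
```

Under this assumed pattern, Q5721$b763ede3-42b3-5ecb-ec0e-4bb85d4d348d passes, while the shortened example Q1294$AFB6473A-362F-429C-B2D8-8 from the issue description does not, because its last group has a single character.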
Thank you!
For this sort of validation we would not check that the id exists in the Wikibase instance, as this would require an HTTP query. You could just check that it is a valid entity id by parsing it (see #424) perhaps.
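A sketch of what a purely syntactic check could look like, with no HTTP query: the part before '$' only has to look like an entity id, and whether that entity exists is not checked. The entity id pattern below is a hypothetical, simplified stand-in for real entity id parsing (such as what #424 discusses); it only covers Q, P, M and L ids plus lexeme forms and senses, and the class name is illustrative.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of WDTK: syntactic check of a full statement id. */
final class StatementIdSyntax {

    // Simplified, assumed entity id grammar: Q/P/M ids, lexemes, and forms/senses like L7-F2.
    private static final Pattern ENTITY_ID = Pattern.compile(
            "(?:[QPM][1-9]\\d*|L[1-9]\\d*(?:-[FS][1-9]\\d*)?)");

    // Assumed GUID shape, modeled on Wikibase's ClaimGuidValidator.
    private static final Pattern GUID = Pattern.compile(
            "[A-Z\\d]{8}-[A-Z\\d]{4}-[A-Z\\d]{4}-[A-Z\\d]{4}-[A-Z\\d]{12}",
            Pattern.CASE_INSENSITIVE);

    private StatementIdSyntax() {
    }

    /** No network access: the entity is only required to look valid, not to exist. */
    static boolean isValid(String id) {
        if (id == null) {
            return false;
        }
        int sep = id.indexOf('$');
        // Require exactly one '$' with a non-empty entity id before it.
        if (sep <= 0 || sep != id.lastIndexOf('$')) {
            return false;
        }
        String entityPart = id.substring(0, sep);
        String guidPart = id.substring(sep + 1);
        return ENTITY_ID.matcher(entityPart).matches()
                && GUID.matcher(guidPart).matches();
    }
}
```

Because nothing is looked up, an id such as Q99999999$b763ede3-42b3-5ecb-ec0e-4bb85d4d348d is accepted even if that item does not exist, which matches the "no HTTP query" constraint.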
Hi, I have implemented the logic below. I wanted to discuss where this function should go.
I have two ideas:
Also, what should happen when the id is not valid? Is it a good idea to throw an exception, log it, and return null to the caller? Please let me know.
Thank you!
I would put statement id validation in the statement constructor. When validation fails, an exception should be thrown (for instance IllegalArgumentException). Standard output should not be used for this. Statement id validation should only be enforced when a non-null statement id is provided.
Perhaps it would be worth exposing statement id validation as a static method somewhere, so that it can also be used in tests, to validate the output of statement id generators.
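A minimal sketch of that suggestion, using a hypothetical ValidatedStatement class rather than WDTK's actual StatementImpl: the constructor throws IllegalArgumentException for a malformed id, skips validation for null or empty ids (which stand for new statements, as noted at the top of the issue), and exposes the check as a static method that tests can call. The single regex, with its simplified Q/P/L entity id part, is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; WDTK's real statement class differs. */
final class ValidatedStatement {

    // Assumed shape: simplified entity id, literal '$', then an 8-4-4-4-12 GUID.
    private static final Pattern STATEMENT_ID = Pattern.compile(
            "[QPL][1-9]\\d*\\$[A-Za-z\\d]{8}-[A-Za-z\\d]{4}-[A-Za-z\\d]{4}-[A-Za-z\\d]{4}-[A-Za-z\\d]{12}");

    private final String statementId;

    ValidatedStatement(String statementId) {
        // Null or empty ids mean "new statement" and are not validated.
        boolean hasId = statementId != null && !statementId.isEmpty();
        if (hasId && !isValidStatementId(statementId)) {
            // Report through an exception, not standard output.
            throw new IllegalArgumentException("Invalid statement id: " + statementId);
        }
        this.statementId = statementId;
    }

    /** Static so tests can also check the output of statement id generators. */
    static boolean isValidStatementId(String id) {
        return id != null && STATEMENT_ID.matcher(id).matches();
    }

    String getStatementId() {
        return statementId;
    }
}
```

A generator test could then simply assert isValidStatementId(...) on every id the generator produces.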
Hi,
Thanks a lot for your help so far. For now, I have declared and defined the validate function in StatementImpl and I am calling it from the constructor.
I want to double-check the format of the statement id. We ran this query, and the statement id appears to have the format 'Q5721-b763ede3-42b3-5ecb-ec0e-4bb85d4d348d'. Note that there is no '$' sign in it, although according to the GuidGenerator, '$' is the separator. Can you confirm which one is correct?
Also, this change would require us to update all the other test cases that pass a statement id, since they do not follow this format and all fail once the validate function is added. Right?
Thank you for your time and patience.
Yes, this is a deliberate difference between the format of statement ids in JSON and in RDF. I suspect it was introduced because a $ sign in a URI is not very clean.
You can see that in org.wikidata.wdtk.rdf.Vocabulary, where statement ids are converted to RDF with PREFIX_WIKIDATA_STATEMENT + statementId.replaceFirst("\\$", "-").
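To illustrate the JSON/RDF difference, here is a small sketch applying the same replaceFirst call quoted from Vocabulary. The helper names are hypothetical, and the prefix is passed in as a parameter rather than assuming the actual value of PREFIX_WIKIDATA_STATEMENT.

```java
/** Hypothetical helpers showing the JSON-to-RDF statement id mapping. */
final class StatementIdForms {

    private StatementIdForms() {
    }

    /** "Q5721$b763..." (JSON form) becomes "Q5721-b763..." (RDF local name). */
    static String toRdfLocalName(String jsonStatementId) {
        // "\\$" is the regex for a literal '$'; only the first occurrence is replaced.
        return jsonStatementId.replaceFirst("\\$", "-");
    }

    /** Prepends a caller-supplied statement namespace to the RDF local name. */
    static String toRdfUri(String prefix, String jsonStatementId) {
        return prefix + toRdfLocalName(jsonStatementId);
    }
}
```

This is presumably why the query returned Q5721-b763ede3-42b3-5ecb-ec0e-4bb85d4d348d: the query service works on the RDF form. Mapping back by turning the first '-' into '$' only works when the entity id itself has no hyphen, which is not the case for lexeme form or sense ids such as L7-F2.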
If these tests start failing, then yes, we will need to change the ids they provide so that they match the expected format.
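When updating those tests, one option is to build ids that already have the expected shape. A sketch with hypothetical helper names: the string form of java.util.UUID consists of 8-4-4-4-12 lowercase hex groups, so appending it after the entity id and '$' yields a well-formed id. The deterministic variant keeps expected values in assertions stable across runs.

```java
import java.util.UUID;

/** Hypothetical test-fixture helpers; not part of WDTK. */
final class TestStatementIds {

    private TestStatementIds() {
    }

    /** Random id of the form "<entity id>$<8-4-4-4-12 hex>". */
    static String randomFor(String entityId) {
        // UUID.toString() produces lowercase hex groups of 8-4-4-4-12 characters.
        return entityId + "$" + UUID.randomUUID();
    }

    /** Reproducible id: the UUID is built from fixed bits instead of randomness. */
    static String fixedFor(String entityId, long n) {
        UUID uuid = new UUID(0L, n);
        return entityId + "$" + uuid;
    }
}
```

For example, fixedFor("Q42", 5) gives Q42$00000000-0000-0000-0000-000000000005.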
Statement ids in Wikibase have a fixed format, e.g., Q1294$AFB6473A-362F-429C-B2D8-8. This is validated in Wikibase when making API calls, and Wikidata Toolkit should likewise implement a basic validation in its data model. The PHP validation code can be seen at http://wbdoc.wmflabs.org/de/df9/ClaimGuidValidator_8php_source.html