Closed stevevanhooser closed 1 year ago
I pushed a new version of did.database.m to the SQL branch, which implements schema validation (lines 428 and 909-1235), with the following caveats:
IGNORE_DID_CLASS_PREFIX=false
in did.database.m line 983.queryable
field is currently ignored. This will be added at a later stage.Thanks Yair
I'm checking these changes, I get an error when I run
did.test.test_did_db_documents
It looks like did.database.all_doc_ids
needs to take a branch as an argument and use the current branch if none is given.
I'll keep testing when this is fixed.
Thanks Steve
The full error trail is
Error in did.implementations.sqlitedb/do_get_doc_ids (line 230)
' AND branch_id="' branch_id '"'];
Error in did.database/all_doc_ids (line 350)
doc_ids = database_obj.do_get_doc_ids();
Error in did.database/validate_docs (line 911)
all_ids = database_obj.all_doc_ids();
Error in did.database/add_docs (line 428)
database_obj.validate_docs(document_objs);
Error in did.test.test_did_db_documents (line 29)
db.add_docs(docs);
I forgot to commit sqlitedb.m, it should be good now.
You still get an error when you run did.test.test_did_db_documents
, but this error is expected (because you forgot to define the document_version field in the base schema):
>> did.test.test_did_db_documents
Validating demoA doc 41268cb359de9b63_c0b7910d4270c0ef
Validating demoA doc 41268cb359de9b63_c0b7910d4270c0ef (superclass base)
Error using did.database/validate_doc_vs_schema
Dissimilar sub-fields defined/found for base field in demoA doc 41268cb359de9b63_c0b7910d4270c0ef (superclass base)
("datestamp,document_version,id,name,session_id" <=> "datestamp,id,name,session_id")
Error in did.database/validate_doc_vs_schema (line 1043)
database_obj.validate_doc_vs_schema(docProps, schemaStruct2, all_ids);
Error in did.database/validate_docs (line 942)
database_obj.validate_doc_vs_schema(docProps, schemaStruct, all_ids);
Error in did.database/add_docs (line 428)
database_obj.validate_docs(document_objs);
Error in did.test.test_did_db_documents (line 29)
db.add_docs(docs);
Note that _all_docids returns all the doc_ids defined in the DB, not just the docs in the current/specified branch.
Thanks Yair
I noticed that the new validation schema were not yet merged into sql.
Before I merged, I got the following error. I don't get it after merging but I think it means that it's possible that validation
in the function might not be defined under some scenarios.
Before I merged, with did.test.test_did_db_documents
I get:
Error in did.database/get_document_schema (line 961)
error('DID:Database:ValidationFileMissing','Validation file "%s" not found',validation);
Error in did.database/validate_docs (line 939)
schemaStruct = database_obj.get_document_schema(schema_filename);
Error in did.database/add_docs (line 428)
database_obj.validate_docs(document_objs);
Error in did.test.test_did_db_documents (line 29)
db.add_docs(docs);
After merging, I'm getting no errors on our existing tests, which is great!
We will write some advanced testing over the next few days to try to intentionally make some errors and make sure that they have good error messages.
Thanks Steve
Hi Avi -
I pulled the new changes that Yair made into the branch valtest
I see a new test function
did.test.test_did_db_documents_invalid
But it only seems to test for one way for the documents to fail by messing up many fields in one document. It doesn't test many ways the documents might fail separately, like the did.test.test_did_db_queries
method. Can you make it like that, where there are many modes that might be tested, and we can test that each one fails?
Right now, we can't be sure that the integer detection algorithm is the only part of the validator that is working. Does that make sense?
Thanks Steve
@stevevanhooser Hi Steve,
I am unsure about what output to expect from some of the invalidation tests.
When I remove the fields document_properties.document_class
, document_properties.document_class.definition
, document_properties.document_class.validation
, document_properties.document_class.property_list_name
, and document_properties.document_class.class_version
, it does not recognize the documents as invalid.
However, when I remove the field document_properties.document_class.class_name
, the subsequent documents are invalidated.
This feels somewhat inconsistent - which of these outcomes (or both, or neither) is correct?
% no error (passes validation):
[b,msg] = did.test.test_did_db_documents_invalid('remover','document_class');
[b,msg] = did.test.test_did_db_documents_invalid('remover','definition');
[b,msg] = did.test.test_did_db_documents_invalid('remover','validation');
[b,msg] = did.test.test_did_db_documents_invalid('remover','property_list_name');
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_version');
% produces error (does not pass validation):
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_name');
Thanks, Avi
Hi Avi -
This is a good list. Do all the other fields act as expected, like removing the base field (should cause an error) or removing the ID of a base field or leaving it empty, etc?
Thanks Steve
So far I have 29 tests and other than the ones I pointed out, they acted as I expected (they were caught as invalid):
% result of all tests:
[test_results,test_messages] = run_test_invalid();
Removing the base field is invalidated, as expected:
% this test passes - the modified document(s) is(are) invalidated:
[b1,msg1] = did.test.test_did_db_documents_invalid('remover','base');
Removing the ID of a base field is invalidated, as expected:
% this test passes - the modified document(s) is(are) invalidated:
[b2,msg2] = did.test.test_did_db_documents_invalid('remover','id');
Leaving the ID empty is invalidated, as expected:
% these tests pass - the modified documents are invalidated:
[b3,msg3] = did.test.test_did_db_documents_invalid('value_modifier','blank int');
[b4,msg4] = did.test.test_did_db_documents_invalid('value_modifier','blank str');
Cool. Do you think we have a complete list to send now (the tests mentioned above?)
I think it's a complete list of all possible field removals - however I think there are still some field modifications that I haven't implemented yet
Hi Yair -
The validation seems to work perfectly except for a couple of cases.
My tech Avi writes: "When I remove the fields document_properties.document_class
, document_properties.document_class.definition
, document_properties.document_class.validation
, document_properties.document_class.property_list_name
, and document_properties.document_class.class_version
, it does not recognize the documents as invalid.
However, when I remove the field document_properties.document_class.class_name, the subsequent documents are invalidated.
% no error (passes validation) but should be an error:
[b,msg] = did.test.test_did_db_documents_invalid('remover','document_class');
[b,msg] = did.test.test_did_db_documents_invalid('remover','definition');
[b,msg] = did.test.test_did_db_documents_invalid('remover','validation');
[b,msg] = did.test.test_did_db_documents_invalid('remover','property_list_name');
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_version');
% produces error (does not pass validation):
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_name');
document_class
is a standard structure that exists in all documents, so it should be checked always, it is outside of what is specified in the schema. Does that make sense?
Thanks Steve
I think that you misinterpreted the error message and cause. This is what I get on my end:
>> [b,msg] = did.test.test_did_db_documents_invalid('remover','document_class');
Validating demoB doc 41268d0332bcba80_c0c70ff7315c9b43
Validating demoB doc 41268d0332bcba80_c0c70ff7315c9b43 (superclass base)
Validating demoB doc 41268d0332bcba80_c0c70ff7315c9b43 (superclass demoA)
Invalid non-numeric sub-field demoB.value found in demoB doc 41268d0332bcba80_c0c70ff7315c9b43
Error due to one of the following modifiers:
value_modifier:'sham'
id_modifier:'sham'
datestamp_modifier:'sham'
session_id_modifier:'sham'
dependency_modifier:'sham'
remover:'document_class'
The important line is:
Invalid non-numeric sub-field demoB.value found in demoB
This error is due to the fact that in did.test.documents.make_doc_tree_invalid.m
line 99 you set a string value into the field demoB.value, which is defined in the schema as "integer":
docs{end+1} = did.document('demoB','demoB.value',int2str(counter),...
The Invalid non-numeric sub-field
error is the only one that is reported by database.m
- The rest of the messages below it are programmatic (in `did.test.test_did_db_documents_invalid.m lines 46-53) and are misleading because they have no relation to any removed fields.
Hi Yair, thanks for pointing this out - it seems I didn't update the remote repo before posting my issue.
Could you please try again with this updated list:
% no error (documents pass validation) despite being modified:
[b,msg] = did.test.test_did_db_documents_invalid('remover','document_class');
[b,msg] = did.test.test_did_db_documents_invalid('remover','definition');
[b,msg] = did.test.test_did_db_documents_invalid('remover','validation');
[b,msg] = did.test.test_did_db_documents_invalid('remover','property_list_name');
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_version');
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid definition'); %replaces the definition field in document_class with a nonsense string
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid property list name'); %replaces property_list_name with a nonsense string
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','new class version number'); %sets class_version to 2 instead of 1
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','class version string'); %sets class_version to a nonsense string
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid base name'); %sets name field in base class to a nonsense string
Hi Avi - same question here, which branch is this code on? Thanks
This should be the branch "valtest"
This is now merged into sql
.
Yair, the errors are
% no error (documents pass validation) despite being modified:
[b,msg] = did.test.test_did_db_documents_invalid('remover','document_class');
[b,msg] = did.test.test_did_db_documents_invalid('remover','definition');
[b,msg] = did.test.test_did_db_documents_invalid('remover','validation');
[b,msg] = did.test.test_did_db_documents_invalid('remover','property_list_name');
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_version');
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid definition'); %replaces the definition field in document_class with a nonsense string
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid property list name'); %replaces property_list_name with a nonsense string
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','new class version number'); %sets class_version to 2 instead of 1
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','class version string'); %sets class_version to a nonsense string
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid base name'); %sets name field in base class to a nonsense string
Thanks Steve and Avi
The fact that you remove or modify certain document fields does not (by itself) make the document invalid. It depends on the document's schema - if the schema does not require the deleted field etc. there is no validation problem.
So I don't understand what the problem is. You need to be more verbose in what you ask me, because I am not there next to you and I cannot read your minds [at least yet...]. For example, you need to say something like:
"in file abcd.json we define rule XYZ and yet although in file efgh.m line 123 we remove this field from document Doc_123, the document passes validation when we add it to the database."
Without such verbose explanation you force me to go on a fishing expedition to try to figure out what you mean and what the problem is (if there is in fact a problem). This is both very inefficient and quite frustrating.
Hi Yair -
I'm sorry it's frustrating. I think I see the issue.
The document_class
field in the document JSON code is not explicitly in the schema, but it is part of every document and is part of an implied schema.
Let's follow an example. Let's look at an example document type, demoA. Here is the schema:
DID-matlab/example_schema/demo_schema1/database_schema/demoA.schema.json
{
"classname": "demoA",
"superclasses": [ "base" ],
"depends_on": [ ],
"file": [ ],
"demoA": [
{
"name": "value",
"type": "integer",
"default_value": 1,
"parameters": [-100000,100000,0],
"queryable": 1,
"documentation": "A value field for the demoA class."
}
]
}
And here is an example document:
{
"document_class": {
"superclasses": {"definition": "$DIDDOCUMENT_EX1/base.json"},
"class_version": 1,
"definition": "$DIDDOCUMENT_EX1/demoA.json",
"property_list_name": "demoA",
"class_name": "demoA",
"validation": "$DIDSCHEMA_EX1/demoA.schema.json"
},
"demoA": {"value": 1},
"base": {
"datestamp": "2023-03-05T17:25:30.102Z",
"name": "",
"session_id": "",
"id": "41268d0d73bbd6f9_c0ddad560858299e"
}
}
Now suppose someone wants to pass off the following JSON code as a valid document, let's call it X
:
{
"document_class": {
"superclasses": {"definition": "$DIDDOCUMENT_EX1/base.json"},
"class_version": 1,
"definition": "$DIDDOCUMENT_EX1/demoA.json",
},
"demoA": {"value": 1},
"base": {
"datestamp": "2023-03-05T17:25:30.102Z",
"name": "",
"session_id": "",
"id": "41268d0d73bbd6f9_c0ddad560858299e"
}
}
X is not a valid document. It's missing the fields document_class.property_name_list
and document_class.class_name
with proper values that are built when the document demoA
is built. But, document_class
is not specified on a field by field basis in the schema, but rather its values depend on what occurs when a properly constructed demoA
is built. If one made an example document with d = did.document('demoA');
, it would have a properly constructed document_class
field.
It's a bit challenging to check this. Probably the easiest thing is to actually build a blank document and just check that it matches.
Does this make sense? document_class
has meta requirements, which is why it is confusing.
Best Steve
The latest committed version in the sql
branch now reports an error for:
[b,msg] = did.test.test_did_db_documents_invalid('remover','document_class');
[b,msg] = did.test.test_did_db_documents_invalid('remover','property_list_name');
[b,msg] = did.test.test_did_db_documents_invalid('remover','class_version');
...and also when removing the document's document_class.class_name
field (or setting it to empty)
However, no error is reported for these (the reason for each is stated in a comment):
[b,msg] = did.test.test_did_db_documents_invalid('remover','validation');
%A document with no defined validation file is still valid
%(however, an error is reported if a file is specified but not found)
[b,msg] = did.test.test_did_db_documents_invalid('remover','definition');
%A document with no definition is still valid
%The document_class.definition field is not used anywhere in the code
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid definition');
%A document with a bad definition is still valid
%The document_class.definition field is not used anywhere in the code
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid property list name');
%I have no idea how a valid value is defined, so we only check if the field is missing or empty.
%The document_class.property_list_name field is not used anywhere in the code
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','new class version number');
%I have no idea how a valid value is defined, so we only check if the field is missing or empty
%The document_class.class_version field is not used anywhere in the code
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','class version string');
%I have no idea how a valid value is defined, so we only check if the field is missing or empty
%The document_class.class_version field is not used anywhere in the code
[b,msg] = did.test.test_did_db_documents_invalid('other_modifier','invalid base name');
%I have no idea how a valid value is defined, so the document_properties.base.name field is not currently checked
%The document_properties.base.name field is not used anywhere in the code
Let me know if/how you wish me to process each of these.
Hi Yair and Avi -
Right now we have no more tests to put forward, things are working well as far as we can tell! Thanks
Steve
(We will discuss today, see https://github.com/VH-Lab/DID-matlab/blob/validation/docs/development/did-schema4.md)