apache / daffodil-vscode

Apache Daffodil™ Extension for Visual Studio Code
https://daffodil.apache.org/
Apache License 2.0
11 stars 20 forks source link

Need a way to have the run do daffodil's 'limited validation' #269

Closed scholarsmate closed 1 year ago

scholarsmate commented 2 years ago

Need a way to have the run do daffodil's 'limited validation'. Basically to see the see the stream of validation errors as they are produced. This is just another option to set on daffodil when creating the Daffodil Data Processor. There's a withValidation(...) call.

scholarsmate commented 1 year ago

@mbeckerle, a member of the development team is seeking the "why" to this issue, which I believe is a request that came from one of your 1.2.0 RC evaluations of the extension.

I believe the stream of validation errors would help the schema author by pointing out errors as they write the schema so that they can make corrections as they go, kind of like what an on-going code checker/lint would provide.

@mbeckerle, please correct or elaborate, if necessary.

mbeckerle commented 1 year ago

I believe you are correct here. The notion is when debugging, we might as well see validation errors as they are created which is what Daffodil's limited evaluation does. When parsing when it constructs each simple value node it verifies that the facet checks all work. I think it may also check number of recurrences for arrays are within max/min, at the end of an enclosing element.

arosien commented 1 year ago

@scholarsmate @mbeckerle I'm trying to think of how the validation errors should be displayed to the user. More specifically, how they should map into DAP events and types. We could just log the errors using the normal (Scala) logger. But an alternative would be to encode these as an Output event which would get rendered in a better way, presumably. What do you think?

mbeckerle commented 1 year ago

A viewer that gives somes sort of scrollable collapsable view of these log and diagnostic info would be helpful. I do think they should cross DAP over to the client and get displayed as they are created.

This stuff is way too verbose and poorly formatted currently, lots of very long path names in it, etc. There are JIRA tickets to improve this for ordinary API and command line usage, but in the IDE I do think it deserves a panel that, for example, collapses all errors of the same type so you don't see them repeating (and shows a count instead for example).

What I can't say is how important is this releative to other features needed.

arosien commented 1 year ago

@mbeckerle @scholarsmate any example schemas which produce diagnostics that I can use to test? I'm not sure if any of the ones I've been using produce anything.

mbeckerle commented 1 year ago

EDIFACT will generate many schema definition warnings unless you specify a config that suppresses them.

See the tdml file and look for the define-config which has explicit suppression of numerous warnings. Take those out and numerous warnings should be issued.

Note: you want to checkout the daffodil-dev branch, not master.

mbeckerle commented 1 year ago

EDIFACT is on github. Search for DFDLSchemas EDIFACT to find.

mbeckerle commented 1 year ago

Ah, those will be schema-comilation warnings. If you are looking for data validation errors,... I will need to find a schema that produces those.

Daffodil itself has tests that contain tdml:validationError to expect specific validation errors.

arosien commented 1 year ago

Ah, those will be schema-comilation warnings. If you are looking for data validation errors,... I will need to find a schema that produces those.

Daffodil itself has tests that contain tdml:validationError to expect specific validation errors.

I found https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/ValidatorApiExample.java#L76 so far and have verified that I'm capturing those validations. I'll look to see if there are more around in that repo, thanks!

2023-01-25 09:49:19,291 [io-compute-6] INFO  o.a.d.d.d.Parse - Validation Error: alwaysInvalid failed facet checks due to: facet maxExclusive (0)
Schema context: alwaysInvalid Location line 34 column 10 in file:/Users/arosien/nteligen/daffodil/daffodil-japi/src/test/resources/test/japi/alwaysInvalid.dfdl.xsd
Data location was preceding byte 1 

Just to be clear, this ticket is about the data validation, not schemas themselves?

arosien commented 1 year ago

Also, do the data validation diagnostics accumulate throughout the parse, or does it act like a stack? I'm wondering whether to emit them only at the end (in the case of accumulation), or when they are encountered.

mbeckerle commented 1 year ago

They get saved in the parse state.

Lots of validation errors might be encountered during speculative parsing that backtracking needs to discard.

The parser does have streaming behavior though, so when it is not speculating, it should both issue infoset output and associated validation errors on the fly. I did not verify that this in fact happens. @stevedlawrence can you comment on this?

The challenge is that this happens not when validating the element, but later when it is acertained that the element is not going to get backtracked.

stevedlawrence commented 1 year ago

The parser does have streaming behavior though, so when it is not speculating, it should both issue infoset output and associated validation errors on the fly.

Validation errors are accumulaed in a diagnostics data structure and removed when backtracking. Those Diagnostic live on the PState. We do not stream out validation diagnostics like we do the infoset, so for normal users they don't see them until the very end of a parse.

But the debugger doesn't really want to using the "streaming" infoset stuff anyways, since we only "stream" things out when we know they won't be backtracked. The debugger will want to show the current infoset at the current breakpoint regardless if backtracking might happen, and just hide parts of the infoset if backtracking does occurs. I assume the DAP stuff already does this by reading the infoset directly from the PState? The validation messages are stored in the PState in the diagnostics member, so can be accessed in a similar way, and so can be done during parsing instead of at the end. Note that this same member stores things like SDE's and processing errors, so it's not just validation errors.

Also, there is a function on each infoset node that says if the element is valid or not, but there is not related diagnostic there.

Note that this is only with the "limited" validation feature. Full validation uses xerces and just validates everything at the end of a parse. We might someday figure out how to have Xerces validate as we parse (but similar to the streaming stuff, it won't see infoset's that might be backtracked).

arosien commented 1 year ago

How does something like this look?

Screen Shot 2023-01-26 at 9 12 46 AM
mbeckerle commented 1 year ago

The text in the block looks good, though I'd like to find some way to trim a prefix of that big long file path based on say, the current project's root dir?

If that block is a pop up that's the wrong idea.

It should be a thing like the call-stack on the left, where you can open it and see the contents as things get added. It should have two sub-categories within it for processing errors and for validation errors.

Think of it this way. In some cases, every parsed element will generate one of these validation errors, and the user may simply want to ignore that for hours/days/weeks while worrying about bigger problems like parse errors. So they can't be anything that makes too much noise or requires dismissal via any UI gesture.

arosien commented 1 year ago

The text in the block looks good, though I'd like to find some way to trim a prefix of that big long file path based on say, the current project's root dir?

I can extract the particular fields, and possibly massage them, from the diagnostic data structure rather than this impl which uses toString.

If that block is a pop up that's the wrong idea.

It's the hover text. 👍

It should be a thing like the call-stack on the left, where you can open it and see the contents as things get added. It should have two sub-categories within it for processing errors and for validation errors.

Think of it this way. In some cases, every parsed element will generate one of these validation errors, and the user may simply want to ignore that for hours/days/weeks while worrying about bigger problems like parse errors. So they can't be anything that makes too much noise or requires dismissal via any UI gesture.

It's in the "Parse" subtree under the "Variables" pane, so it's expandable/contractable. I'll separate the errors from warnings as separate trees.

stevedlawrence commented 1 year ago

The text in the block looks good, though I'd like to find some way to trim a prefix of that big long file path based on say, the current project's root dir?

I can extract the particular fields, and possibly massage them, from the diagnostic data structure rather than this impl which uses toString.

This might be fixed when DAFFODIL-2100 is resolved, which is currently planned to be fixed in the next release (though not fixed yet). It would be nice if VSCode didn't have to worry about this stuff and could just rely on Daffodil to do the right thing.

arosien commented 1 year ago

This might be fixed when DAFFODIL-2100 is resolved, which is currently planned to be fixed in the next release (though not fixed yet). It would be nice if VSCode didn't have to worry about this stuff and could just rely on Daffodil to do the right thing.

I'll add a TODO comment about the issue and hopefully any future deprecated methods will alert us of any useful changes to make in the extension.

arosien commented 1 year ago

Ok here's an updated view, hover text visible:

Screen Shot 2023-01-26 at 3 31 56 PM