ga4gh-beacon / beacon-verifier

Tool to verify that a Beacon implementation follows the specification
2 stars 3 forks source link

verifier points to old draft #12

Open anuradhawick opened 2 years ago

anuradhawick commented 2 years ago

Hi,

It seems the beacon-verifier code base has the following defaults.

model: default_value = https://github.com/MrRobb/beacon-v2-Models/BEACON-V2-draft4-Model framework: default_value = https://github.com/MrRobb/beacon-framework-v2

I am referring to the latest schemas for model and framework, hence the verifier wasn't running correctly. Changing model to; default_value = https://github.com/ga4gh-beacon/beacon-v2-Models/BEACON-V2-Model fixed the issue. However, I cannot point to the latest framework as it breaks some JSON schema references.

If it helps, the exact error I get is as follows;

{
  "Info": [
    {
      "name": "CSIRO Serverless Beacon",
      "url": "https://ip35zfam3d.execute-api.us-east-1.amazonaws.com/prod/info",
      "valid": false,
      "error": "Response does not match the schema: failed to resolve json-schema:///sections/beaconInformationalResponseMeta.json: cannot resolve relative external schema without root schema ID ()\nfailed to resolve json-schema:///sections/beaconInfoResults.json: cannot resolve relative external schema without root schema ID ()\n"
    }
  ],
}

It would be great if this issue could be addressed or a proper workaround can be shown.

Thanks in advance.

cheers, Anuradha

mbaudis commented 2 years ago

@anuradhawick Thanks for the note! I think there is some cleanup to be done ... Can you give some feedback about the errors when using the new value? Should be

.. at least concerning the correct repositories. However, there have been internal changes, e.g. a general witch to relative references which apparently haven't vetted for verifier compatibility.

redmitry commented 2 years ago

Hello,

I executed the verifier and it passes the /info test.

It should be the json schema library bug, as it can't resolve the referred schema:
failed to resolve json-schema:///sections/beaconInformationalResponseMeta.json

Could it be because of the old RUST version (rustc --version) ? I have 1.59.0

The background: In the beaconInfoResponse.json:

"properties": {
  "meta": {
    "description": "Information about the response that could be relevant for the Beacon client in order to interpret the results.",
    "$ref": "./sections/beaconInformationalResponseMeta.json"
},

IMO the fail in the json schema validation library is in the chain of relative "$ref"s. The library should resolve the "$ref" against current document.

BEACON-V2-Model/endpoints.json uses the absolute reference:
"$ref":"https://raw.githubusercontent.com/ga4gh-beacon/beacon-framework-v2/main/responses/beaconInfoResponse.json" while beacon-framework-v2/blob/main/endpoints.json not:
"$ref":"../beacon-framework-v2/responses/beaconInfoResponse.json"

Kind regards,

Dmitry

mbaudis commented 2 years ago

@redmitry But can you try https://github.com/ga4gh-beacon/beacon-v2/tree/main/framework/json - which is the valid one now? There we went full-out relative refs. I've only checked this through materializing the schemas in Python; and there the only error I've encountered was when using a ref from models to the framework/common/ontologyTerm.json, where the the ontologyTerm then used itself a relative reference (to CURIE). This all looks like "the schemas are fine, the tools are ... not so much".

So this is a larger AI: Make sure & document that the current reference repository version checks out w/ various tools. https://github.com/ga4gh-beacon/beacon-v2 ... json for model and framework.

redmitry commented 2 years ago

Verifier doesn't report which version(s?) of model(s) it uses...
https://ip35zfam3d.execute-api.us-east-1.amazonaws.com/prod passes the default model (not all, but /info is valid).

I am unable to specify any --model link as it always fails :-(

I checked the https://ip35zfam3d.execute-api.us-east-1.amazonaws.com/prod/info against the beaconInfoResponse.json with my own schema library and it is valid.

D.

anuradhawick commented 2 years ago

@mbaudis I get the same error @redmitry gets. For some strange reason, the links you put do not work. We need to remove /tree/main part from the URL. Probably something to do with how strings are parsed.

@redmitry this URL is just a dev in progress and not all are implemented. I realised that verifier from cargo install (rustc version 1.62.0 (a8314ef7d 2022-06-27)) points to some old repositories, so I recompiled it with the new model path I mentioned in the original post. However, schema references are broken in the new framework repo (at least as I understand.).

The straightforward fix, I believe, would be to resolve schema errors and update the verifier repository to point to the latest framework and model URLs.

Just going overboard, but in compiled JSONs, fully dereferenced JSON schemas would be great. Makes it quite easy to read, without having to navigate many files (and better support some parsers that have relative path issues).

Anyway thanks heaps for getting into this so soon! Awesome job!

mbaudis commented 2 years ago

@anuradhawick @redmitry IMO a general problem is to have a complex schema with a gazillion of references pointing to its own parts on GH, like it was originally. AFAIK one should be able to pull the schemas into a local storage and use this... In Progenetix / bycon the local version (in the package) works well w/ the relative $ref values (same beacon-v2 directory structure). As part of instantiating data objects we de-reference the respective schemas (materialize). As said, the only problem ATM is the multiple-hops dereferencing from models -> framework/common/ontologyTerm -> ... CURIE which may be due to a double local de-referencing???

In general I hate to have a standard where such $ref to Github raw paths are sprinkled throughout.

So: Should't the option be to pull the schema repos and then verify against this localized version? Really asking; not a software engineer ¯\_(ツ)_/¯

CAVE: The variant schema uses now references to the VRS repo (as only external).

Suggestions?

mbaudis commented 2 years ago

@anuradhawick @redmitry IMO a general problem is to have a complex schema with a gazillion of references pointing to its own parts on GH, like it was originally. AFAIK one should be able to pull the schemas into a local storage and use this... In Progenetix / bycon the local version (in the package) works well w/ the relative $ref values (same beacon-v2 directory structure). As part of instantiating data objects we de-reference the respective schemas (materialize). As said, the only problem ATM is the multiple-hops dereferencing from models -> framework/common/ontologyTerm -> ... CURIE which may be due to a double local de-referencing???

In general I hate to have a standard where such $ref to Github raw paths are sprinkled throughout.

So: Should't the option be to pull the schema repos and then verify against this localized version? Really asking; not a software engineer ¯\_(ツ)_/¯

CAVE: The variant schema uses now references to the VRS repo (as only external).

Suggestions?

redmitry commented 2 years ago

In general I hate to have a standard where such $ref to Github raw paths are sprinkled throughout.

IMO standard shouldn't have links to the github. In this case, local copies would inevitably link to "other" piece of schema outside.

mbaudis commented 2 years ago

So we should have a verifier which pulls / rsyncs whatever schema root it is pointed at & verifies against this, but accept external $ref values in it, from a general POV, right?

anuradhawick commented 2 years ago

@mbaudis only issue is multiple hops, it creates confusion for the program, whether the relative path is from the program itself or the referencing JSON file or relative to the remote URL. This could make the different behaviour depending on the language (python vs rustc).

Having external URLs within schema makes it impossible to use within other programs directly. This is mainly due to security and we would typically block outbound connections from containers/clusters.

I like the idea of strictly local verification with zero HTTP calls to GitHub.