TranslatorSRI / SRI_testing

MIT License
0 stars 1 forks source link

Restructure KP (and ARA?) test data/configuration to facilitate test data reuseability? #59

Open RichardBruskiewich opened 2 years ago

RichardBruskiewich commented 2 years ago

The SRI Testing KP and ARA JSON test configuration ('test data') files are currently a single file containing the target endpoint; identification and test exclusion metadata; (for KPs) the list of test edges, and (for ARA's) the list of KP infores id's.

Unfortunately, this discourages test data sets DRY reusability, in that multiple deployment/CI environments, described in independent Translator SmartAPI Registry entries, need to duplicate the test (edge data or KPs) configuration file, one per environment, even though the only thing that changes is the TRAPI endpoint URL. How can this duplication be avoided?

See also issue #61

Option A

Split the test configuration into two parts (levels), namely:

Translator SmartAPI Registry entry (e.g. production, dev, etc.) x-trapi.test_data_location => _KPdev.json (has all metadata except test edges) => _Testdata.json (only the test data edges)

Similar two-level structure of test configuration for ARA's as well?

Option B

Translator SmartAPI Registry entry (e.g. production, dev, etc.) x-trapi.test_data_location => KP_test_data.json file replaces a single test data endpoint with a JSON object dictionary of endpoints, one endpoint per (x-maturity?) environment (dev, prod, etc)

Option C

Don't embed the test data endpoint inside the test data file at all, since we have that information in the Translator SmartAPI Registry already! I guess the rest of the metadata - e.g. infores, etc. - can remain where it is?

YaphetKG commented 2 years ago

is it fair to assume that a component (specifically a KP) might have different datasets at a point across the different environments. As an example we have graphs that are being built for plater, and before these new datasets are pushed all the way to ITRB prod, we do some integration test and some validation on lower environments like the development environment.

During this time, i think we can assume that the test data is probably going to be different for a single infores across , since the backend data is also different.

I feel like Option B seems like a valid approach if we are anticipating such a scenario. I.e test data seems to be something to couple with the dataset that is available in a particular environment.

RichardBruskiewich commented 2 years ago

Thanks @YaphetKG, please also see issue #61 (inspired by a chat with Chris B...)

RichardBruskiewich commented 2 years ago

from Evan Morris: Yes, I definitely think the ability to have different ones for each deployment is needed. They will share them often but not always. Perhaps we could have a list of test data locations, either corresponding to the order of the servers or with server:testdata pairs

EvanDietzMorris commented 2 years ago

I agree that option B seems like a good approach. ie a dictionary with endpoint->test_data_location entries for each deployment. We could also remove the endpoint URL from the test files with that approach, if reusing test data files would be useful for anyone.

YaphetKG commented 2 years ago

Thinking about this more, i think we need some way mapping each test data to a server with in a single openapi spec. For eg if we take a look at (https://trapi-openapi.apps.renci.org/trapi/infores%3Aautomat-ctd/1.3.0) , there are three servers in the server's array . And we are going to have one openapi spec per trapi*infores .

I think it would be nice if the test data attribute in the openapi-spec was a sibling of servers[*].x-maturity. Instead of a top level attribute (?)

cc: @cbizon , @RichardBruskiewich , @EvanDietzMorris

cbizon commented 2 years ago

I agree with @YaphetKG if it's per-server (which it should be) then the server block is the right place to put it.

EvanDietzMorris commented 2 years ago

I agree, looking at the openapi spec again, that makes the most sense. It would also be a sibling of the endpoint URL which would allow testing to just grab the server blocks and iterate through them.

RichardBruskiewich commented 2 years ago

I think we are all on the right path here. Hi all, I've alerted Chunlei to ponder the requirement in the context of the TRAPI spec. I'm unsure where to post a PR to propose the change. This seems slightly outside the scope of the current Translator expenses (x-translator and x-trapi) since the server block exists outside of these info blocks.

newgene commented 2 years ago

Adding my two comments here:

  1. an object or an array of objects
    
    # same test_data for all servers
    ...
    test_data_location:
         url: <test_data_url>

different test_data for servers

... test_data_location:

  1. a nested object with a "default" key:
    # add production key only if you have a field overrides the default value
    ...
    test_data_location:
       default:
             url:
       production:
             url:

    Note that you can add any additional field you might need later (e.g. those from issue #61).

Both ways allow us to define a default set of parameters (No. 1 is explicit if server is not specified, No. 2 is more explicit under default field), and then specify those fields different from the default set for certain server (without repeating the same fields).

YaphetKG commented 2 years ago

Personally I like 2 of @newgene 's suggestion, it seems like it would be useful to have a default. for me i don't have strong opinions about putting this meta data in info versus servers section. But if we think servers is being overloaded like @newgene pointed out, maybe we could consider option 2 (?)

RichardBruskiewich commented 2 years ago

Thanks @newgene, for your feedback,

Quick question: are the x-location and x-maturity properties in the servers block entries, properties that we specified? If so, a third option here is to add another x-test-config property (serving the function of the info.x-trapi.test_data_location except assigned to specific x-maturity environments).

That said, I do see your point about localizing the test information to the semantic context of the testing, which is either x-trapi or x-translator (actually, both are sort of in scope here since the SRI Testing harness now spans both Biolink Model compliance and TRAPI compliance... one might almost conceivably wish to see the x-trapi property contents subsumed under the x-translator property?)

Just for the record, the SRI Testing harness is going to access the servers block to get it's URL's for testing. The tricky part here is that there is no consistency in the Registry as to what environment is being tested: one sees 'testing', 'staging', 'development' and 'production' all represented in the current Registry resources that now provide an info.x-trapi.test_data_location property! Fortunately, most Registry entries only have one servers block entry at the moment, irrespective of the value of the x-maturity tag.

@cbizon and @putmantime, would you like to vote on the above (or have @newgene present the options at some appropriate upcoming meeting, for a vote?)

newgene commented 2 years ago

are the x-location and x-maturity properties in the servers block entries, properties that we specified?

We actually don't use x-location in Translator, just x-maturity

test_data_location under x-trapi or x-translator?

I have no strong opinion of either way, assume we don't have any strong motivation to change it right now.

x-maturity consistency in Registry

I believe SmartAPI will report a validation error if its value is a non-standard term. We probably just need to push people to fix it. There is a warning badge if metadata has errors. For test harness framework, I think you can ignore those non-standard terms, maybe report it on a dashboard, as a motivation for people to fix ;-)

RichardBruskiewich commented 2 years ago

x-maturity consistency in Registry

I believe SmartAPI will report a validation error if its value is a non-standard term. We probably just need to push people to fix it. There is a warning badge if metadata has errors. For test harness framework, I think you can ignore those non-standard terms, maybe report it on a dashboard, as a motivation for people to fix ;-)

Just to clarify, all the values provided are 'correct' against the x-maturity list. What I'm observing so far among all the Translator registry entries which have set the info.x-trapi.test_location_data, is that their server block only has a single server definition, but that there is no one consistent x-maturity defined value against which the testing is meant to be done: I see one simply sees at least one 'testing', 'staging', 'development' and 'production' all represented, in one entry or another.

newgene commented 2 years ago

@RichardBruskiewich I see, so it's more of the metadata completion issue across Translator teams then. Maybe a dashboard to see the summary of completeness and what's missing would be a good motivation for people to complete their metadata?

RichardBruskiewich commented 2 years ago

@newgene (cc: @putmantime @cbizon), I suspect that @YaphetKG is right in his support of option 2.

Can the legacy info.x-trapi.test_data_location string (URL) value be allowed but we also now allow the expanded option 2 version for KPs and ARA's who need to be more specific?

In pseudo OpenAPI schema (I know it's not correct - the oneOf's ought to be $ref's, right?) but it is just a hint of what is needed):

     test_data_location:
        oneOf:
            - type: string
            - type: object
               properties:
                     default:
                         type: string
                      production:
                          type: string
                      etc..
               required:
                    - default

I would also expect the implied constraint that any x-maturity value under the test_data_location should also be paired with a TRAPI endpoint URL published in the servers block. If only 'default' is given, then any and all servers in the server block would have the same test data.

@newgene, what do we need to do in practice here to apply this new property schema to Translator? Perhaps we can start rolling it out as early as next week if the SmartAPI validation allows it, and some folks curate it, and I modify the SRI Testing to accommodate it (either a string or an object definition, parsed, for the URL endpoint)

newgene commented 2 years ago

@RichardBruskiewich yes, that would be no problem to allow either a string for simple case or an object for server-level customized settings.

Given Andrew and Colleen's feedback, I also have no objection to allow the actual test_data URL field being either a string (single URL) or an array of string (a list of URLs).

RichardBruskiewich commented 2 years ago

Given Andrew and Colleen's feedback, I also have no objection to allow the actual test_data URL field being either a string (single URL) or an array of string (a list of URLs) @newgene, how would one tag the x-maturity of a URL in a simple list of URLs? I sense that your Option 2 is more explicit, less prone to error and misinterpretation? Andrew indicated that he's ready to follow your (our?) lead on this so perhaps you can simply do the needful on your side (in the SmartAPI schema) then we can announce the change next week (at the Architecture meeting) as part of Tim's 'SRI Testing' update?

colleenXu commented 2 years ago

I've adjusted Service Provider's registration. This is an example of providing multiple test-data files for 1 server url in a registration.

https://github.com/NCATS-Tangerine/translator-api-registry/commit/c0496629510d8ebe4f607b380cfd2ff12c90b338

    test_data_location: 
      production:
        url:
        - "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-service-provider.json"
        - "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-multiomics.json"
        - "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-text-mining.json"
newgene commented 2 years ago

Updated validation schema for test_data_location in SmartAPI registry in this PR. We can now register and test the updated test_data_location data structure in SmartAPI. The old data structure as a string value URL is still valid.

RichardBruskiewich commented 2 years ago

@colleenXu, @newgene , (cc: @cbizon, @putmantime), I have responded on Slack to some of the above issues. The multi-file test data use case is ambiguous within the current context of Translator which recognizes ARAs calling KPs, not KPs calling KPs.

Perhaps if you designate the 1 server url above (is that BTE?) as an ARA-style test file configuration, we can make some progress on your use case.

What that entails is:

  1. ensuring that all 'called' KPs have SmartAPI records with InfoRes identifiers and (ideally), their matched info.x-trapi.test_data_location property settings; and
  2. have the server with the single (production?) URL point to an ARA test configuration file that enumerates all the KP's it 'calls'.

Also, it is assumed by current (still somewhat 'original') SRI Testing design that all KP's can be directly called by TRAPI. This perhaps won't be the case for some (most?) of the BTE wrapped KPs, correct?

The SRI Testing harness is not designed to call non-TRAPI services. I guess, though, we can add a bit of code to have it skip direct TRAPI calls of non-TRAPI resources (should be easy to do if only TRAPI SmartAPI entries have the 'trapi' tag, right?). However, at the same time, listing the infores identifiers of those 3rd party non-TRAPI KP's and designating test data for them, could still allow them to be queries by the 1 server URL (ARA-like) resource above, which does have a TRAPI query interface.

There is one known subtlety that the SRI Testing harness doesn't currently code for the constraint that the ARA TRAPI call that uses test data of a given KP, ought to only ask that particular KP for its data (and not all of the rest of its KPs). The original design (coded by others, not the current development team) doesn't (yet) design for that TRAPI query subtlety.

I can imagine that if we somehow code the SRI Testing TRAPI calls to accommodate this subtlety and refine your above design of the multiple info.x-trapi.test_data_location configuration of multiple URL's - by instead providing an array of YAML object with KP infores and their urls, something like the example below, then your use case could be implemented in the manner you appear to be considering.

    test_data_location: 
      production:
        url:
        - infores: service-provider
          url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-service-provider.json"
        - infores: multiomics-provider
          url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-multiomics.json"
        - infores: text-mining-provider
          url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-text-mining.json"
RichardBruskiewich commented 1 year ago

@colleenXu, @newgene , (cc: @cbizon, @putmantime), I have responded on Slack to some of the above issues. The multi-file test data use case is ambiguous within the current context of Translator which recognizes ARAs calling KPs, not KPs calling KPs.

Perhaps if you designate the 1 server url above (is that BTE?) as an ARA-style test file configuration, we can make some progress on your use case.

What that entails is:

  1. ensuring that all 'called' KPs have SmartAPI records with InfoRes identifiers and (ideally), their matched info.x-trapi.test_data_location property settings; and
  2. have the server with the single (production?) URL point to an ARA test configuration file that enumerates all the KP's it 'calls'.

Also, it is assumed by current (still somewhat 'original') SRI Testing design that all KP's can be directly called by TRAPI. This perhaps won't be the case for some (most?) of the BTE wrapped KPs, correct?

The SRI Testing harness is not designed to call non-TRAPI services. I guess, though, we can add a bit of code to have it skip direct TRAPI calls of non-TRAPI resources (should be easy to do if only TRAPI SmartAPI entries have the 'trapi' tag, right?). However, at the same time, listing the infores identifiers of those 3rd party non-TRAPI KP's and designating test data for them, could still allow them to be queries by the 1 server URL (ARA-like) resource above, which does have a TRAPI query interface.

There is one known subtlety that the SRI Testing harness doesn't currently code for the constraint that the ARA TRAPI call that uses test data of a given KP, ought to only ask that particular KP for its data (and not all of the rest of its KPs). The original design (coded by others, not the current development team) doesn't (yet) design for that TRAPI query subtlety.

I can imagine that if we somehow code the SRI Testing TRAPI calls to accommodate this subtlety and refine your above design of the multiple info.x-trapi.test_data_location configuration of multiple URL's - by instead providing an array of YAML object with KP infores and their urls, something like the example below, then your use case could be implemented in the manner you appear to be considering.

    test_data_location: 
      production:
        url:
        - infores: service-provider
          url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-service-provider.json"
        - infores: multiomics-provider
          url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-multiomics.json"
        - infores: text-mining-provider
          url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-text-mining.json"
RichardBruskiewich commented 1 year ago

I've adjusted Service Provider's registration. This is an example of providing multiple test-data files for 1 server url in a registration.

NCATS-Tangerine/translator-api-registry@c049662

    test_data_location: 
      production:
        url:
        - "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-service-provider.json"
        - "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-multiomics.json"
        - "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/biothings_explorer/sri-test-text-mining.json"

@newgene, alas, the meaning of the above array of url strings, from an SRI Testing harness perspective, is completely undefined. What is the testing harness behaviour supposed to be here?

Also, should there also be examples for https://github.com/NCATSTranslator/translator_extensions/pull/19/files

newgene commented 1 year ago

@RichardBruskiewich I would expect the SRI Testing just run the test against each of these provided test_data files. Ideally, also report the errors and the corresponding test_data files. Hope this is not too much efforts on your testing backend.