ebi-ait / checklist

Template repository for checklists
Apache License 2.0
1 stars 0 forks source link

add json validation to ena flow #34

Closed amnonkhen closed 3 months ago

amnonkhen commented 3 months ago

Discussing with member of the ENA team (Rasko) about where in the ENA flow the XML documents are validated against the checklist. Keep that piece of code and add to it the part that will not interfere with the ENA work to send it to validation with JSON.

For now, XML validation is not being replaced, adding JSON validation at the same timepoint. Run both in parallel and have some comparisons to see if they give similar results.

Acceptance Criteria

snathanvj commented 3 months ago

This is the flow as I understand, SampleParser is the place we need to add our json schema validation.

sequenceDiagram
    client->>WEBIN_REST:submit sample json
    WEBIN_REST->>Submit_Controller:entry
    Submit_Controller->>WebinSubmissionService: submit jsons
    WebinSubmissionService->>WebinSubmissionService: convert json into XML object
    WebinSubmissionService->>SRA_Loader: validate and create submission
    SRA_Loader ->> SRA_Loader: if no parse errors, validate sample(sequencetools checks)
    SRA_Loader ->> SampleParser: further schema validations
    SampleParser ->> SampleParser: get checklist ID to get checklist for schema validation
    SampleParser ->> SampleParser: if no parse error or sample validation error,validate again checklist xml schema
    note right of SampleParser: THIS IS WHERE WE NEED TO PLUG-IN OUR VALIDATION
    SampleParser ->> SRA_Loader: validation result with submission objects
    SRA_Loader ->> SRA_Loader: if all valid load(save submission)
    SRA_Loader ->>WebinSubmissionService : receipt
    WebinSubmissionService ->>Submit_Controller : receipt
    Submit_Controller ->>WEBIN_REST : receipt
    WEBIN_REST-->>client:receipt
snathanvj commented 3 months ago

Repository: Webin_REST : https://github.com/enasequence/sub-sra

Files changes required: WebinSubmissionService.java (submit method lines between 83 to 97 converts xml to json ), SRALoader.java (validate method, probably line 300 to 310 have to be duplicated ), SRASampleParser.java (line 151 validateAndFind(...))

Testing: Unit test: SubmitSRALoaderTest.java Integration test: test dev environment , endpoints are here https://github.com/enasequence/sub-sra/blob/master/README.md#deployment

Deployment: Deployment is straight forward , done via gitlab pipeline, informations are there in readme https://github.com/enasequence/sub-sra/blob/master/README.md#deployment

snathanvj commented 3 months ago

@amnonkhen I need more clarity on this task 1) Where should we need to put the validation result for our json schema flow? 2) Are we making the changes in Webin-REST , a) so that in future we just can use the same code for json based validation and cut-off the xml validation part easily(this is what I thought) b)we just need to validate the sample against json based checklist schema, in this case we just can run it as an independent pipeline(probably using pipelite which will save the logs in pipelite_log table)

if 1.a is the answer then shall I go ahead and make changes and create a PR.

amnonkhen commented 3 months ago
theisuru commented 3 months ago

After the call with Dipayan and Senthil, it is quite clear how we want to handle this. We have identified where the JSON validation code could fit in with ENA flow. We can inject this where XML validation is happening right now before we submit to BioSamples. (Senthil probably can add more details about this)

ENA is using GCP Cloud Logging and monitoring. Once both JSON and XML validation is finished we will send related metrics via micrometer and further send logs to Cloud Logging if there is any discrepancy between the two results.

snathanvj commented 3 months ago

@amnonkhen @theisuru

https://github.com/enasequence/sub-sra/tree/ck-34

this branch has initial code changes. It is still incomplete.

Note: Webin-REST has two endpoints V1 and V2 , only V2 endpoint accepts json as input, I made changes in V2 flow.

amnonkhen commented 3 months ago

Thanks. Will review shortly. Please rename branch ck-nn so the GitHub autolinks work

amnonkhen commented 3 months ago

Following meeting with ENA team (Dipayan) we realised ENA will remove validation, and everything will be delegated to BioSamples, so we don't need to do anything in ENA.