Synchronize Readme specs and test test json files

jnguyenx commented 9 years ago

I noticed two small inconsistencies:

In this test file there's a field called hgvs which is not defined in the specs. In my implementation it'll be ignored.
In both test files, the Feature has a label field whereas in the specs this field is not present but has ageOfOnset instead.
The patient wrapper is not present in the test files either.

buske commented 9 years ago

@jnguyenx Thanks for you comments! For 1. and 2., it's true, hgvs and label are not in the specification but it was intentional. We've been operating under the policy that occasionally, extra data may be useful to include in the API for interpretation or debugging, and sites that don't support it should ignore these fields (this is also how we get backwards-compatibility between minor versions). We should specify this more formally, and perhaps these should be removed from the test dataset anyway?

Regarding the patient wrapper, I'm unsure of whether this is helpful or not. The files serve two uses:

to provide example data that people should upload into their matchmaker
to provide example data for the query to make internally or externally to verify matchmaking Since only the later involves a patient wrapper, I left it off, but I'd be happy to people think this would be more consistent.

jnguyenx commented 9 years ago

Hi Orion, thanks for the quick answer.

Well I wanted to use these json files to test against my implementation, what I did is to use the Readme specs to do all my implementation, and then added unit tests which consume those files. I ran into a couple of errors due to the 3 points I mentioned above.

So in my opinion it would be very nice just to be able to consume those files as they are provided.

I think that it is fine to leave extra fields which are not going to be parsed, but I'm a bit more worried about 2., because ageOfOnset is a required field, therefore the test files are not compliant. Same for patient.

buske commented 9 years ago

Hi Jeremy,

Regarding patient: that's a fair point. I'm happy to update the test files if someone else +1s this.

Regarding ageOfOnset, it is an optional field. The only mandatory field for a phenotypic feature is id. This is described in the spec, but perhaps either:

It should be clarified, and/or
The summary specification at the top should only include mandatory fields?

Thoughts?

jnguyenx commented 9 years ago

Hi,

Oh yeah my bad, I didn't see the optional.

I think that ID in Feature should have the (mandatory) label as done for GenomicFeatures. That way it'll clearer.

The summary specification is fine like that for me, in combination with the detailed specs.

Thanks for your help!

cmungall commented 9 years ago

+1 to making the spec and test data consistent

buske commented 9 years ago

Since each request includes only a single patient, there's no easy way to make the 50 patient dataset spec-compliant, per se. I could make the test data a list of requests, (so a list of objects, each with a "patient" field and the corresponding details), but this is confusing as well, since it still isn't a valid request altogether. I think it makes the most sense to leave the dataset the way it is (perhaps removing informal hgvs and label fields), but provide an additional file with a sample request that completely conforms to spec, including the "patient" wrapper. Would this be a good compromise?

jnguyenx commented 9 years ago

Sounds good! And also rename the files so their content is implicit, like set of patients, set of queries, etc...

Thanks!

Relequestual commented 9 years ago

I don't think the test data file wasn't designed to be used for forming the test queries. Having said that, there's no reason we couldn't provide both an import file and individual files for individual requests. Going down that line of thought, once you split them into individual files, there's I can't see why we would mainatin both. For a one time data import script, I see little difference between opening one file vs 50. Open to discussion though.

I'm running acorss issues converting some of the hgvs codes to ref and alt alleles. I've ran into one variant specifically which is causing problems (NM_005559.3(LAMA1):c.2988_2989delA) as this doesn't appear to be a valid notation. I'm discussing this with Orion and François via email currently. Will report back here with any updates.

buske commented 9 years ago

Update: we tracked it down to a typo in the original manuscript and committed a fix. I'm going through and verifying all the other variants now.

ga4gh / mme-apis

Synchronize Readme specs and test test json files #105