ga4gh / compliance

Compliance test suite for the APIs defined in ga4gh-schemas. RETIRED 2018-01-24
https://ga4gh.org
Apache License 2.0
10 stars 23 forks source link

Test readsSearch with multiple readGroups #185

Closed dcolligan closed 8 years ago

dcolligan commented 8 years ago

Issue #184

macieksmuga commented 8 years ago

-1 This test does not encompass the actual requirement as stated in the schemas (that queries across multiple read groups, whether or not they belong to the same read group set, succeed). That full requirement is already tested via searchReadsWithAllIdsReturnsReadsForEach.

That being said, I do like the actual contents of the test: That each returned alignment have its readGroupId contained in the set of readGroupIds queried. So on that strength alone, this test may be a valuable addition to the suite.

However, we've been trying to get away from overfitting compliance tests to the specific behaviors of the reference server. This test is specifically crafted to the known behavior of the reference server, and as such belongs in the reference server's suite of tests, not here.

diekhans commented 8 years ago

I don't understand the objection. This adds a legitimate test cases for any server. It does not remove any tests to make the reference server happy.

As long as there is a test for readgroups across readgroupsets and the reference server is failing it, this isn't over-fitting the current restrictions.

Maciek Smuga-Otto notifications@github.com writes:

-1 This test does not encompass the actual requirement as stated in the schemas (that queries across multiple read groups, whether or not they belong to the same read group set, succeed). That full requirement is already tested via searchReadsWithAllIdsReturnsReadsForEach.

That being said, I do like the actual contents of the test: That each returned alignment have its readGroupId contained in the set of readGroupIds queried. So on that strength alone, this test may be a valuable addition to the suite.

However, we've been trying to get away from overfitting compliance tests to the specific behaviors of the reference server. This test is specifically crafted to the known behavior of the reference server, and as such belongs in the reference server's suite of tests, not here.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub*

david4096 commented 8 years ago

@macieksmuga understandable, fortunately we still have the other test in place and are failing appropriately. I asked for this test to be added since it covers what I believe to be a very common use case: "all the reads in a BAM". We won't forget that we're still under-implementing.

@dcolligan you could consider using list comprehension as opposed to a for-loop with something like this:

        final List<String> readGroupIds = readGroups
                .stream()
                .map(a -> a.getId())
                .collect(Collectors.toList());

I would like it if the comment stated this is for the "all the reads in a readgroupset" use case. "multiple read groups" is confusing and suggests that it might be testing some other case, like a subset of readgroups from a readgroupset. "Verifies that a request for all the readGroupIds in a Read Group Set returns only alignments with those readgroupIds." You might also improve the test by verifying some other fields of the alignment are well formed. +1

macieksmuga commented 8 years ago

After talking it over with folks, +1 once @david4096's suggestions are incorporated (list comprehension and revised documentation).

@diekhans it would also be a good idea to include the other common use case that Richard Durbin pointed out in a call (not currently supported by the reference server) for balance: The request for reads in a small genomic coordinate region (say, immediately surrounding a single known SNP), across ALL read groups.

dcolligan commented 8 years ago

@david4096 updated

david4096 commented 8 years ago

thanks @dcolligan !