Open KyleGao opened 4 years ago
Thanks for the feedback and the link!
There are some tests for DEL (see https://github.com/NBISweden/beacon-api-tests/blob/b0406a023369a97f7180f3585015187e0296b92d/tests/v101/test_counts.py#L198 and below), but you are right that there are none for DUP. We will try to include them in the near future!
O.k.; here are some more issues/comments (the Beacon+ ones are "notes to self"...).
@MalinAhlberg @KyleGao @sdelatorrep
INFO: Testing version v101
INFO: *** Running tests from test_datasets
INFO: Testing test_two_datasets
Test that both datasets respond.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
referenceBases=TG
assemblyId=GRCh38
start=16577043
end=16577045
includeDatasetResponses=HIT
variantType=SNP
There is a case to be made for supporting wildcard scenarios, e.g. by allowing a "SNP" query against a position or range, w/o any specification of referenceBases or alternateBases.
INFO: Testing no_refbases
Check that queries without referenceBases are not allowed.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
alternateBases=N
assemblyId=GRCh38
start=0
end=2
includeDatasetResponses=HIT
datasetIds=GRCh38%3Abeacon_test%3A2030-01-01
This is correct; but the minimum use of a single "N" for structural or wildcard queries as per spec is ambiguous, since the query with "referenceBases=N" can be interpreted as requiring "any referenceBases value of length 1", and would not match e.g. "referenceBases=CG".
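The two readings of a single "N" can be made concrete with a small sketch. Both function names are hypothetical, and neither reading is confirmed by the spec: the first treats every N as exactly one base, the second treats a lone N as "any sequence".

```python
def matches_per_base(pattern: str, bases: str) -> bool:
    """Reading 1: every N stands for exactly one base, so lengths must agree."""
    pattern, bases = pattern.upper(), bases.upper()
    if len(pattern) != len(bases):
        return False
    return all(p == "N" or p == b for p, b in zip(pattern, bases))


def matches_any_length(pattern: str, bases: str) -> bool:
    """Reading 2: a lone N is a wildcard for a sequence of any length."""
    if pattern.upper() == "N":
        return len(bases) > 0
    return matches_per_base(pattern, bases)


print(matches_per_base("N", "CG"))    # False
print(matches_any_length("N", "CG"))  # True
```

Under the first reading a query with "referenceBases=N" cannot hit a stored "CG"; under the second it can.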
INFO: Testing test_snp
Test variantType SNP.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
referenceBases=C
assemblyId=GRCh38
start=17302971
end=17302972
includeDatasetResponses=HIT
datasetIds=GRCh38%3Abeacon_test%3A2030-01-01
variantType=SNP
INFO: Testing test_bad_end
Test querying with a bad end position.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
referenceBases=A
alternateBases=G
assemblyId=GRCh38
start=17300407
end=17300409
includeDatasetResponses=HIT
datasetIds=GRCh38%3Abeacon_test%3A2030-01-01
and
INFO: Testing test_end
Test the same query as `test_bad_end()` but with the correct end position.
...
The end parameter should be ignored when referenceBases and alternateBases exist. This seems appropriate - otherwise one has to check for a calculated end position.
end use limited to range matches and precise SVs.
INFO: Testing test_insertion
Test variantTypes INS.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
referenceBases=A
assemblyId=GRCh38
start=16064512
end=16064513
includeDatasetResponses=HIT
datasetIds=GRCh38%3Abeacon_test%3A2030-01-01
variantType=INS
This may be a correct use, but is not really documented in the spec. This would be considered a wildcard query, not a structural one, at a precise position.
INFO: Testing test_deletion
Test variantTypes DEL.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
referenceBases=GACAA
assemblyId=GRCh38
startMin=16517679
startMax=16517680
endMin=16517684
endMax=16517684
includeDatasetResponses=HIT
datasetIds=GRCh38%3Abeacon_test%3A2030-01-01
variantType=DEL
and
INFO: Testing test_deletion_2
Test variantTypes DEL with startMin/startMax.
...
These tests combine referenceBases and structural DEL. This is IMO incorrect/misleading; structural variants (DEL) are frequently imprecise & therefore do not have specific referenceBases.
DEL as operational parameter, matching also "INDEL" ... variants (which is difficult to delineate in the specification, perhaps).
INFO: Testing test_snp_mnp
Test representation of TG->AG and multiple variations from one vcf line.
INFO: Open https://beacon.progenetix.org/query?
referenceName=22
referenceBases=TG
assemblyId=GRCh38
start=16577043
end=16577045
includeDatasetResponses=HIT
datasetIds=GRCh38%3Abeacon_test%3A2030-01-01
variantType=SNP
As above, "SNP" use for wildcard searches? This is not documented (i.e. no required use of explicit variant type "SNP").
(streamlined/clarified some comments in edit 2019-10-29)
Thanks for these comments, @mbaudis!
A lot of the comments are about how to interpret the variantType field in the spec. Of relevance here is that the VCF file we use currently doesn't contain any symbolic alternate alleles.
We still think that the variantType is obvious in many cases, and we have therefore tested that the beacon can respond to those cases, since we think researchers would be surprised if they didn't get responses. But maybe this type of translation, converting a more freeform query into a beacon-api query, is the job of a frontend tool.
We do find it a little bit confusing to use two different fields (alternateBases and variantType) in the API that map to the same field in the VCF file (ALT) in such a way that only one of them is allowed to be present, especially since the VCF specification itself mentions different variant types in section 5.2 ("Decoding VCF entries for SNPS and small indels"). But if this is what the specification means, the tester should comply with that.
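As an illustration of how a variantType might be derived from an explicit VCF REF/ALT pair, here is a minimal sketch. The function name and the length-based rules are assumptions for illustration only; they are not taken from the Beacon spec or from the test suite, and the sketch does not trim bases shared between REF and ALT.

```python
def infer_variant_type(ref: str, alt: str) -> str:
    """Guess a variantType from VCF REF/ALT (illustrative rules only)."""
    if alt.startswith("<") and alt.endswith(">"):
        # Symbolic allele such as <DEL> or <DUP>: use the ID before any colon.
        return alt[1:-1].split(":")[0]
    if len(ref) == len(alt):
        # Same length: single-base or multi-base substitution.
        return "SNP" if len(ref) == 1 else "MNP"
    # Different lengths: a longer ALT is an insertion, a shorter one a deletion.
    return "INS" if len(alt) > len(ref) else "DEL"


print(infer_variant_type("A", "G"))        # SNP
print(infer_variant_type("GACAA", "G"))    # DEL
print(infer_variant_type("N", "<DUP>"))    # DUP
```

A frontend tool could apply rules like these before building a beacon-api query, which would keep the API itself free of redundant fields.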
In your first example, do you mean that the beacon should return a 400 bad request response?
As for the usage of the end parameter: we did not interpret the specification in such a way that end is disallowed when both start and referenceBases are used, but maybe this should also return a 400 bad request? Or should it just ignore the end parameter?
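The two options for end can be sketched as follows. The function name and the `strict` flag are hypothetical, and the rule end = start + len(referenceBases) is an assumption inferred from the logged queries (e.g. start=17302971, referenceBases=C, end=17302972), not a statement of the spec.

```python
from typing import Optional


def resolve_end(start: int, reference_bases: str,
                end: Optional[int], strict: bool) -> int:
    """Return the end position to use for a precise query.

    strict=True  -> reject a mismatching end (a server could answer 400).
    strict=False -> silently ignore the supplied end.
    """
    calculated = start + len(reference_bases)  # assumed convention, see lead-in
    if strict and end is not None and end != calculated:
        raise ValueError(
            f"end={end} does not match calculated end={calculated}")
    return calculated


# Values from test_bad_end: the supplied end is off by one.
print(resolve_end(17300407, "A", 17300409, strict=False))  # 17300408
```

With `strict=True` the same call raises `ValueError`, which corresponds to the "400 bad request" option.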
And just to make sure that we are on the same page with regard to terminology: when you say "structural query", do you mean those cases that use a symbolic ALT in the VCF?
The current test data only includes breakpoint rearrangements; the DUP and DEL cases are not included. We would also like to have these test cases for our copy number beacon.
Copy number variants are imprecise DUP/DEL of a large span (usually kilobases to megabases). A good example of DUP/DEL in VCF can be found on page 11 of the VCF specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf).