brainstorm closed this issue 4 years ago
Hi @brainstorm, thanks for taking a look into this! I will try to reproduce the error, but the config seems correct from an initial look. The server is still under active development; that link should be removed, as we plan to build a better documentation page. The index.html page was just a stub.
@brainstorm the dataSourceRegistry module is based on matching IDs with regex patterns. Right now, the match will fail if there are no capture groups (i.e. the system expects capture groups so that it can align a passed ID with one file in a directory of similarly named files, for example). The API fails this ID because there's no capture group.
This should be corrected to allow for single, hardcoded files. For now, you should be able to fix it by doing this:
...
"dataSourceRegistry": {
"sources": [
{
"pattern": "^(?P<accession>NA12878)$",
"path": "../../htsget-refserver/data/gcp/gatk-test-data/wgs_bam/{accession}.bam"
}
]
}
...
i.e. the named capture group accession will only match NA12878, which will then be injected into the file path.
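To make the capture-group mechanism concrete, here is a minimal Python sketch of the idea, not the server's actual code. The function name resolve is hypothetical, and the sketch assumes that each {name} placeholder in "path" is filled from the named group of the same name. Python's re module happens to accept the same (?P<name>...) syntax used in the config.

```python
import re

# Data source copied from the config above.
SOURCE = {
    "pattern": r"^(?P<accession>NA12878)$",
    "path": "../../htsget-refserver/data/gcp/gatk-test-data/wgs_bam/{accession}.bam",
}


def resolve(request_id, source):
    """Hypothetical helper: return the file path for request_id, or None if no match."""
    match = re.search(source["pattern"], request_id)
    if match is None:
        return None
    # Assumption: every named capture group fills the {name} placeholder in "path".
    return source["path"].format(**match.groupdict())


print(resolve("NA12878", SOURCE))
print(resolve("giab.NA12878.NIST7035.1", SOURCE))
```

With this pattern only the exact ID NA12878 resolves to a path; any other ID returns None, which would correspond to the NotFound error in the API.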
Following this up from https://github.com/igvteam/igv/pull/850 to here, since it doesn't belong in the IGV-desktop PR....
Maps a single ID (NA12878) to a single, local file (located at ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam). When you run the server, do you have this file available locally?
Yes I believe so:
% md5sum ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam
bc8e0e64772c9039bb3f9d00c0b8fc4e ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam
Given the config, the following IDs won't work: giab.NA12878.NIST7035.1, giab.NA12878.NIST7035.2, because they don't conform to the above regex pattern. To pull in these files to the htsget server, you would need to add more data sources. Check out this file, which is how I configure the server when testing locally.
I just pulled that config-local.json file locally and I think that the problem is with the files that sit locally, see:
$ ./htsget-refserver -config data/config/config-local.json
Server started on port 3000!
$ curl -s http://localhost:3000/reads/NA12878 | jq .
{
"htsget": {
"error": "NotFound",
"message": "The requested resource could not be associated with a registered data source"
}
}
$ curl -s http://localhost:3000/reads/giab.NA12878.NIST7035.1 | jq .
{
"htsget": {
"format": "BAM",
"urls": [
{
"url": "https://giab.s3.amazonaws.com/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam",
"headers": {
"Range": "bytes=0-499999999"
}
},
(... continues with subsequent urls and ranges...)
Yes, with config-local.json, the giab.NA12878.NIST{accession}.{lane} IDs will work.
If you are using config-local.json you will need to prepend "gatk." to the ID to access that local file at ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam. This is the matching data source in the config:
{
"pattern": "^gatk\\.(?P<accession>.*)$",
"path": "./data/gcp/gatk-test-data/wgs_bam/{accession}.bam"
}
Try the ID gatk.NA12878 and/or gatk.NA12878_20k_b37.
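As a rough illustration of why the bare ID NA12878 is rejected under this data source while the gatk-prefixed IDs resolve, the following Python sketch parses the JSON fragment above and applies its pattern. It is an approximation of the matching idea, not the server's implementation; the resolve helper is hypothetical. It also shows that the doubled backslash in the JSON decodes to the regex escape for a literal dot.

```python
import json
import re

# The data source exactly as it appears in the JSON config (raw string keeps "\\.").
CONFIG_FRAGMENT = r'''
{
  "pattern": "^gatk\\.(?P<accession>.*)$",
  "path": "./data/gcp/gatk-test-data/wgs_bam/{accession}.bam"
}
'''

source = json.loads(CONFIG_FRAGMENT)
# JSON decoding turns "\\." into the regex escape "\." (a literal dot).
pattern = re.compile(source["pattern"])


def resolve(request_id):
    """Hypothetical helper: map an ID to a file path, or None if it does not match."""
    match = pattern.search(request_id)
    if match is None:
        return None
    # Assumption: the named group fills the {accession} placeholder in "path".
    return source["path"].format(**match.groupdict())


for request_id in ("NA12878", "gatk.NA12878", "gatk.NA12878_20k_b37"):
    print(request_id, "->", resolve(request_id))
```

Because the group is (?P<accession>.*), anything after the "gatk." prefix becomes the file stem, so the file must exist under that name in the wgs_bam directory.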
You can also hit this same server running on AWS at https://htsget.ga4gh.org, e.g. https://htsget.ga4gh.org/reads/gatk.NA12878
Thanks Jeremy!
$ curl -s http://localhost:3000/reads/gatk.NA12878 | jq .
{
"htsget": {
"format": "BAM",
"urls": [
{
"url": "http://localhost:3000/file-bytes",
"headers": {
"HtsgetFilePath": "./data/gcp/gatk-test-data/wgs_bam/NA12878.bam",
"Range": "bytes=0-15236349"
}
}
]
}
}
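The response above is an htsget ticket: a client is expected to fetch each entry in "urls" with the listed headers and concatenate the returned bytes in order. A minimal Python sketch of that client side follows, using only the standard library. The helper names build_requests and download are hypothetical, and download assumes the local server from this thread is running on port 3000.

```python
import json
import urllib.request

# Ticket copied from the response above.
TICKET_JSON = '''
{
  "htsget": {
    "format": "BAM",
    "urls": [
      {
        "url": "http://localhost:3000/file-bytes",
        "headers": {
          "HtsgetFilePath": "./data/gcp/gatk-test-data/wgs_bam/NA12878.bam",
          "Range": "bytes=0-15236349"
        }
      }
    ]
  }
}
'''


def build_requests(ticket):
    """Turn each entry of the ticket's "urls" list into a urllib Request."""
    requests = []
    for block in ticket["htsget"]["urls"]:
        # "headers" is optional per URL entry, so default to an empty dict.
        headers = block.get("headers", {})
        requests.append(urllib.request.Request(block["url"], headers=headers))
    return requests


def download(ticket, out_path):
    """Fetch every block in order and concatenate the bytes (needs a running server)."""
    with open(out_path, "wb") as out:
        for request in build_requests(ticket):
            with urllib.request.urlopen(request) as response:
                out.write(response.read())


ticket = json.loads(TICKET_JSON)
for request in build_requests(ticket):
    # urllib normalizes header-name capitalization; HTTP header names are case-insensitive.
    print(request.full_url, dict(request.header_items()))
```

Calling download(ticket, "NA12878.bam") against the running server would write the assembled file; the loop at the bottom only prints the requests and makes no network calls.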
Hello @jb-adams, great refresh of this refserver impl, looking good! I've tried the integration tests that point towards tabulamuris against the CZI S3 bucket and they work great.
Now, when testing it out by pointing the config to the local BAM GiaB files like so:
Could you outline some simple curl queries in the README, i.e. (but working):
I'm not sure it's a good moment to ask those questions since there seem to be some tests failing though, perhaps this server is still under active development?:
Also, docs referenced from the GA4GH production config are 404'ing.
Most surely it's just me not passing the correct/mandatory parameters or doing something wrong, so please let me know which sample curl queries I can fire up given the above .json config file. Thanks in advance!
/cc @victorskl @ohofmann @reisingerf