Closed yash-puligundla closed 2 years ago
Answers to some of the questions below.
After requesting /search on the data connect server, which provides the DRS URIs for cram, crai, and bundle, what is the next step? Do we take these URIs and go to the DRS server?
Yes. All of the above are illustrated in many of the fasp-scripts notebooks. A good example is in https://github.com/ga4gh/fasp-scripts/blob/master/notebooks/FASPNotebook18-GTEXExample-AWS.ipynb You would have to look at the clients in the fasp-scripts repo to see the underlying REST API calls.
Thank you @ianfore
@yash-puligundla you'll find that identifiers.org will not only resolve the identifier prefix, but also redirect you straight to the DRS /objects response. so you would take the drs://dg.4dfc:098e18d4-5ece-4bc6-9a79-68f5082da9bc example above and use identifiers.org as follows: https://identifiers.org/dg.4dfc:098e18d4-5ece-4bc6-9a79-68f5082da9bc
That behavior is down to the way that Michael, Binam and/or their colleagues registered the prefix with identifiers.org. The url_pattern is the thing that does it. The relevant details are shown on this page https://registry.identifiers.org/registry/dg.4dfc#! Other DRS servers would do the same.
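As a small illustration (standard library only, hypothetical helper name), turning a compact, prefix-based DRS URI into an identifiers.org resolution URL is just string manipulation:

```python
def identifiers_org_url(drs_uri: str) -> str:
    """Turn drs://{prefix}:{accession} into an identifiers.org URL.
    identifiers.org then redirects to the url_pattern registered
    for that prefix (e.g. dg.4dfc), landing on the DRS /objects response."""
    if not drs_uri.startswith("drs://"):
        raise ValueError("not a DRS URI: " + drs_uri)
    compact_id = drs_uri[len("drs://"):]
    return "https://identifiers.org/" + compact_id

print(identifiers_org_url("drs://dg.4dfc:098e18d4-5ece-4bc6-9a79-68f5082da9bc"))
# https://identifiers.org/dg.4dfc:098e18d4-5ece-4bc6-9a79-68f5082da9bc
```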
This is good to know. Thank you, Ian. However, I doubt this is the case with the starter kit: "https://identifiers.org/HG00284.1kgenomes.wgs.downsampled.bundle" is invalid, which makes me think it is not registered with identifiers.org.
I believe we can use the first option you listed above
a) for a host-based URI, build the DRS endpoint from the host name. E.g., for drs://nci-crdc.datacommons.io/098e18d4-5ece-4bc6-9a79-68f5082da9bc, the DRS endpoint would be https://nci-crdc.datacommons.io/ga4gh/drs/v1
Here is an example drs uri from data connect: "drs://localhost:5000/HG00284.1kgenomes.wgs.downsampled.bundle" In this example, I am not sure how I would get the drs object id from this uri
Is the drs object id = "HG00284.1kgenomes.wgs.downsampled.bundle" or is there a UUID that I need to obtain from somewhere? (I think this might be very specific to the starter kit implementation)
Just posted this as a separate issue. Then I saw your last paragraph. The new issue essentially addresses what you ask. I wanted to separate the id issue from the resolution issue.
Back to resolution. No, identifiers.org knows nothing about your local host.
Prefix-based DRS ids are a good thing, though. Watch this space!
The prefix resolver that I understood could be run locally is Bioregistry. First off though, note that it can be used in the same way as identifiers.org: https://bioregistry.io/dg.4dfc:098e18d4-5ece-4bc6-9a79-68f5082da9bc Same DRS id and prefix as above - different metaresolver.
Running Bioregistry locally: see https://github.com/ga4gh/ismb-2022-ga4gh-tutorial/tree/main/supporting/bioregistry
Hi @yash-puligundla,
- After requesting /search on data connect server which provides the Drs uri for cram, crai, and bundle, what is the next step? Do we take these URI's and go to Drs server?
Yes, you can take these URIs and request the DRS Object from DRS. You will get an unauthorized error because you don't have a passport yet, but this is good to show that there is access control.
- How is the Drs URI resolved to an accessible s3 URL? Is there an endpoint that does this or are we manually doing the resolution?
The DRS spec mandates that a DRS URI be resolvable to an HTTP(S) URL via a simple pattern: drs://{host}/{id} -> http(s)://{host}/ga4gh/drs/v1/objects/{id}. So you could show that part of the spec and then have the class manually convert the DRS URI to an HTTP URL to get the DRS object.
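A minimal sketch of that conversion, using only the standard library (the function name is mine, not from the starter kit):

```python
from urllib.parse import urlparse

def drs_to_http(drs_uri: str) -> str:
    """Convert a hostname-based DRS URI to its HTTPS objects URL,
    following the spec's pattern:
    drs://{host}/{id} -> https://{host}/ga4gh/drs/v1/objects/{id}"""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError("not a DRS URI: " + drs_uri)
    host = parsed.netloc                 # keeps any port, e.g. localhost:5000
    object_id = parsed.path.lstrip("/")  # everything after the host is the id
    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}"

print(drs_to_http("drs://nci-crdc.datacommons.io/098e18d4-5ece-4bc6-9a79-68f5082da9bc"))
# https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/098e18d4-5ece-4bc6-9a79-68f5082da9bc
```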
- What is the Drs endpoint that gives the information about passport brokers and required visas?
There are multiple endpoints I'll outline:
- GET /objects/{id} - this is the most basic endpoint you're familiar with
- POST /objects/{id} - get a single DRS object via POST request, allows you to bring a passport for auth
- POST /objects - bulk object request, use a selection in the request body to get multiple DRS objects
- OPTIONS /objects/{id} - get the auth info for a single DRS object
- OPTIONS /objects - get the auth info for a selection of DRS objects
So you can see that there is an OPTIONS request analog to both single and bulk DRS Object requests. You can review the controller for more info on expected payload/response.
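To make the single/bulk analogy concrete, here is a tiny sketch (hypothetical host; paths as listed above) of the two URLs those five endpoints share; the HTTP method and body determine which behavior you get:

```python
# BASE is a hypothetical DRS server base URL, purely for illustration.
BASE = "https://drs.example.org/ga4gh/drs/v1"

def single_object_url(object_id: str) -> str:
    # Shared by GET /objects/{id}, POST /objects/{id} (with passports),
    # and OPTIONS /objects/{id} (auth info for one object)
    return f"{BASE}/objects/{object_id}"

def bulk_objects_url() -> str:
    # Shared by POST /objects (bulk fetch via a selection in the body)
    # and OPTIONS /objects (auth info for a selection of objects)
    return f"{BASE}/objects"

print(single_object_url("098e18d4-5ece-4bc6-9a79-68f5082da9bc"))
print(bulk_objects_url())
```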
- How can I remove the default test visas (StarterKitDatasetsControlledAccessGrants, DatasetAlpha, DatasetBeta, DatasetGamma) available in the Passport broker?
To do this, you'll have to create a new sqlite database rather than the default database that's bundled in the docker image, because that one has the test dataset. For reference, look at what I did for DRS in session 4. There's a resources/drs/db folder; when drs-migrate and drs-dataset are run (in the docker-compose), a db gets built there. That db gets mounted into the drs server at /db. The config file at resources/drs/config/config.yml indicates the database URL as the file at /db/drs.db (relative to the docker container, not the host machine).
So you want to do something similar for passport. Use the SQL script that creates the passport table schema, but not the script that adds the test dataset. Then mount that db into the docker container, and use a config file to point the app to the non-default db. This will wire the app to a db with the correct schema, but no data/visas in it. @emre-f can then develop a script to make administrative POST requests to the passport broker to create the visas, and assign them to the researcher/user.
- What is the Header field name for adding passport jwt to Drs request? Can you provide an example request?
It's in the POST request body:
{
  "passports": [
    "{jwt}"
  ]
}
So there's a field called passports that holds an array of one or more passport JWTs. You should only need to provide one since we're not demonstrating multi-broker.
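As a quick sketch (hypothetical helper and a truncated placeholder JWT), building that body with the standard library might look like:

```python
import json

def passport_request_body(passport_jwt: str) -> str:
    """Build the JSON body for POST /objects/{id}: a "passports"
    field holding an array of passport JWTs (one is enough here,
    since we're not demonstrating multi-broker)."""
    return json.dumps({"passports": [passport_jwt]})

# Placeholder JWT purely for illustration
print(passport_request_body("eyJhbGciOi..."))
```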
- Does running python3 scripts/add-known-visas-to-drs.py and python3 scripts/populate-drs.py add the 1000 genome sample population-based visa information and 1000 genome sample data to Drs server? Is there a correct sequence to run these 2 scripts?
add-known-visas-to-drs should be run first, followed by populate-drs.
- What is the conclusion step of session 5? Does it end with the Drs returning the s3 URL for the requested Drs object and the participant being able to access this s3 URL to download a file?
Yeah that's a good place to end it. I don't think it's necessary to run the workflow via WES again, you could just state that "now that we can access controlled access DRS Objects using our passport, we are technically ready to run the workflow as we did in Session 4"
- We use DRS in both session 4 and session 5, but only the DRS in session 5 uses passports for authorization. How is this configured?
DRS in session 4 makes no use of the passport/visa related tables, so it can be completely ignored in session 4. session 5 makes use of the passport/visa tables and assigns DRSObjects to Visas, which essentially states, "for this DRSObject, the researcher needs to present this visa to obtain access". The wiring of DRSObjects to required visas is handled by the 2 python scripts.
- For the data connect 1000 genomes table, we took a subset based on the rules below, which resulted in 200 rows. I see that the DRS database has 46 rows of 1000 genome samples in it. Was a different set of rules used to obtain this subset? If yes, let me know the rules used so I can make sure that data connect uses the same subset.
If you look at the S3 bucket I made for the tutorial, there are 2 directories of CRAMs. I believe they are "highcov" and "lowcov." The CRAMs/DRS IDs in highcov should be used for session 4, because that's what I used to test CNest. In Session 5, you can use the data in lowcov. The data in lowcov should be 200 CRAMs and CRAIs that correspond to the data connect dataset.
The script in session 5 should populate DRS with 600 DRS Objects (1 CRAM, 1 CRAI, and 1 Bundle per sample), as well as a FASTA file, its index, and a BED file.
Thank you very much @jb-adams
Hi @jb-adams I am trying to change the passport issuer from "https://ga4gh.org/" to something else. I wanted to check with you if I can use a config file for the passport-broker container to do this.
Here is the config.yaml file, but it doesn't work as expected.
passport-broker:
brokerProps:
passportIssuer: http://localhost:4455/
Can you please take a look and see if I am doing something incorrectly? Thanks!
I was able to figure out the error in the config file. Closing this issue as session 5 content is good to go!! Thank you, Ian and Jeremy!
26 unique records in population_code {'ITU', 'ASW', 'JPT', 'MSL', 'CHS', 'CDX', 'YRI', 'ACB', 'MXL', 'PUR', 'FIN', 'GWD', 'LWK', 'GIH', 'CLM', 'TSI', 'PEL', 'PJL', 'GBR', 'CHB', 'BEB', 'ESN', 'KHV', 'CEU', 'IBS', 'STU'}
30 unique records in population_name {'Bengali,Bengali', 'African Ancestry SW', 'Punjabi', 'Dai Chinese', 'Gambian Mandinka', 'Yoruba', 'British', 'Japanese', 'Iberian', 'African Caribbean', 'Mende', 'Southern Han Chinese', 'Han Chinese', 'Luhya', 'Kinh,Kinh Vietnamese', 'Toscani', 'Luhya,Luhya', 'Kinh Vietnamese', 'Tamil', 'Gujarati', 'Bengali', 'Finnish', 'CEPH', 'Telugu', 'Peruvian', 'Esan', 'Colombian', 'Punjabi,Punjabi', 'Puerto Rican', 'Mexican Ancestry'}
5 unique records in superpopulation_code {'AMR', 'EAS', 'EUR', 'SAS', 'AFR'}
8 unique records in superpopulation_name {'East Asian Ancestry', 'European Ancestry', 'African Ancestry', 'American Ancestry', 'East Asia (SGDP),East Asian Ancestry', 'South Asian Ancestry', 'South Asia (SGDP),South Asian Ancestry', 'African Ancestry,Africa (SGDP)'}