MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

problems downloading ClinVar dataset #99031

Open smrgit opened 1 year ago

smrgit commented 1 year ago

For various reasons, I have not been able to get set up with the azureml Python package, so I have been trying to use azcopy instead. Although azcopy lets me download the MD5 file, I have not been able to list the contents of the container/folder or to download the main XML file.

Specifically, this command works:

azcopy cp "https://datasetclinvar.blob.core.windows.net/dataset/ClinVarFullRelease_00-latest.xml.gz.md5?sv=2019-02-02&se=2050-01-01T08%3A00%3A00Z&si=prod&sr=c&sig=qFPPwPba1RmBvaffkzkLuzabYU5dZstSTgMwxuLNME8%3D" ClinVarFullRelease_00-latest.xml.gz.md5

but this one does not:

azcopy cp "https://datasetclinvar.blob.core.windows.net/dataset/ClinVarFullRelease_00-latest.xml.gz?sv=2019-02-02&se=2050-01-01T08%3A00%3A00Z&si=prod&sr=c&sig=qFPPwPba1RmBvaffkzkLuzabYU5dZstSTgMwxuLNME8%3D" ClinVarFullRelease_00-latest.xml.gz

and neither does "azcopy ls".
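One way to tell an azcopy problem apart from a token-permission problem is to call the Blob service's List Blobs REST operation (`GET <container-url>?restype=container&comp=list`) directly with the same SAS. The sketch below uses a hypothetical helper, `list_blobs_url`, which is not part of azcopy or any Azure SDK. It builds that URL from the blob URL above by keeping the host, the container segment of the path, and the SAS query string unchanged, then appending the List Blobs parameters. It assumes the SAS is container-scoped (`sr=c`), as it is in the URLs above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def list_blobs_url(blob_sas_url: str, prefix: str = "") -> str:
    """Build a List Blobs request URL from a blob URL carrying a container SAS.

    Hypothetical helper: keeps the account host, the container segment of the
    path and the SAS query string exactly as given (no re-encoding), and
    appends restype=container&comp=list. An optional prefix narrows the
    listing to blob names that start with it.
    """
    parts = urlsplit(blob_sas_url)
    # First path segment is the container name ("dataset" for ClinVar).
    container = parts.path.lstrip("/").split("/", 1)[0]
    query = parts.query + "&restype=container&comp=list"
    if prefix:
        query += "&prefix=" + quote(prefix, safe="")
    return urlunsplit((parts.scheme, parts.netloc, "/" + container, query, ""))


if __name__ == "__main__":
    blob_url = (
        "https://datasetclinvar.blob.core.windows.net/dataset/"
        "ClinVarFullRelease_00-latest.xml.gz?sv=2019-02-02"
        "&se=2050-01-01T08%3A00%3A00Z&si=prod&sr=c"
        "&sig=qFPPwPba1RmBvaffkzkLuzabYU5dZstSTgMwxuLNME8%3D"
    )
    print(list_blobs_url(blob_url, prefix="ClinVarFullRelease"))
```

Fetching the printed URL (for example with `urllib.request.urlopen` or `curl`) should return an XML listing if the token grants list permission. An HTTP 403 instead would be consistent with the token not allowing listing, independent of anything azcopy is doing.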

I'm also wondering why this container requires a SAS token for a completely open-access dataset, whereas, for example, the gnomAD dataset does not. The SAS token seems to be what prevents me from listing the container to identify the specific XML file(s) of interest.
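The SAS query string itself hints at where the token's permissions are defined. In a service SAS, `sp` carries the signed permissions (for example `r` for read, `l` for list), while `si` names a stored access policy defined on the container. When `sp` is absent, the permissions come from that policy, which only the storage account owner can inspect or change. The sketch below uses hypothetical helpers, `describe_sas` and `permission_source`, built only on the Python standard library, to pull those fields out of the ClinVar URL.

```python
from urllib.parse import parse_qs, urlsplit

# Service-SAS query parameters relevant to the ClinVar URL, with their meaning.
SAS_FIELDS = {
    "sv": "signed version (storage service version)",
    "se": "signed expiry time (UTC)",
    "si": "signed identifier: name of a stored access policy",
    "sr": "signed resource: c = container, b = blob",
    "sp": "signed permissions, e.g. r = read, l = list",
    "sig": "signature",
}


def describe_sas(url: str) -> dict:
    """Return the known SAS fields of a URL; fields not present map to None."""
    # parse_qs percent-decodes values, e.g. %3A -> ':' and %3D -> '='.
    params = parse_qs(urlsplit(url).query)
    return {name: params[name][0] if name in params else None for name in SAS_FIELDS}


def permission_source(fields: dict) -> str:
    """Report whether permissions are inline (sp) or in a stored policy (si)."""
    if fields.get("sp"):
        return f"inline sp={fields['sp']}"
    if fields.get("si"):
        return f"stored access policy {fields['si']!r}"
    return "no permissions found"


if __name__ == "__main__":
    url = (
        "https://datasetclinvar.blob.core.windows.net/dataset/"
        "ClinVarFullRelease_00-latest.xml.gz?sv=2019-02-02"
        "&se=2050-01-01T08%3A00%3A00Z&si=prod&sr=c"
        "&sig=qFPPwPba1RmBvaffkzkLuzabYU5dZstSTgMwxuLNME8%3D"
    )
    fields = describe_sas(url)
    for name, meaning in SAS_FIELDS.items():
        print(f"{name:>3} = {fields[name]!s:<46}  ({meaning})")
    print("permissions come from:", permission_source(fields))
```

For the ClinVar URL this reports `sr=c` (a container-scoped token), `si=prod`, and no `sp`. Whatever this token is allowed to do is therefore decided by the "prod" policy on the container rather than by anything a user can adjust in the URL.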

I also wanted to ask: do you do any security scanning of this data as part of mirroring it every day?



SaibabaBalapur-MSFT commented 1 year ago

@smrgit Thanks for your feedback! We will investigate and update as appropriate.

CreRecombinase commented 1 month ago

Any update? The permissions for ClinVar are still broken.