Open diekhans opened 5 years ago
I agree. I have long wondered why we attempt to combine an end-user tool that needs to be user-friendly with a shim layer that presents an entire API as a CLI. It seems like there should be two tools.
what commands would you like to see in a true "consumer" api?
A query command and a download command, usable without having to understand the internals of the system. Just look at the various help messages to see how the HCA command is designed for the developer.
You have to know what DSS, bundles, Swagger, and indexing are to understand the command.
After reading through the data browser charter, the user-centric CLI and API belong to the data browser. Therefore, I am passing this hot potato to @theathorn.
@hannes-ucsc Opinions?
The hca command is designed for the developers of the DCP, not end users.
"End user" is an ill-defined term. It would help to be more specific. The CLI currently has a few commands destined for non-developers but those are crowded out by the API shims which are designed for interactive use by developers, DCP and external alike. Keep in mind that the REST APIs are the main interface within the DCP and between the DCP and external developers.
It seems like there should be two tools.
I agree that we should have more higher level commands and do a better job at promoting those in the documentation as well as through improved naming. The general pattern of exposing a REST API through a CLI shim is fairly commonplace. So is providing higher level convenience commands on top of the bare-bone shims. Prominent example: the AWS CLI has aws s3 and aws s3api.
We don't need to publish two separate PyPI distributions to provide a convenience API for less experienced users. One distribution can contain multiple executables. Each executable can contain multiple commands. I can elaborate on the advantage of a single distribution if necessary.
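As a rough illustration of "one distribution, multiple executables, multiple commands", here is a minimal sketch using argparse. The executable name hca-easy, the module paths in the comment, and the query/download commands with their arguments are hypothetical, chosen only to mirror the request above; none of them exist in the hca package.

```python
import argparse
from typing import List, Optional

# A single distribution can declare several executables in setup.py.
# The "hca-easy" name and both module paths below are hypothetical:
#
#   entry_points={
#       "console_scripts": [
#           "hca = hca.cli:main",
#           "hca-easy = hca.easy:main",
#       ],
#   }


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical consumer-facing executable."""
    parser = argparse.ArgumentParser(
        prog="hca-easy",
        description="Hypothetical high-level commands layered over the API shims.",
    )
    commands = parser.add_subparsers(dest="command", required=True)

    # Hypothetical command: search without knowing DSS internals.
    query = commands.add_parser("query", help="Search for data")
    query.add_argument("text", help="Free-text search term")

    # Hypothetical command: fetch everything listed in a manifest.
    download = commands.add_parser("download", help="Download files in a manifest")
    download.add_argument("manifest", help="Path to a manifest file")

    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # A real implementation would call into the lower-level API here.
    print(f"would run {args.command!r} with {vars(args)}")
    return 0
```

The point of the sketch is only that the shim commands and the convenience commands can ship together while staying visually separate in the help output.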
The hca command is designed for the developers of the DCP, not end users.
"End user" is an ill-defined term. It would help to be more specific.
"End user": a biologist with a computer. Not an application developer, who can and should understand the API.
The CLI currently has a few commands destined for non-developers but those are crowded out by the API shims which are designed for interactive use by developers, DCP and external alike. Keep in mind that the REST APIs are the main interface within the DCP and between the DCP and external developers.
It seems like there should be two tools.
I agree that we should have more higher level commands and do a better job at promoting those in the documentation as well as through improved naming.
The general pattern of exposing a REST API through a CLI shim is fairly commonplace. So is providing higher level convenience commands on top of the bare-bone shims. Prominent example: the AWS CLI has aws s3 and aws s3api.
I wouldn't suggest getting rid of this command or trying to make it more friendly to people who are not comfortable with the API. I suspect it will get more sophisticated as the API grows.
We don't need to publish two separate PyPI distributions to provide a convenience API for less experienced users. One distribution can contain multiple executables. Each executable can contain multiple commands. I can elaborate on the advantage of a single distribution if necessary.
I completely agree. We don't need more PyPI packages; that is too confusing. All I am suggesting is another command in the hca package.
"End user" -A biologist with a computer. Not an application developer, who can and should understand the API.
Still ill-defined. Show me the biologist that doesn't have a computer. Also, only a fraction of people with a computer are comfortable with a CLI.
All I am suggesting is another command in the hca package.
I think it's time to start listing what DCP functionality would be exposed by that higher level command.
Still ill-defined. Show me the biologist that doesn't have a computer. Also, only a fraction of people with a computer are comfortable with a CLI.
That is what the UX people call them. I would describe it more as someone whose code you never want to actually use.
All I am suggesting is another command in the hca package. I think it's time to start listing what DCP functionality would be exposed by that higher level command.
YES!!
I'm an end user now. EBI's Expression Atlas grabs fastq to perform its own analysis. Other archives offer http or ftp links for single file downloads. As the DCP doesn't offer this we're happy to convert the file uuid into some sort of shell command to get the file with the hca-cli. However, I can't work out how to do this without negotiating bundle logic via the 'download' command.
What I mean by some download logic:
We have a manifest of file uuids to check we have downloaded everything. I could convert this list of file uuids to bundle uuids, deduplicate them, use the hca dss download function with --data-files to get all the files, map filenames to file uuids, and then check that I got them all by filename/checksum in the local download directory. I'm not sure if you see a clearer way to do this at the moment, even without the CLI?
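The final check in that workflow (did every file in the manifest arrive intact?) could look roughly like the sketch below. It assumes each manifest row provides a file uuid, a file name, and a SHA-256 checksum; those field names are hypothetical, and a real DCP manifest may use different columns or checksum types.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, NamedTuple


class ManifestEntry(NamedTuple):
    # Hypothetical manifest fields; adjust to the actual manifest columns.
    file_uuid: str
    file_name: str
    sha256: str


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large fastq files are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_problems(entries: Iterable[ManifestEntry], download_dir: str) -> Dict[str, str]:
    """Return {file_uuid: reason} for entries that are missing or corrupt."""
    problems: Dict[str, str] = {}
    for entry in entries:
        path = Path(download_dir) / entry.file_name
        if not path.is_file():
            problems[entry.file_uuid] = "missing"
        elif sha256_of(path) != entry.sha256:
            problems[entry.file_uuid] = "checksum mismatch"
    return problems
```

An empty result means every manifest entry was found locally with a matching checksum.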
@hewgreen
Other archives offer http or ftp links for single file downloads. As the DCP doesn't offer this
That's a serious omission in functionality. Have you filed a ticket for that? If not, I'd file it against this project.
You can download an individual file like this:
hca dss get-file --uuid 3ba0be0a-65de-407d-8160-7e88fad9ccb2 --version 2019-05-16T093249.721488Z --replica aws > test.pdf
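To repeat that for every file in a manifest, the command could be wrapped in a small script. This is only a sketch: it uses nothing beyond the get-file flags shown above, and the helper names are made up for illustration.

```python
import subprocess
from typing import List


def get_file_command(uuid: str, version: str, replica: str = "aws") -> List[str]:
    """Build the argument list for one `hca dss get-file` call."""
    return [
        "hca", "dss", "get-file",
        "--uuid", uuid,
        "--version", version,
        "--replica", replica,
    ]


def fetch(uuid: str, version: str, destination: str, replica: str = "aws") -> None:
    """Run the command and redirect its output to a file, like `> test.pdf`."""
    with open(destination, "wb") as out:
        # check=True raises CalledProcessError if the hca command fails.
        subprocess.run(get_file_command(uuid, version, replica), stdout=out, check=True)
```

Calling fetch once per manifest row, with the file uuid and version from that row, avoids the bundle logic entirely, at the cost of one request per file.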
What I don't understand is why you can't use hca dss download-manifest directly? What's missing?
Need actionable items to work on.
The hca command is designed for the developers of the DCP, not end users. It requires a detailed understanding of the internals of the DCP. Most options and their descriptions make no sense to consumers. Another command could be created that is actually designed around the end users.
For example, what would a consumer of the CLI make of this help message?