HumanCellAtlas / dcp-cli

DEPRECATED - HCA Data Coordination Platform Command Line Interface
https://hca.readthedocs.io/
MIT License

hca cli is not designed for DCP consumers #345

Open diekhans opened 5 years ago

diekhans commented 5 years ago

The hca command is designed for the developers of the DCP, not end users. It requires a detailed understanding of the internals of the DCP. Most options and their descriptions make no sense to consumers.

Another command could be created that is actually designed around the end users.

For example, what would a consumer of the cli make of this help message??

usage: hca dss patch-bundle [-h] --uuid UUID --replica {aws,gcp} --version VERSION
                            [--add-files ADD_FILES] [--remove-files REMOVE_FILES]

Add or remove files from a bundle. A specific version of the bundle to update must be
provided, and a new version will be written. Bundle manifests exceeding 20,000 files will not
be included in the Elasticsearch index document.

sampierson commented 5 years ago

I agree. I have long wondered why we attempt to combine an end-user tool that needs to be user-friendly with a shim layer that presents an entire API as a CLI. It seems like there should be two tools.

kozbo commented 5 years ago

What commands would you like to see in a true "consumer" API?

diekhans commented 5 years ago

A query command and a download command, both usable without having to understand the internals of the system. Just look at the various help messages to see how the hca command is designed for the developer.

You have to know what DSS, bundle, Swagger, and indexing are to understand the command.

diekhans commented 5 years ago

After reading through the data browser charter, the existence of a user-centric CLI and API belongs to the data browser. Therefore, I am passing this hot potato to @theathorn.

theathorn commented 5 years ago

@hannes-ucsc Opinions?

hannes-ucsc commented 5 years ago

The hca command is designed for the developers of the DCP, not end users.

"End user" is an ill-defined term. It would help to be more specific. The CLI currently has a few commands destined for non-developers but those are crowded out by the API shims which are designed for interactive use by developers, DCP and external alike. Keep in mind that the REST APIs are the main interface within the DCP and between the DCP and external developers.

It seems like there should be two tools.

I agree that we should have more higher level commands and do a better job at promoting those in the documentation as well as through improved naming. The general pattern of exposing a REST API through a CLI shim is fairly commonplace. So is providing higher level convenience commands on top of the bare-bone shims. Prominent example: the AWS CLI has aws s3 and aws s3api.

We don't need to publish two separate PyPI distributions to provide a convenience API for less experienced users. One distribution can contain multiple executables. Each executable can contain multiple commands. I can elaborate on the advantage of a single distribution if necessary.
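As a rough illustration of "each executable can contain multiple commands", here is a minimal argparse sketch of a convenience executable with two subcommands. Everything named here is an assumption made for illustration: the executable name hca-consumer, the query and download subcommands (borrowed from the request earlier in this thread), and their arguments. None of it exists in dcp-cli.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical consumer-facing executable."""
    # "hca-consumer" is a made-up name; it is not part of dcp-cli.
    parser = argparse.ArgumentParser(
        prog="hca-consumer",
        description="Hypothetical high-level commands for data consumers.",
    )
    # One executable, several commands: each subparser is one command.
    subparsers = parser.add_subparsers(dest="command", required=True)

    query = subparsers.add_parser("query", help="Search for files (hypothetical).")
    query.add_argument("term", help="Free-text search term.")

    download = subparsers.add_parser("download", help="Fetch one file (hypothetical).")
    download.add_argument("file_uuid", help="UUID of the file to fetch.")
    download.add_argument("--dest", default=".", help="Target directory.")

    return parser
```

Calling build_parser().parse_args(["download", "some-uuid", "--dest", "out"]) yields a namespace with command, file_uuid and dest set. A second executable in the same distribution could keep exposing the low-level API shims unchanged.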

diekhans commented 5 years ago

The hca command is designed for the developers of the DCP, not end users.

"End user" is an ill-defined term. It would help to be more specific.

"End user" - A biologist with a computer. Not an application developer, who can and should understand the API.

The CLI currently has a few commands destined for non-developers but those are crowded out by the API shims which are designed for interactive use by developers, DCP and external alike. Keep in mind that the REST APIs are the main interface within the DCP and between the DCP and external developers.

It seems like there should be two tools.

I agree that we should have more higher level commands and do a better job at promoting those in the documentation as well as through improved naming.

The general pattern of exposing a REST API through a CLI shim is fairly commonplace. So is providing higher level convenience commands on top of the bare-bone shims. Prominent example: the AWS CLI has aws s3 and aws s3api.

I wouldn't suggest getting rid of this command or trying to make it more friendly to people who are not comfortable with the API. I suspect it will get more sophisticated as the API grows.

We don't need to publish two separate PyPI distributions to provide a convenience API for less experienced users. One distribution can contain multiple executables. Each executable can contain multiple commands. I can elaborate on the advantage of a single distribution if necessary.

I completely agree. We don't need more PyPI packages; that is too confusing. All I am suggesting is another command in the hca package.

hannes-ucsc commented 5 years ago

"End user" - A biologist with a computer. Not an application developer, who can and should understand the API.

Still ill-defined. Show me the biologist that doesn't have a computer. Also, only a fraction of people with a computer are comfortable with a CLI.

All I am suggesting is another command in the hca package.

I think it's time to start listing what DCP functionality would be exposed by that higher level command.

diekhans commented 5 years ago

Still ill-defined. Show me the biologist that doesn't have a computer. Also, only a fraction of people with a computer are comfortable with a CLI.

That is what the UX people call them. I would describe it more as someone whose code you never want to actually use.

All I am suggesting is another command in the hca package. I think it's time to start listing what DCP functionality would be exposed by that higher level command.

YES!!

hewgreen commented 5 years ago

I'm an end user now. EBI's Expression Atlas grabs fastq to perform its own analysis. Other archives offer http or ftp links for single file downloads. As the DCP doesn't offer this, we're happy to convert the file uuid into some sort of shell command to get the file with the hca-cli. However, I can't work out how to do this without negotiating bundle logic via the 'download' command.

What I mean by download logic: we have a manifest of file uuids that we use to check we have downloaded everything. I could convert this list of file uuids to bundle uuids, deduplicate them, use hca dss download with --data-files to get all the files, map each filename back to its file uuid, and then check by filename and checksum in the local download directory that I got them all. Do you see a clearer way to do this at the moment, even without the CLI?
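The final verification step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the manifest is taken to be a dict mapping each file uuid to a filename and a SHA-256 checksum, which is a hypothetical shape and not the actual DCP manifest format, and the function names are made up.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_missing(manifest: Dict[str, Dict[str, str]], download_dir: Path) -> List[str]:
    """Return file uuids whose file is absent or fails the checksum.

    The manifest shape {uuid: {"name": ..., "sha256": ...}} is an
    assumption for this sketch, not the real DCP manifest layout.
    """
    missing = []
    for file_uuid, entry in manifest.items():
        path = download_dir / entry["name"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            missing.append(file_uuid)
    return missing
```

An empty result from find_missing means every file in the manifest is present locally with a matching checksum; anything else is the list of file uuids still to fetch.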

hannes-ucsc commented 5 years ago

@hewgreen

Other archives offer http or ftp links for single file downloads. As the DCP doesn't offer this

That's a serious omission in functionality. Have you filed a ticket for that? If not, I'd file it against this project.

You can download an individual file like this:

hca dss get-file --uuid 3ba0be0a-65de-407d-8160-7e88fad9ccb2 --version 2019-05-16T093249.721488Z --replica aws > test.pdf
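If you have many files, the invocation above can be wrapped in a small script. The sketch below uses only the flags shown in that command (--uuid, --version, --replica) and redirects standard output to a file, as the shell example does. The helper names are hypothetical, and it assumes hca is on your PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def get_file_command(uuid: str, version: str, replica: str = "aws") -> List[str]:
    """Build the argument list for one `hca dss get-file` call."""
    return [
        "hca", "dss", "get-file",
        "--uuid", uuid,
        "--version", version,
        "--replica", replica,
    ]


def fetch_file(uuid: str, version: str, dest: Path, replica: str = "aws") -> None:
    """Run get-file and write its standard output to dest.

    Raises subprocess.CalledProcessError if the command fails; in that
    case dest may be left behind empty or partially written.
    """
    with dest.open("wb") as out:
        subprocess.run(get_file_command(uuid, version, replica), stdout=out, check=True)
```

Looping fetch_file over (uuid, version) pairs from your manifest avoids the bundle step entirely, at the cost of one CLI call per file.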

What I don't understand is why you can't use hca dss download-manifest directly? What's missing?

theathorn commented 5 years ago

Need actionable items to work on.