chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

Consistent API for file retrieval #5

Closed: lynaghk closed this issue 12 years ago

lynaghk commented 12 years ago

I'm looking into hooking up the filtering functionality with the VCF frontend. Right now you have a nicely exposed plot-ready-metrics function in the metrics API that grabs the file in the background. The do-analysis multimethods have no such convenience. Are you working on some unified code/backend (Datomic?) to deal with these concerns? If the bcbio analyses are going to change in the next few weeks, we should sync up before the VCF frontend starts calling into them.

chapmanb commented 12 years ago

Kevin; The do-analysis calls should do things the same way. It starts by using the same file API call:

https://github.com/chapmanb/bcbio.variation/blob/master/src/bcbio/variation/api/run.clj#L20

This downloads GenomeSpace files into a local cache directory and returns files that are already local unchanged. So if you pass an ID string like gs:/Home/your_gs_user/whatever, it should do the right thing. Let me know if you run into problems with it; I'm happy to refactor to make it consistent.
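A minimal sketch of the resolution rule described above, assuming only what the comment states: `gs:` IDs map to a path under a cache directory and are fetched if not already cached, and anything else is treated as a local path and returned as-is. The names `resolve-file`, `cache-path`, and the injected `fetch-fn` are hypothetical and are not part of bcbio.variation. The real logic lives in the run.clj file linked above and will differ in its details.

```clojure
(ns example.file-resolve
  (:require [clojure.java.io :as io]
            [clojure.string :as string]))

;; Hypothetical helper: map a GenomeSpace ID such as
;; "gs:/Home/user/x.vcf" to a path under cache-dir.
(defn- cache-path [cache-dir remote-id]
  (str (io/file cache-dir (string/replace remote-id #"^gs:/+" ""))))

(defn resolve-file
  "Hypothetical sketch. Returns a local path for file-id. IDs starting with
  gs: are mapped into cache-dir, and fetch-fn (an assumed downloader taking
  [remote-id local-path]) is called only when the cached copy is missing.
  Any other ID is assumed to be local already and is returned unchanged."
  [file-id cache-dir fetch-fn]
  (if (string/starts-with? file-id "gs:")
    (let [local (cache-path cache-dir file-id)]
      (when-not (.exists (io/file local))
        (fetch-fn file-id local))
      local)
    file-id))
```

For example, on a Unix-style filesystem, `(resolve-file "gs:/Home/your_gs_user/whatever" "/tmp/gs-cache" my-download-fn)` returns `"/tmp/gs-cache/Home/your_gs_user/whatever"` and calls `my-download-fn` only if that file does not exist yet. Injecting the downloader keeps the caching rule testable without GenomeSpace credentials.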

The backend still needs work, but hopefully that won't affect how you call the API.

lynaghk commented 12 years ago

I tried

(def files (get-files :vcf creds))
(do-analysis :filter
             {:filename (:id (first files))
              :metrics {:dp [0 100]}}
             creds)

and got:

The fasta file you specified (/Users/kevin/work/harvard/vcf) does not exist. [Thrown class org.broadinstitute.sting.utils.exceptions.UserException]

chapmanb commented 12 years ago

Kevin; That's a GATK exception complaining about the reference FASTA input file, which is specified in the web config YAML:

https://github.com/chapmanb/bcbio.variation/blob/master/config/web-processing.yaml#L8

and should be a local file since that's part of the server configuration. Hope that fixes it.

lynaghk commented 12 years ago

Ah, got it. I've added a similar config file and the default FASTA genome to the VCF repo, and things appear to be running fine now:

https://github.com/lynaghk/vcf/commit/5cce7995acc402d2ab9d2044b65607f9f0930e3a