Bioconductor / GoogleGenomics

[DEPRECATED] An R package for Google Genomics API queries.
Apache License 2.0

Restructured genomic coordinate conversion handling for variants. #19

Closed deflaux closed 10 years ago

deflaux commented 10 years ago

Moved that logic into the converters.

Also:

pgrosu commented 10 years ago

Hi Nicole,

I really like the new features. I believe it still requires GoogleGenomics::authenticate("...") to authenticate rather than just authenticate("..."). Just a few minor things:

1) For some of the examples in Working with Variants to work, I think some of the following packages require installation beforehand:

source("http://bioconductor.org/biocLite.R")
biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
biocLite("BSgenome.Hsapiens.UCSC.hg19")
biocLite("org.Hs.eg.db")

2) In Working with Reads, the x-label foo below would probably be easier to read as Position on Chromosome 13:

coveragePlot <- ggplot(as(alignments, 'GRanges')) + stat_coverage(color="gray40", fill="skyblue")
tracks(alignmentPlot, coveragePlot, xlab="foo")

to

coveragePlot <- ggplot(as(alignments, 'GRanges')) + stat_coverage(color="gray40", fill="skyblue")
tracks(alignmentPlot, coveragePlot, xlab="Position on Chromosome 13")

3) As Tim noticed previously in issue https://github.com/googlegenomics/api-client-r/issues/12, the Shiny app still points to an old ReadSet ID:

http://googlegenomics.shinyapps.io/reads

Maybe it could be updated with CMvnhpKTFhD04eLE-q2yxnU.

Thanks for all the great additions! ~p

ttriche commented 10 years ago

1 can be addressed via

Depends: Homo.sapiens, BSgenome.Hsapiens.UCSC.hg19

I'll generate a pull request.

--t


pgrosu commented 10 years ago

Great! I don't believe Nicole has merged this yet, so to get just this specific pull request I ran the following:

devtools::install_github("googlegenomics/api-client-r#19")

deflaux commented 10 years ago

@ttriche and @pgrosu thanks so much for the review!

Do you think the vignettes are okay now with the added install instructions for the packages not currently part of the GoogleGenomics R package? Since annotation dbs can be quite large, I was thinking the package should not depend upon them. People can read the sample and optionally install the dbs to run the sample if they wish.

By the way, we have a new references API and at some point it may be interesting to make use of that.

ttriche commented 10 years ago

I'll look into the references API, that could be an excellent step forward.

As far as annotation DBs go, I'd require Homo.sapiens and put BSgenome.* under Suggests:

I recently decoupled one of the packages I maintain from the BSgenome.* packages for exactly the same reason: they're huge.
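One way to keep the large annotation packages under Suggests: while still letting the vignette examples run when they are present is to check for them at run time. The following is a minimal sketch in base R, not code from the GoogleGenomics package; `missing_packages` is a hypothetical helper name, and the package names are the ones mentioned earlier in this thread.

```r
# Hypothetical helper: return the names of the requested packages whose
# namespaces cannot be loaded (typically because they are not installed).
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Optional packages an example might need (names taken from this thread).
optional <- c("BSgenome.Hsapiens.UCSC.hg19",
              "TxDb.Hsapiens.UCSC.hg19.knownGene")

absent <- missing_packages(optional)
if (length(absent) > 0) {
  message("Skipping example; install first: ", paste(absent, collapse = ", "))
} else {
  message("All optional annotation packages are available.")
}
```

With a guard like this, the package itself installs without the annotation databases, and only users who want to run the sample need to fetch them.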

GAlignment* has (I think) been moved into the GenomicAlignments package. I'll look and see what's going on there.

--t


pgrosu commented 10 years ago

Nicole, thank you, and I'm always happy to help out :) Regarding the references API, I would love to help integrate it, but we would need to do some testing together. If it were possible to have access to unbilled, temporary test instances of GCE and GCS, that would help. This would give us a standardized platform to test from together. As you may have noticed from the analysis performed in issue https://github.com/googlegenomics/utils-java/issues/9, we found that bandwidth is the limiting factor via the API, but there are other ways we can speed it up. The API has amazing possibilities here that are still untapped.

Regarding the vignettes, you will notice that as more features get added, a lot of things will slow down. The long-term goal of the API is, hopefully, to perform data analysis across multiple datasets efficiently. To approach that goal, some of the fundamental approaches to how the data is retrieved and processed can be further explored and expanded.

Tim is right that these packages are huge, but they can be modularized, pre-cached, and even enhanced. There are several ways of looking at annotation information (i.e. gene->information, information->gene, range subset structures, etc.), and if the data is restructured correctly with caching, just like Google searches, it could speed things up quite a bit. There is a whole area of Information Retrieval that would help here.
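As a rough illustration of the two ideas above (a precomputed information->gene index and cached lookups), here is a small base R sketch. It is only a toy under stated assumptions: the gene and term names are made up, and `invert_index` and `with_cache` are hypothetical helpers, not functions from any annotation package.

```r
# Toy gene -> information table (made-up values, for illustration only).
gene_to_terms <- list(
  GENE_A = c("term1", "term2"),
  GENE_B = c("term2")
)

# Build the inverse (information -> gene) index once, so reverse lookups
# do not have to rescan the forward table each time.
invert_index <- function(forward) {
  long <- stack(forward)  # data frame with columns: values (term), ind (gene)
  split(as.character(long$ind), long$values)
}

# Wrap a slow lookup function with an in-memory cache keyed by its argument.
with_cache <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fetch(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

terms_to_genes <- invert_index(gene_to_terms)
cached_lookup <- with_cache(function(gene) gene_to_terms[[gene]])
```

Here `terms_to_genes$term2` lists both toy genes, and repeated calls such as `cached_lookup("GENE_A")` reuse the stored result instead of calling the underlying lookup again.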

Tim, many thanks for helping out here. I think there are a lot of nice possibilities we can experiment with here.

Have a great weekend and Happy Halloween :) ~p

cassiedoll commented 9 years ago

With regards to test instances of GCE and GCS - the new free trial program is the best way to do this now: https://cloud.google.com/free-trial/index#FAQ

pgrosu commented 9 years ago

Thanks, Cassie, for the link and information, but the trial GCE and GCS would not be just for me. It would be shared among all of us to troubleshoot the API in order to improve the throughput, which will help not just users but Google as well. I'm not sure we can fix all the issues in just 60 days.

cassiedoll commented 9 years ago

I'm sorry Paul - I don't think we have the ability to provide something like that at this time. We'll let you know if that changes though.

pgrosu commented 9 years ago

Thank you for looking into it, Cassie, and I would appreciate it if you could let me know in case things change. I think we have ways to improve the API by orders of magnitude, based on the results of our analysis in https://github.com/googlegenomics/utils-java/issues/9. Working together, given our diverse expertise, will get us to some viable possibilities more quickly, since I feel we are very close.