Closed jnpaulson closed 9 years ago
note: can't build vignette without query.rdata
You should be able to build the vignette with mgQuery.
For the unnamed DNAStringObjects error. The sequence names are used creating the metagenomeAnnotation object. Does it seem reasonable to add names to unnamed DNAStringObjects using the sequence index positions, or explicitly require named DNAStringObjects?
Thanks. First try to build with updated package led to
Quitting from lines 27-32 (Example_16S_Annotation_Workflow.Rmd) Error in data.frame(query_id = names(query_subset), Keys = db_subset, : object 'query_subset' not found
Tried to run on source after purl and that succeeded. Rendered the example Rmd again in the same session and
split_by(MgAnnoDF, "Phylumn") returned
list()
Typo in Phylum, it should work now.
@vjcitn - at least on my machine things are working now. Is everything running smoothly for you as well?
i think they are close. recent attempt
Error : .onLoad failed in loadNamespace() for 'greengenes13.5MgDb', details:
call: sqliteSendQuery(con, statement, bind.data)
error: rsqlite_query_send: could not execute1: cannot rollback - no transaction is active
Error: loading failed
recover called non-interactively; frames dumped, use debugger() to view
%vjcair>
i think it is a memory limitation... but this should not be happening. you should not
be doing heavy computation in .onLoad IMHO. there's a database somewhere, check
for its existence, establish a connection, and go away, have functions that establish
any content-related entities like dplyr tables or whatever that the user must invoke
to move forward. i don't know why the onLoad is taking so long.
if the database doesn't exist, die with message on how to create it.
On Wed, Aug 5, 2015 at 11:46 AM, Joseph N. Paulson <notifications@github.com
wrote:
@vjcitn https://github.com/vjcitn - at least on my machine things are working now. Is everything running smoothly for you as well?
— Reply to this email directly or view it on GitHub https://github.com/HCBravoLab/metagenomeFeatures/issues/1#issuecomment-128045808 .
I moved the dowloading and building the database outside of onLoad with a function getGreenGenes13.5MgDb() to build the database. However the gg13.5MgDb object is no longer loaded into namespace. Do you have ayn recommendations for how to add gg13.5MgDb to the namespace so that is it available when the library is attached? The changes are in the onLoad_fix of greengenes13.5MgDb (https://github.com/HCBravoLab/greengenes13.5MgDb/tree/onLoad_fix).
Why not just put the large files that you need in (an) experimental data package(s), and depend on it (them)? This way you can have a vignette describing the content and any maintenance considerations in that experimental data package (one per microbiome database you want to deal with). This could be a placeholder approach until AnnotationHub is available to serve the file ... in this case, your onLoad could check for the material in the AnnotationHub cache and fail if it is not there, directing the user to use a given AnnotationHub command before trying again.
If these contents are too volatile to be packaged in this way it becomes perhaps more like a biomaRt situation. AnnotationHub might still be relevant but I don't know that it is designed to deal with very volatile content. In fact part of Bioconductor's core principle is that annotation should freeze for 6 months or so anyway to give analysts some stability.
In any event, I am no authority on how to deal with large annotation resources. Herve and others in the community have addressed it in different ways. But I do think having long silent onLoad events (core seems to want load to be silent, so you can't notify the user about what is going on) is to be avoided. Why not email biocore for more input?
On Thu, Aug 6, 2015 at 11:40 PM, nate-d-olson notifications@github.com wrote:
I moved the dowloading and building the database outside of onLoad with a function getGreenGenes13.5MgDb() to build the database. However the gg13.5MgDb object is no longer loaded into namespace. Do you have ayn recommendations for how to add gg13.5MgDb to the namespace so that is it available when the library is attached? The changes are in the onLoad_fix of greengenes13.5MgDb ( https://github.com/HCBravoLab/greengenes13.5MgDb/tree/onLoad_fix).
— Reply to this email directly or view it on GitHub https://github.com/HCBravoLab/metagenomeFeatures/issues/1#issuecomment-128581257 .
Just finised updating the greengenes13.5MgDb onLoad so that is no longer downloads the data. The data will be included in the package once it is in bioconductor but in the mean time, a script in inst/scripts can be used to download the data. Instruction for installing the package are provided on the github site. Let me know if you run into any problems. https://github.com/HCBravoLab/greengenes13.5MgDb
Not having a
query.rdata
file I generated my own and ran the pipeline only to get an error if I did not includenames
for my DNAStringSet object (whereas in ShortRead it appears to be replaced by ids). Is that expected that includes the ids and named DNAStringSet?