Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

PSIchomics #139

Closed nuno-agostinho closed 8 years ago

nuno-agostinho commented 8 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 8 years ago

Hi @nuno-agostinho

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: psichomics
Title: Analyse and Visualise Alternative Splicing Data
Version: 0.99.0
Authors@R: c(
    person("Nuno", "Saraiva Agostinho", , "nunodanielagostinho@gmail.com", 
 role = c("aut", "cre")),
    person("Nuno", "Barbosa Morais", role=c("aut", "led", "ths")),
    person("Andr\u00E9", "Falc\u00E3o", role="ths"),
    person("Lina", "Gallego Paez", role="ctb"),
    person("Marie", "Bordone", role="ctb"),
    person("Teresa", "Maia", role="ctb"),
    person("Mariana", "Ferreira", role="ctb"),
    person("Ana Carolina", "Leote", role="ctb"))
Description: Automatically retrieve data from RNA-Seq sources such as The Cancer
    Genome Atlas or load your own files and process the data. This tool allows
    you to analyse and visualise alternative splicing.
Depends:
    R (>= 3.2),
    shiny (>= 0.14),
    shinyBS
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 5.0.1
Imports:
    data.table,
    digest,
    DT (>= 0.2),
    fastmatch,
    highcharter (>= 0.4.0),
    httr,
    jsonlite,
    miscTools,
    plyr,
    R.utils,
    rlist,
    shinyjs,
    stats,
    survival,
    Sushi,
    tools,
    utils,
    XML
Suggests:
    testthat,
    knitr,
    parallel,
    devtools,
    rmarkdown,
    dplyr,
    gplots,
    stringr,
    covr,
    car
VignetteBuilder: knitr
Collate:
    'analysis.R'
    'analysis_diffSplicing.R'
    'analysis_diffSplicing_event.R'
    'analysis_diffSplicing_table.R'
    'analysis_information.R'
    'analysis_pca.R'
    'analysis_survival.R'
    'analysis_template.R'
    'utils.R'
    'globalAccess.R'
    'app.R'
    'data.R'
    'formats.R'
    'data_firebrowse.R'
    'data_inclusionLevels.R'
    'data_local.R'
    'events_suppa.R'
    'events_vastTools.R'
    'events_miso.R'
    'events_mats.R'
    'events.R'
    'formats_firehoseGeneExpression.R'
    'formats_firehoseJunctionQuantification.R'
    'formats_firehoseMergeClinical.R'
    'groups.R'
    'settings.R'
biocViews: Sequencing, RNASeq, AlternativeSplicing, DifferentialSplicing,
    Transcription, GUI, PrincipalComponent, Survival, BiomedicalInformatics,
    Transcriptomics, Visualization, MultipleComparison
URL: https://github.com/nuno-agostinho/psichomics
BugReports: https://github.com/nuno-agostinho/psichomics/issues
nuno-agostinho commented 8 years ago

I would like to note this package currently contains an annotation file of around 8.5 MB. I have been discussing with @vobencha and @mtmorgan about moving the annotation file to an annotation package. In the meantime, @mtmorgan agreed on submitting the main package here.

bioc-issue-bot commented 8 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20160923170017.html

lawremi commented 8 years ago

The parsing based on format objects is interesting. Why not use formal classes? Then you could just look for subclasses of the base class to find the available formats, instead of listing the namespace. It would be more extensible.

nuno-agostinho commented 8 years ago

@lawremi With formal classes, I would have to create a class, an object for that class, a subclass and an object for that subclass, right? I thought that using the namespace to look for functions with a given attribute was easier.

By the way, do you know how to fix the error in the Windows build? It reports Error : package 'shiny' 0.13.2 was found, but >= 0.14 is required by 'psichomics'.

lawremi commented 8 years ago

Formal classes give you a lot more than that. Easier validity checking, enhanced extensibility, self-documenting structure. Etc.

bioc-issue-bot commented 8 years ago

We only start builds when the Version field in the DESCRIPTION file is incremented. For example, by changing

Version: 0.99.0

to

Version 0.99.1

If you did not intend to start a build, you don't need to do anything. If you did want to start a build, increment the Version: field and try again.

bioc-issue-bot commented 8 years ago

We only start builds when the Version field in the DESCRIPTION file is incremented. For example, by changing

Version: 0.99.0

to

Version 0.99.1

If you did not intend to start a build, you don't need to do anything. If you did want to start a build, increment the Version: field and try again.

nuno-agostinho commented 8 years ago

@lawremi I understand that formal classes have their advantages. What I don't understand is:

Then you could just look for subclasses of the base class to find the available formats, instead of listing the namespace.

I currently list the namespace to find and load functions with a given attribute. If I created classes and subclasses, I would need to instantiate them. Thus, the resulting objects would contain the string relative to the loading function. However, how would I gather the objects from a given class? The only way I can come up with is by listing the namespace and iteratively checking the class of the returned elements (which I think you prefer to avoid).

lawremi commented 8 years ago

Sorry, I just meant to represent each format as a class and list the subclasses, i.e., names(getClass("Format")@subclasses). The prototype of the class could configure the slots, so new(Class) would do the right thing. Maybe it's not such a great idea to have a class for each format though. Depends on the level of flexibility needed. You could implement some sort of registry of instances but that might not be that beneficial compared to the current solution.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20160926091642.html

lshep commented 8 years ago

Hello,

I was able to run through your example for running GUI tutorial. When I try to run the CLI tutorial, I am running into some problems. I am not able to download files to begin to run the application.

> data <- loadFirehoseData(folder=folder,
+                          cohort="ACC",
+                          data=c("Clinical", "junction_quantification"),
+                          date="2016-01-28")
1
Triggered the download of files
Error in download.file(missingFiles, destfile = file.path(folder, basename(missingFiles))) : 
  'url' must be a length-one character vector
lawremi commented 8 years ago

Maybe you could drop the dependency on the rlist package by doing split(df, seq_len(nrow(df))) instead of list.parse(df)? The direct code seems easier understand.

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

3c39d17 Fix error when differentially analysing one row 1d7e227 Improve installation instructions 18939b4 Remove dependency on rlist and improve function do... 428d459 Use RTOOLS in AppVeyor Merge branch 'master' of g...

nuno-agostinho commented 8 years ago

@lshep I don't know why that error happens. Would you mind trying to run that function once again? If the error keeps going on, could you say which operative system you are running it on and the version of utils you have installed? Thanks in advance.

@lawremi The output from your suggestion and list.parse(df) is different. I replaced that call with lapply(seq(nrow(df)), function(i) as.list(df[i, ])) instead.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20160930175341.html

lawremi commented 8 years ago

Ok well one thing you want to be careful of is the single column case. Might want to pass drop=FALSE to [().

nuno-agostinho commented 8 years ago

@lawremi Thanks for the reminder! I will wait for @lshep to warn about any more issues before sending a new commit with that fixed.

By the way, do you know why this build was just labelled with warnings when the output of the build before was labelled OK, even though the output of both is the same?

lshep commented 8 years ago

I am still getting the same ERROR when i try to run the CLI tutorial.

library(psichomics)
cohorts <- getFirehoseCohorts()
date <- getFirehoseDates()
dataTypes <- getFirehoseDataTypes()
folder <- getDownloadsFolder()
data <- loadFirehoseData(folder=folder,
                         cohort="ACC",
                         data=c("Clinical", "junction_quantification"),
                         date="2016-01-28")
1
Triggered the download of files
Error in download.file(missingFiles, destfile = file.path(folder, basename(missingFiles))) : 
  'url' must be a length-one character vector

My sessionInfo()

> sessionInfo()
R version 3.3.1 Patched (2016-09-25 r71359)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] psichomics_0.99.1 shinyBS_0.61      shiny_0.14       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7          lubridate_1.6.0      lattice_0.20-34     
 [4] tidyr_0.6.0          zoo_1.7-13           assertthat_0.1      
 [7] digest_0.6.10        psych_1.6.9          mime_0.5            
[10] R6_2.1.3             plyr_1.8.4           chron_2.3-47        
[13] stats4_3.3.1         RSQLite_1.0.0        httr_1.2.1          
[16] miscTools_0.6-16     curl_2.1             data.table_1.9.6    
[19] miniUI_0.1.1         TTR_0.23-1           S4Vectors_0.11.17   
[22] R.utils_2.4.0        R.oo_1.20.0          Matrix_1.2-7.1      
[25] DT_0.2               splines_3.3.1        shinyjs_0.7         
[28] stringr_1.1.0        foreign_0.8-67       htmlwidgets_0.7     
[31] igraph_1.0.1         RCurl_1.95-4.8       biomaRt_2.29.2      
[34] broom_0.4.1          httpuv_1.3.3         BiocGenerics_0.19.2 
[37] mnormt_1.5-4         htmltools_0.3.5      tibble_1.2          
[40] IRanges_2.7.16       XML_3.98-1.4         viridisLite_0.1.3   
[43] dplyr_0.5.0          bitops_1.0-6         R.methodsS3_1.7.1   
[46] grid_3.3.1           nlme_3.1-128         jsonlite_1.1        
[49] xtable_1.8-2         DBI_0.5-1            magrittr_1.5        
[52] highcharter_0.4.0    rlist_0.4.6.1        quantmod_0.4-6      
[55] stringi_1.1.1        reshape2_1.4.1       xts_0.9-7           
[58] fastmatch_1.0-4      tools_3.3.1          Biobase_2.33.3      
[61] purrr_0.2.2          parallel_3.3.1       survival_2.39-5     
[64] AnnotationDbi_1.35.4 Sushi_1.11.0        
lshep commented 8 years ago

Two notes when I run the GUI version through Rstudio:

  1. There is no exit button and if I close the application manually I am still stuck. I have to do a hard esc to exit in Rstudio to exit.
  2. Some of the steps as expected take some time. I would make mention of the progress bar

In regards to the WARNING. There was a bug on our end that the WARNING tag would not get added for BiocCheck WARNINGS which has been corrected. If you look at the build reports, both do show these WARNING in BiocCheck

lshep commented 8 years ago

It looks like an argument url is multiple character strings/files to download, yet download.file appears to only take a single file as an argument. It is curious that this does not run into a problem when running the GUI version? Is the code different for downloading files when using the GUI vs using command line? why would this be the case and the same code not utilized?

analysis_information.R seems to have your email hard coded into arguments to query? Is there a reason for this and that it could not be the users email? How is this being utilized?

events.R also seems to have a lot of hard coded paths. How are these being created and utilized? It seems like files would not be found on a users system unless these are created as results through the application? If the later, this doesn't really seem like a good approach unless it is a user specified output directory, and should default to a tempdir()?

It seems like the default directory to download and write output is set through getDownloadsFolder. We feel like a better practice would be to default in this function to a tempdir() rather than a system directory. The user can always overwrite with a specific directory of their choosing.

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

4ce3abb Stop app when closing the browser tab/window a03be74 Fix download of multiple files when libcurl is not... b1a3191 Fix NULL comparison raising an error 871198b Fix error when using a single column

nuno-agostinho commented 8 years ago

Hey, @lshep!

I already solved some issues. Let me go through your points, one by one:

1

There is no exit button and if I close the application manually I am still stuck. I have to do a hard esc to exit in Rstudio to exit.

Given your input, I changed the program so that, after the user closes the tab/window of the GUI, the program will stop running in RStudio as well.

2

Some of the steps as expected take some time. I would make mention of the progress bar

I am sorry, I am not understanding your point. Do you mean that the progress bar should say something like "This may take a few minutes"?

3

It looks like an argument url is multiple character strings/files to download, yet download.file appears to only take a single file as an argument.

Actually, download.file can take multiple arguments if it uses libcurl, but all other methods do not allow for that. As I had libcurl as the default method in my computers, I never encountered the error until you mentioned it. The error is now fixed in the latest version and you can use it. Thanks for the help! :)

4

It is curious that this does not run into a problem when running the GUI version? Is the code different for downloading files when using the GUI vs using command line? why would this be the case and the same code not utilized?

Unfortunately, when downloading files, Shiny hangs if there are too many and/or too large files when using download.file. For that reason, the GUI version asks the browser to download the files instead.

5

analysis_information.R seems to have your email hard coded into arguments to query? Is there a reason for this and that it could not be the users email? How is this being utilized?

The RESTful service I use (called Entrez Programming Utilities) requires to send the e-mail of the developer. As written in here:

The value of email should be a complete and valid e-mail address of the software developer and not that of a third-party end user. The value of email will be used only to contact developers if NCBI observes requests that violate our policies, and we will attempt such contact prior to blocking access.

6

events.R also seems to have a lot of hard coded paths. How are these being created and utilized? It seems like files would not be found on a users system unless these are created as results through the application? If the later, this doesn't really seem like a good approach unless it is a user specified output directory, and should default to a tempdir()?

I am sorry. These functions are part of a section of the program we did not have time to implement in the GUI version. Anyway, I will clean it to be useful for the CLI version and send as the next version, as soon as possible.

7

It seems like the default directory to download and write output is set through getDownloadsFolder. We feel like a better practice would be to default in this function to a tempdir() rather than a system directory. The user can always overwrite with a specific directory of their choosing.

For the GUI version, as I said before, the downloads are handled by the user's browser which save files in the Downloads folder, by default. As such, I found it easier to point to the Downloads folder, by default. For the CLI version, there is no restriction and I can indeed point to a temporary directory. So, do you think I should direct it to tempdir() in the CLI version only?

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20161003163825.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

43f7792 Import missing function from Shiny

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20161003172357.html

lshep commented 8 years ago

2

My point of the progress bar was that I was unaware that it was there or even existed until about half way through the vignette. It was smaller and hard to see but useful to know that the application is processing. Just a reference that it is there.

6

If these aren't implemented please take out of the current version. If these are partially used, please clean it.

7

It would be useful then to implement the use of a 'tempdir()' whenever R CMD build / R CMD check are run. Ideally we would want no artifacts from the build and check process on the systems.

nuno-agostinho commented 8 years ago

My point of the progress bar was that I was unaware that it was there or even existed until about half way through the vignette. It was smaller and hard to see but useful to know that the application is processing. Just a reference that it is there.

Okay, I have included this in the vignette.

If these aren't implemented please take out of the current version. If these are partially used, please clean it.

I already cleaned it, but I still have to improve them. I am also thinking of including a vignette specifically related to them, as they allow users to extend support to other species and assemblies, but it's a fairly complicated process.

It would be useful then to implement the use of a 'tempdir()' whenever R CMD build / R CMD check are run. Ideally we would want no artifacts from the build and check process on the systems.

The vignette doesn't download any files when running the R CMD build / check. The code chunks that should download files are set not to be evaluated in the vignette. Instead, a hidden code chunk loads sample files (vignettes/ex_TCGA_clinical.RDS and vignettes/ex_TCGA_junction_quant_chr22.RDS) that come with the package. This ensures that, even if the API is offline, the vignette does not fail building / checking and that the vignette runs faster (because the sample files are just an extract of the downloadable files).

By the way, @lshep, did you successfully run the CLI version?

lshep commented 8 years ago

Yes I was able to successfully run the CLI version.

lshep commented 8 years ago

Could you please fix or respond to the WARNING in the build report

nuno-agostinho commented 8 years ago

Could you please fix or respond to the WARNING in the build report

I fixed one of the warnings (the one that asked to update R version dependency).

However, I don't know how to fix the second warning: Add non-empty \value sections to the following man pages. The functions listed always return NULL because it is what they do rather then their output that matters. For instance, most of those functions modify the state of the Shiny session. As such, I thought that it would be best to either ignore this warning or fill the returned value section with "NULL (this function is used to modify the Shiny session's state)". What do you think?

lshep commented 8 years ago

We think it best that a value section with the suggested "NULL (this function is used to modify the Shiny session's state)" would be better.

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

23248bb Fix warnings in R CMD check * Update R version de... d9702f0 Vastly improve event parsing * Make it easier to ... 1bb40b1 Improve gene annotation * Handle Ensembl identifi... 6cc400d Remove warnings and other miscellanious fixes * M... 0659070 Misc improvements * Remove errors and warnings in...

nuno-agostinho commented 8 years ago

@lshep, sorry for the delay, but I had to clean, simplify and document the functions which had the hardcoded file paths. This process included adding a new vignette, more tests and sample data. I have also fixed the warnings as per your suggestions and made other small improvements.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20161008192915.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

921c87b Load alternative splicing quantification from file...

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20161008210913.html

nuno-agostinho commented 8 years ago

Hey @lshep!

As we are almost reaching the Bioconductor deadline, I would like to know if you have any additional comments on my package. The latest version of the package still has an 8MB file, but this file is going to be removed as soon as @vobencha uploads my annotation package to Bioconductor. Besides that, do you think that this package can be accepted in Bioconductor or is there something missing?

Thank you for your help!

lshep commented 8 years ago

Hi @nuno-agostinho Please add the Bioconductor installation instructions to all your vignettes (GUI and command line vignettes).
Other than that I think it is looking good. I cannot accept the package until after the annotation file is removed and the package is tested with accessing the data from AnnotationHub. I just spoke with @vobencha , she is working on this, and this should be completed today. She will notify you when it is ready and when you can make the appropriate changes and bump the version number to get a clean build.

Cheers

vobencha commented 8 years ago

Hi @nuno-agostinho and @lshep , Data are now ready in AnnotationHub. You'll need snapshot October 11th:

> ah <- AnnotationHub()
updating metadata: retrieving 1 resource
  |======================================================================| 100%
snapshotDate(): 2016-10-11

You can find the resource by searching on package name or title:

> query(ah, "alternativeSplicingEvents.hg19")
AnnotationHub with 1 record
# snapshotDate(): 2016-10-11 
# names(): AH51461
# $dataprovider: MISO, VAST-TOOLS, UCSC
...

Single bracket shows the metadata:

> ah["AH51461"]
AnnotationHub with 1 record
# snapshotDate(): 2016-10-11 
# names(): AH51461
# $dataprovider: MISO, VAST-TOOLS, UCSC
...

Double bracket downloads the resource:

> xx <- ah[["AH51461"]]
downloading from ‘https://annotationhub.bioconductor.org/fetch/58199’
retrieving 1 resource
  |======================================================================| 100%
> class(xx)
[1] "list"
> length(xx)
[1] 8

Please make the following changes to PSIchomics:

Valerie

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

d17a298 Fix characters displaying their Unicode code, clos... af3736d Improve loading of alternative splicing quantifica... 9538891 Improve PCA plots, closes #208 * Remove horizonta... 05af818 Fix PCA unit tests c6aab1b Add t-test and Fligner-Killeen test, closes #206 ... 6307423 Multiple minor improvements * Add X and Y crossha... 9071b7a Multiple minor improvements * Add X and Y crossha... 605e573 Load annotation from AnnotationHub and improvement... 177e5ac Merge branch 'dev' of github.com:nuno-agostinho/ps... 12d1ca4 Fix text position and loading message 065b47d Update visual interface tutorial * Update images ...

nuno-agostinho commented 8 years ago

Hello @vobencha and @lshep!

Thanks for uploading the annotation package. I removed the file in the package and now use the annotation package's object. I also updated the documentation accordingly, including the instructions to install the package from Bioconductor.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/psichomics_buildreport_20161012001710.html

lshep commented 8 years ago

Thank you for making these changes. The package looks okay now and I will recommend be accepted. You should get svn instructions via email in the next couple of days

nuno-agostinho commented 8 years ago

All right. Thanks for your help, @lshep ! :)