Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

(inactive) DIscBIO #1376

Closed wleoncio closed 4 years ago

wleoncio commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Hi @wleoncio

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: DIscBIO
Date: 2020-02-06
Title: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics
Version: 0.99.0
Authors@R: 
    c(
        person(given = "Salim",
     family = "Ghannoum",
     role = c("aut", "cph"),
     email = "salim.ghannoum@medisin.uio.no"),
        person(given = "Alvaro",
     family = "Köhn-Luque",
     role = c("aut", "ths"),
     email = "alvaro.kohn-luque@medisin.uio.no"),
        person(given = "Waldir",
     family = "Leoncio",
     role = c("cre", "aut"),
     email = "w.l.netto@medisin.uio.no")
    )
Description: An open, multi-algorithmic pipeline for easy, fast and efficient
    analysis of cellular sub-populations and the molecular signatures that 
    characterize them. The pipeline consists of four successive steps: data 
    pre-processing, cellular clustering with pseudo-temporal ordering, defining 
    differential expressed genes and biomarker identification. This package 
    implements extensions of the work published by Ghannoum et. al. 
    <doi:10.1101/700989>.
biocViews: SingleCell, Transcriptomics, Clustering, DecisionTree
License: MIT + file LICENSE
Encoding: UTF-8
Imports: methods, TSCAN, boot, httr, mclust, statmod, biomaRt, samr, igraph,
    RWeka, partykit, grid, philentropy, NetIndices, png, matrixStats, grDevices,
    readr, RColorBrewer, ggplot2, rpart, fpc, cluster, rpart.plot, amap, dplyr,
    tsne, AnnotationDbi, org.Hs.eg.db, calibrate, graphics, stats, utils
Depends:
    R (>= 3.6),
Suggests: 
    knitr,
    rmarkdown,
    testthat,
    enrichR
VignetteBuilder: knitr
LazyData: true
RoxygenNote: 7.0.2
BugReports: https://github.com/ocbe-uio/DIscBIO/issues
Collate: 
    'DIscBIO-classes.R'
    'DIscBIO-generic-ClassVectoringDT.R'
    'DIscBIO-generic-Clustexp.R'
    'DIscBIO-generic-DEGanalysis.R'
    'DIscBIO-generic-DEGanalysis2clust.R'
    'DIscBIO-generic-Exprmclust.R'
    'DIscBIO-generic-FinalPreprocessing.R'
    'DIscBIO-generic-FindOutliersKM.R'
    'DIscBIO-generic-FindOutliersMB.R'
    'DIscBIO-generic-KMClustDiffGenes.R'
    'DIscBIO-generic-KMclustheatmap.R'
    'DIscBIO-generic-KmeanOrder.R'
    'DIscBIO-generic-MBClustDiffGenes.R'
    'DIscBIO-generic-MBclustheatmap.R'
    'DIscBIO-generic-NoiseFiltering.R'
    'DIscBIO-generic-Normalizedata.R'
    'DIscBIO-generic-PCAplotSymbols.R'
    'DIscBIO-generic-PlotmclustMB.R'
    'DIscBIO-generic-comptSNE.R'
    'DIscBIO-generic-comptsneMB.R'
    'DIscBIO-generic-plotExptSNE.R'
    'DIscBIO-generic-plotGap.R'
    'DIscBIO-generic-plotKmeansLabelstSNE.R'
    'DIscBIO-generic-plotMBLabelstSNE.R'
    'DIscBIO-generic-plotOrderKMtsne.R'
    'DIscBIO-generic-plotOrderMBtsne.R'
    'DIscBIO-generic-plotSilhouette.R'
    'DIscBIO-generic-plotSymbolstSNE.R'
    'DIscBIO-generic-plotexptsneMB.R'
    'DIscBIO-generic-plotsilhouetteMB.R'
    'DIscBIO-generic-plottSNE.R'
    'DIscBIO-generic-plottsneMB.R'
    'J48DT.R'
    'J48DTeval.R'
    'Jaccard.R'
    'MB_Order.R'
    'NetAnalysis.R'
    'Networking.R'
    'PPI.R'
    'PlotMBexpPCA.R'
    'PlotMBorderPCA.R'
    'RpartDT.R'
    'RpartEVAL.R'
    'VolcanoPlot.R'
    'datasets.R'
    'retrieveBiomart.R'

Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

bioc-issue-bot commented 4 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

mtmorgan commented 4 years ago

This package should almost certainly implement an interface to SingleCellExperiment and use other standard Bioconductor representations for robustness and interoperability; see for instance this recent comment.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

wleoncio commented 4 years ago

On one or more platforms, the build results were: "skipped, ERROR".

Is it possible to trigger a new build? The error seems to be caused by a failed attempt to access the string-db.org API. As further example that the package should be building fine, here is the TravisCI output from a build three days ago.

bilde

lshep commented 4 years ago

A new build is triggered with a version bump in the DESCRIPTION if your webhook is set up. And please see @mtmorgan comment about changing the interfaces utiltized

wleoncio commented 4 years ago

A new build is triggered with a version bump in the DESCRIPTION if your webhook is set up.

Got it, I'll take care of that.

And please see @mtmorgan comment about changing the interfaces utiltized

I defer a reply to the package's principal contributor, @systemsbiologist, who can better explain the choice to use a new class instead of the ones available at SingleCellExperiment.

SystemsBiologist commented 4 years ago

Dear @lshep and @mtmorgan,

Thank you very much for your feedback. We appreciate the points you raised about interoperability, and expectation that all Bioconductor packages should work together. We took on your suggestion, and started developing an interface that allows SingleCellExperiment-class objects to be analyzed using our package. We will consider using other standard Bioconductor representations wherever possible/appropriate. We are planning to resubmit an updated version of our package within 1-2 weeks. Thanks .

hpages commented 4 years ago

Thanks @SystemsBiologist and @wleoncio for the update.

H.

hpages commented 4 years ago

Hi @wleoncio and @SystemsBiologist ,

Do you have any update on this?

Thanks, H.

wleoncio commented 4 years ago

Hi @hpages, thanks for checking in! We have finished implementing all changes requested and are currently adding some unit tests and making sure everything is working as expected. We might go a few days over the previous 1-2 weeks prediction, but we're nearly ready to push a new patch to master.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

7101b64 Merge branch 'release-1.0' into dev 62411ac fix quotation marks in README af3c328 add files for binder 8bbeb77 update binder install command 151fef6 rWeka needs jdk 3f4d564 missing tsne, use isntallable name for pheatmap c26566a install DIscBIO using the same command as R CMD IN... c20ea39 install DIscBIO in postBuild a8ab1f9 Added file to trigger TravisCI build c289d9a Merge branch 'addTravis' into dev afa65cc Merge branch 'dev' into binder a34c49f Added TravisCI badge to README file 2600751 add binder files to Rbuildignore d84d6e4 install BiocInstaller needed for installation fro... 912be0c Merge pull request #4 from minrk/binder [WIP] Bin... e960a11 Added Binder badge 4db86e2 Fixed Binder badge link f1ede0f Added recommendation to use Bioc coding style. e6302fe Added issue templates 8253c2f Merge pull request #5 from ocbe-uio/issue-template... 35149fe Added dataset with 18556 genes and 8000 cells 6b61176 Added "pan_indrop_matrix_8000cells_18556genes.rda" 95ef947 Adding the dataset "pan_indrop_matrix_8000cells_18... 2e49ba6 Update DESCRIPTION fe361fa Update DIscBIO-classes.R updating initialize 6f124a9 Update datasets.R 0f830d7 Create customConverters.R interfaces and conversi... f90f901 Add files via upload a1b5b3f Delete genes20k.rda 50a0705 Add files via upload 747282e Merge branch 'dev' into dfmod 0220a74 Updated documentation 7332f7b Merge pull request #6 from dami82/dfmod Dfmod 275fda6 Fixed variable scope note 0557b6d Version bump for next submission 5c91a07 Whitespace removal 7654e05 Fixed bug when converting from SCE to DISCBIO bc21b83 Added unit tests for data converting da2d729 Added Damiano as package contributor. f1e4c1f Fixed check-crashing bug in docs 83d49cf Changed R version dependence (BiocCheck required) dff8451 Merge pull request #7 from ocbe-uio/release-1.0 B...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

wleoncio commented 4 years ago

The warnings are quite minor ("update R from 3.6 to 4.0", "add blank line to end of the file") and I am ready to push the fixes to master. But I am a bit stumped about the Windows error:

 ** byte-compile and prepare package for lazy loading
 Error: .onLoad failed in loadNamespace() for 'rJava', details:
  call: library.dynam("rJava", pkgname, libname)
  error: DLL 'rJava' not found: maybe not installed for this architecture?

Should I worry about this or is this likely to be a fluke?

lshep commented 4 years ago

I reinstalled rJava yesterday afternoon on the builders so you should hopefully not have this issue anymore

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

d330b0b Merge branch 'release-1.0' into dev b344e08 Merge commit 'f1e4c1f1a03b4b93b90d51a9525ed1cf681b... b262e65 Added Travis caching for quicker builds 86c1d03 Solved minor warnings on Linux Bioc check 892b9b4 Version bump a14ae86 Merge branch 'release-1.0' into dev 52f7010 Removed bioc-release (package now requires R 4.0) 097867a Changed Travis cache parameters f401c0f Merge branch 'master' into release-1.0 4854560 Merge pull request #8 from ocbe-uio/release-1.0 B...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

eaab4ab Fixed vignette encoding bug on Windows c6a7995 Version bump 72e0c33 Wrapped examples with dontrun to save build time 5f72572 Merge branch 'master' into release-1.0 5ac99a6 Merge pull request #9 from ocbe-uio/release-1.0 B...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

7f1f5cd Version bump 68782cd Added more runnable examples 8de0522 Merge pull request #10 from ocbe-uio/release-1.0 ...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

wleoncio commented 4 years ago

Finally! Now I can ~go out~ stay indoors for Easter. Happy holidays to whoever reads this! :)

SystemsBiologist commented 4 years ago

Happy Easter!

hpages commented 4 years ago

Hi @SystemsBiologist, @wleoncio,

How is the refactoring of the package going? As of today, I see that the pipeline implemented in the package is still entirely based on DIscBIO objects, which are incompatible with SingleCellExperiment objects. FWIW Martin Morgan's suggestion (@mtmorgan) was that you use SingleCellExperiment objects instead for interoperability with all other single-cell analysis packages in Bioconductor.

Please don't hesitate to ask on the bioc-devel mailing list if you have questions or concerns about this. Thanks!

H.

wleoncio commented 4 years ago

Dear @hpages,

Interoperability with the SCE objects is done via the as.DISCBIO function, whose documentation contains the following example:

g1_sce <- SingleCellExperiment::SingleCellExperiment(
    list(counts=as.matrix(valuesG1msReduced)) # internal dataset
)
g1_disc <- as.DISCBIO(g1_sce)
class(g1_disc) #returns "DISCBIO"

That function basically imports an SCE object and transforms the slots accordingly for usage on DIscBIO. Is that a reasonable solution? I'm afraid refactoring all functions by, say, adding methods for SCE objects on each of them would be quite the undertaking, which could set us back a few months.

hpages commented 4 years ago

In practice this approach won't provide much interoperability with other single-cell analysis packages in Bioconductor, for a number of reasons. For example as.DISCBIO() is largely untested and will fail for most SingleCellExperiment objects (except for the trivial one you use in your example). Also at any point in your pipeline the user should be able to turn their DISCBIO object back into a SingleCellExperiment object. Otherwise, they'll be stuck in the DIscBIO pipeline with their DISCBIO object. Interoperability with other tools means that at any point in their analysis, the user can replace a tool with another if they want.

The deeper issue is that the DISCBIO container is re-inventing the wheel. On the other hand Bioconductor already has a well-tested/well-supported container for representing single-cell data. So, yes, our suggestion is that there is no need for the DISCBIO container and that you base your pipeline on SingleCellExperiment objects.

Thanks, H.

wleoncio commented 4 years ago

Thank you for the clarification, @hpages. I'll discuss the issue with @SystemsBiologist, we'll act on it ASAP.

SystemsBiologist commented 4 years ago

Dear @hpages,

Thanks for your reply. We understand you are not happy with the updated package. Please, rest assured that we are not trying to re-invent the wheel, and we agree that the singleCellExperiment-class should be the reference class in any single cell analysis pipeline. We understand your points but it seems to us you changed your mind about an interface being sufficient in our case.

We are really willing to put an extra effort and revise again our package to make it suitable for the bioconductor ecosystem. However, we need to make sure that our work will be aligned to the requirements for acceptance in bioconductor. This is our plan. We can definitely use a singeCellExperiment object as the only input of our pipeline, yet we need some custom/extra slots for storing intermediates and results of our analysis. We can rewrite the DISCBIO class definition so that the DISCBIO-class can> 1) extend the singleCellExperiment-class, or
2) just include a slot for a singleCellExperiment object. Would either strategy be compatible with bioconductor requirements? Please, let us know.

Best regards

hpages commented 4 years ago

but it seems to us you changed your mind about an interface being sufficient in our case

There is no change of mind. Maybe we were not clear enough initially about what we expected? Sorry if that was the case. Just to clarify, rest assured that Bioconductor commitment to standard representations and interoperability has always been very strong.

We sometimes refer to the 2 approaches you describe as using inheritance vs composition. Most of the times developers try to go for the 1st, and only if they run into problems, they switch to the 2nd. The problem one typically faces with the inheritance approach is that they have to choose which SummarizedExperiment subclass to derive from. For example in your case the SingleCellExperiment class. If in the future you decide that you want to support other SingleCellExperiment derivatives, then you'll have to introduce a new subclass for each of them. This would quickly become cumbersome and hard to maintain. Using composition avoids that problem e.g. if you include a slot that expects a SingleCellExperiment object then it would accept a SingleCellExperiment object or derivative. However, the big drawback of the composition approach is that your objects don't inherit the SingleCellExperiment API. But since it's easy to get the SingleCellExperiment object out of the bigger object, this should not be too bad in practice.

A 3rd approach is to not introduce a new class and to just use a plain SingleCellExperiment object. And then to store the intermediate results that don't fit in the assays or rowData or colData components of the object in its metadata component.

Hope this helps, H.

SystemsBiologist commented 4 years ago

Dear @hpages,

Thanks for your clarification. We are wondering if we go with the second option (rewriting the DISCBIO class definition to include a slot for a singleCellExperiment object) would that be compatible with Bioconductor requirements? Or we should go with the third option (using a plain SingleCellExperiment object. and storing the intermediate results in the metadata component)?

Looking forward to hearing from you

hpages commented 4 years ago

Using a plain SingleCellExperiment object, if possible, is the preferred way. If your intermediate results have the rectangular shape of the SingleCellExperiment object they could go in an assay. If they are parallel to its 1st dimension, they should go in the rowData. If they are parallel to its 2nd dimension, they should go in the colData. If they have a rectangular shape with the same number of columns as the object but not necessarily the same number of rows, they could go in the altExps list. If they don't fit in any of these components, they can always go in the metadata list. But this should typically be the last resort.

Several pipelines in Bioconductor start with a SingleCellExperiment object (or other SummarizedExperiment derivative) that contains only 1 assay with the raw counts, and then stuff the object with intermediate results as we move thru the pipeline. I suggest that you take a look at what other packages do to get a feeling of whether or not that would work for you. If for whatever reason it doesn't, then I guess it would be ok to go for the second option (composition), even though that is not how other pipelines typically handle this.

I'm by no mean a SingleCellExperiment pipeline expert so please don't hesitate to seek advice on the bioc-devel mailing list or the Bioconductor Community Slack (e.g. #SingleCellExperiment channel, see Community Slack sign-up on our website) where the real experts should be able to give valuable advice.

Best, H.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

9925951 Make sure to import SingleCellExperiment class cb7a2ce Update DISCBIO class to include a SingleCellExperi... 51a5ff8 Update DISCBIO class to include a SingleCellExperi... 7373964 fix SingleCellExperiment declaration b55be0c fix SingleCellExperiment declaration - missing MAN... 2a0f7d7 Update customConverters.R ff80e36 Update DESCRIPTION c265560 Update NAMESPACE add DISCBIO2SingleCellExperiment... 8124855 Update DESCRIPTION 61383f8 Update DESCRIPTION ded80c6 Update DESCRIPTION c158096 Update customConverters.R 9021564 Update DESCRIPTION f9da253 Update DIscBIO-classes.R 18ff473 Update DIscBIO-classes.R 56531c5 Rolled back version for continuity on master 7eab5a9 Merge pull request #12 from dami82/DFmod D fmod

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

wleoncio commented 4 years ago

The error seems like a connection issue ("GnuTLS recv error (-110): The TLS connection was non-properly terminated."). Is it possible to trigger another build without committing on our master branch?

lshep commented 4 years ago

Normally only commits to the master branch will trigger builds. We do have a way to do this manually. I will trigger a build manually this once but please don't rely on this often. (your report should post shortly)

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

wleoncio commented 4 years ago

@lshep, thank you for making an exception. Too bad it triggered the same error. We're going to take another look in our end, we can try to add some code to circumvent this issue on the package side.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

e6f4378 Rolled back R required to 3.6 to allow development 3165930 Changed caching configuration a8f4f5c Merge branch 'dev' into release-1.0 3dff135 Updated R dependency 069b993 Merge branch 'master' into release-1.0 f7c8f90 Updated documentation 2950152 Updated documentation 5270157 Merge branch 'release-1.0' of https://github.com/o... fbb77c3 Removed whitespace b574ce4 Added examples to conversion functions a1287dd Added PR template fb6dc80 Merge pull request #13 from ocbe-uio/release-1.0 ...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

088722d Merge commit '7eab5a9d0b7af19b19e70ef27a581f401ab6... a0985e4 Merge commit 'a1287dd3878175b19022a32504d6a35c01cf... 2e7371a Improved bug report template 2fcab4b Merge commit '2e7371a1f1811ef027fda2ed35ecb910ed6e... b43ec2f Version bump 9296022 Suppressed evaluation of vignette MB chunks 4bbe9a8 Fixed text and package date e363e6a Fixed exporting of CSV file in examples 54737cf Added tsne controls to comptsneMB 7b168a7 Optimized examples 4760f5b Added tsne controls to comptSNE bb12ec0 Simplified examples 0c341b4 Simplified vignette end test units 6a4e257 Reformatted code to Bioconductor standards 9f4ea72 Merge branch 'master' into release-1.0 0eb0ac5 Changes to Travis config 753d94e Merge pull request #15 from ocbe-uio/release-1.0 ...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

84074a2 Refactoring of integration tests and Jaccard() b1aa37f Added more customization options to Jaccard() 94f3308 Version bump b13fe35 Optimized test units (saving 3 s) 66f7be0 Adjusting code to Bioconductor style b1c85f3 Merge branch 'master' into release-1.0 196d59f Merge pull request #16 from ocbe-uio/release-1.0 ...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

SystemsBiologist commented 4 years ago

Dear @hpages,

We submitted an updated version of our DISCBIO R package. After your feedback, we assessed the feasibility of using the singleCellExperiment-class as the sole class to store all data and intermediates from our pipeline. However, we concluded that this was not the best solution here. Our pipeline is mainly designed to process single cell experiments that include spike-in samples. These samples are used for normalization and internal control, and need to be isolated from the rest of the data at the very beginning of our pipeline. Since this operation generates matrices with sizes that do not match (nor match with the input expression counts matrix), it would be inefficient to store the DISCBIO intermediate data in a single singleCellExperiment object (it is feasible, but the resulting structure would be complex and counter-intuitive, and may lead to significant data duplication depending on the number of spike-ins).

On the contrary, we considered that a more efficient approach was to rely on composition (that you also recommended as an alternative scenario). The DISCBIO pipeline now starts from a singleCellExperiment object that provides the input data. Specifically, a singleCellExperiment object is stored as one of the slots of our DISCBIO class. Expression counts are retrieved from the singleCellExperiment using singleCellExperiment-specific methods, and then spike-in samples are extracted (during the second step) to generate the reduced-size matrices that are used in the following steps of our pipeline.

Since the updated version of DISCBIO relies on composition, importing a singleCellExperiment into a DISCBIO object (or exporting from) is trivial, thus facilitating the processing of the same data via alternative Bioconductor pipelines. We think that this solution satisfies Bioconductor’s guidelines aimed at enhancing code reuse and interoperability.

We thank you in advance for your time and for considering DISCBIO for inclusion in Bioconductor.

Looking forward to hearing from you

hpages commented 4 years ago

Hi,

Have you considered storing the spike-in data in the altExps component of the SingleCellExperiment object? This is the typical use case that motivated the introduction of the altExps feature by the SingleCellExperiment developers. As mentioned earlier, the experts on the #SingleCellExperiment channel of the community-bioc Slack will happily provide advice on these matters.

Thanks

wleoncio commented 4 years ago

Dear @hpages,

We have discussed the suggested implementation; unless we misunderstood the final goal of the review board, we are afraid this would trigger an overhaul of the package to accomodate a SingleCellExperiment object into the whole DIscBIO pipeline. Our package was envisioned as a standalone pipeline and we are definitely willing to integrate the initial input and final output with other classes from the Bioconductor universe, but would rather keep the intermediate results (including spike-in calculations) within the DISCBIO-class object. I hope this is not an impediment to the acceptance of the package, but if it is we totally understand and are prepared to retract the submission.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

SystemsBiologist commented 4 years ago

Dear @hpages,

We hope you are doing great! We are just wondering if you have had the chance to take a decision regarding our package, which we really hope it gets accepted and be part of Bioconductor.

Looking forward to hearing from you

hpages commented 4 years ago

Hi,

So I took a look at your DISCBIO object (21 slots!) and here is what I see:

Initial DISCBIO object

We enter the pipeline by feeding the DISCBIO() constructor with an M x N data frame containing single cell data with N samples and M features (genes + ERCC spike-ins).

This populates slots SingleCellExperiment (SingleCellExperiment object, a.k.a. SCE) and expdataAll (data frame), each with a full copy of the input data, It also populates slots expdata, ndata, and fdata (all of them data frames), each with a copy of the input data from which the ERCC spike-ins have been removed.

The altExp feature of the SCE should be used to store the spike-in data separately from the main data. Then the other 5 slots become redundant and should be removed.

NoiseFiltering step

This steps populates slot noiseF with the names of the genes that passed the filtering.

This should be stored in the SCE object as a column (typically logical) of its rowData() component. Then the noiseF slot becomes redundant and should be removed.

Normalizedata step

This steps populates slots ndata and fdata with the normalized data (the 2 slots are populated with identical data), and slot filterpar with some parameters (which, according to the man page, are filtering parameters but the slot only got populated now and not by the NoiseFiltering step).

The normalized data should be stored in the SCE object as a new assay, and the list of parameters should go in its metadata() component. Then slots ndata, fdata, and filterpar become redundant and should be removed.

FinalPreprocessing step

This is conceptually a subsetting operation where only a subset of the initial genes is kept. The way this is implemented in the DISCO object is by populating slot FinalGeneList with a copy of the gene list that is already in noiseF, and by populating slot fdata with ndata[FinalGeneList, ].

With the SCE object, this is just:

SCE <- SCE[rowData(SCE)$noiseF, ]

granted that rowData(SCE)$noiseF is the logical column containing the result of the NoiseFiltering step.

Clustexp step

This steps populates slots distances, cpart, clusterpar, kmeans, and fcol. distances is a square matrix representing the pairwise distances between the samples. Storing an object that is going to grow quadratically with the number of samples will certainly cause problems with most real world single cell datasets where it's not uncommon to see tens of thousands of samples. Not a problem of course in the examples provided in the vignette and man pages where the number of samples is 30.

Anyway this matrix should be stored in the SCE object as a column of its colData() component. Same thing for the cpart slot which is an integer vector of length N. The lists in clusterpar and kmeans should go in its metadata() component. Same thing for the cluster color scheme in the fcol slot.

Then these 5 slots are not longer needed and should be removed.

comptSNE step

This steps populates the tsne slot with an N x 2 data frame.

No need for this slot either. This should go in the SCE object as a column of its colData() component.

etc...

The bottom line is that the DISCBIO container is unfortunately a clumsy attempt at reinventing a container that already exists in Bioconductor. I can understand that modifying the DIscBIO pipeline to accommodate a SingleCellExperiment object represents a major overhaul but, as mentioned previously, contributions to the project are required to adopt standard representations and provide interoperability with the other tools in the ecosystem.

Let us know what you want to do.

Regards, H.

bioc-issue-bot commented 4 years ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for interest in Bioconductor.