Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

BioNERO #1941

Closed almeidasilvaf closed 3 years ago

almeidasilvaf commented 3 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 3 years ago

Hi @almeidasilvaf

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: BioNERO
Type: Package
Title: Biological Network Reconstruction Omnibus
Version: 0.99.0
Authors@R: c(person("Fabricio", "Almeida-Silva", 
          email="fabricio_almeidasilva@hotmail.com", 
          role=c("cre", "aut"), 
          comment = c(ORCID = "0000-0002-5314-2964")),
  person("Thiago", "Venancio", 
  role="aut", 
  comment = c(ORCID = "0000-0002-2215-8082")))
Description: BioNERO aims to integrate all aspects of biological network inference
        in a single package, including data preprocessing, exploratory analyses,
        network inference, and analyses for biological interpretations. BioNERO
        can be used to infer gene coexpression networks (GCNs) and gene 
        regulatory networks (GRNs) from gene expression data. Additionally,
        it can be used to explore topological properties of protein-protein 
        interaction (PPI) networks. GCN inference relies on the popular WGCNA 
        algorithm. GRN inference is based on the "wisdom of the crowds" principle,
        which consists in inferring GRNs with multiple algorithms (here, CLR,
        GENIE3 and ARACNE) and calculating the average rank for each interaction
        pair. As all steps of network analyses are included in this package,
        BioNERO makes users avoid having to learn the syntaxes of several 
        packages and how to communicate between them. Finally, users can also 
        identify consensus modules across independent expression sets and
        calculate intra and interspecies module preservation statistics 
        between different networks.
Depends: R (>= 4.1)
License: GPL-3
Encoding: UTF-8
LazyData: false
URL: https://github.com/almeidasilvaf/BioNERO
BugReports: https://github.com/almeidasilvaf/BioNERO/issues
biocViews: 
    GeneExpression, 
    Transcription, 
    GeneRegulation,
    GeneSetEnrichment,
    NetworkEnrichment,
    NetworkInference,
    SystemsBiology,
    Transcriptomics,
    ComparativeGenomics,
    GraphAndNetwork,
    Clustering,
    Sequencing,
    Microarray,
    Preprocessing,
    Normalization,
    Visualization,
    Network
Imports: 
    WGCNA,
    dynamicTreeCut,
    matrixStats,
    DESeq2,
    sva,
    RColorBrewer,
    ComplexHeatmap,
    ggplot2,
    reshape2,
    igraph,
    ggnetwork,
    intergraph,
    networkD3,
    ggnewscale,
    ggpubr,
    NetRep,
    stats,
    grDevices,
    graphics,
    utils,
    methods,
    BiocParallel,
    minet, 
    GENIE3,
    SummarizedExperiment
RoxygenNote: 7.1.1
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    BiocStyle
VignetteBuilder: knitr
Config/testthat/edition: 3
bioc-issue-bot commented 3 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 616bf02d7fca0e48a5b23ab21325995b6c27e05a

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 7f94f723191c7634a1a6159654c1e71d9c55eba2

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Fabricio, @almeidasilvaf Sorry for the delay in reviewing your package. We had to transfer the review due to unforeseen circumstances. I will get to your review as soon as possible. Thank you for your patience. Best, Marcel

LiNk-NY commented 3 years ago

Hi Fabricio, @almeidasilvaf

Please see the review below. Remember to respond the the review line by line.

Best regards, Marcel


BioNERO #1941

DESCRIPTION

NAMESPACE

Minor:

vignettes

R

Minor:

tests

almeidasilvaf commented 3 years ago

Hi, Marcel @LiNk-NY

Thank you for the insightful suggestions. I am already working on the changes, but I would like some advice from you on what to do for the functions with the reportPDF argument.

All my plotting functions return ggplot objects. However, plots in functions with reportPDF are generated by functions implemented in the WGCNA package, and they use the base graphics system. Thus, I cannot save them to objects. I came across this issue when I was developing the package and I thought saving the plots as PDF would be the best option. Now, I thought of two solutions:

What do you suggest?

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8d01c52af6f83bd2333a0bdd436e96dd6ed0f2a9

almeidasilvaf commented 3 years ago

Hi, Marcel @LiNk-NY

Please find below my responses to each of your points.

Hi Fabricio, @almeidasilvaf

Please see the review below. Remember to respond the the review line by line.

Best regards, Marcel

BioNERO #1941

  • For the most part, the package looks like it is in good shape. Though there is some functionality that you cover here already implemented in Bioconductor.

The goal of BioNERO is to facilitate network analysis pipelines by integrating all steps in a single package. Hence, although some functionalities are implemented in Bioconductor, our focus was to make different packages communicate in a common syntax. This way, users won't have to learn the syntaxes of many different packages for a single pipeline. Additionally, as some packages implement their own S3/S4 Methods and Classes, it is hard to intercommunicate between packages.

  • Make sure that you are following Bioconductor practices and check the output of BiocCheck.

I tried my best to follow Bioc practices. Although there are some cases of longer-than-expected functions or lines of code with more than 80 columns, I tried to systematically adjust the code to match Bioc style.

DESCRIPTION

I have included "Software" now. All biocViews terms were already in their order of appearance in the hierarchy. Should I explicitly include terms such as "AssayDomain" and "BiologicalQuestion", or only their child terms are enough?

  • A high number of Imports can lead to fragility of the package. Please try to reduce the amount of imports if possible.

Please, see the first line of my responses. I have tried my best to reduce the number of imports. However, as the package aims to integrate the whole pipeline in a single package, it is reasonable to have a higher number of imports as compared to other packages. The alternative to this would be to reimplement existing functions from other packages, but reinventing the wheel is something Bioconductor does not encourage.

NAMESPACE

Minor:

  • Use consistent case for abbreviated terms in your function names (sft vs SFT)

Modified accordingly.

vignettes

  • remove_na is not only removing NA values but replacing them with zero. Reconsider renaming this function and whether the user can carry out this operation themselves without the use of the function.

I've changed the function name as recommended. It is now named replace_na().

  • Keep the vignette character / column width to 80 for easier readability of the source file.

Modified accordingly.

  • Do not write files / plot figures to the user's directory. These should be displayed as output from the function. The user can then save them in the format of their choosing which may not necessarily be PDF. What if they want a PNG?

For all functions originally saving PDF plots to the user's working directory, I have followed your suggestion and split them into analysis functions and plotting functions. Then, users can plot the figures with separate functions and decide how (and if) they want to save the plots.

R

  • For concensus_modules, decide on the modular functionality of a function. If it is a plotting function, then produce the plot otherwise produce results but not both. Having arguments like reportPDF makes the function impure because the output depends on that argument. See point above for letting the user decide the format.

See my response above. The reportPDF arguments have all been deleted. Now, users must explicitly invoke new plotting functions to visualize and save such plots as they wish.

  • The same as above for consensus_trait_cor, and exp2gcn.

Modified the same way.

  • Avoid duplicating functionality (is_singleton and is_duplicated). Perhaps the functions can be simplified to: lengths(split(og.zma.osa[og.zma.osa$Gene %in% genes, "Gene"], genes)) == 1

I have removed is_duplicated, as it is simply the opposite of is_singleton, and simplified the function as suggested. Thank you.

  • Allow input of BPPARAM for module_enrichment, get_edge_list, and grn_filter so that the user can modify to their liking with the default being SerialParam(). The examples should be runnable.

I have included an option to input the parallel back-end, with the default being SerialParam(). All examples are runnable. Some examples contain \donttest instances for package build time issues, but they are runnable and tested with testthat.

  • To avoid repeating code, it is possible to combine grl_* into one function that allows the user to input the method which would then correspond to the function used on mi_mat with some arguments. The rest of the code looks the same across the functions.

Thank you for your suggestion. Indeed, they could be merged. I combined them into a single function named grn_infer().

  • Allow the user to save the preservation Rda file (if you must save the file at all)in the location of their choice. It is good to have a FALSE default in modPres_WGCNA.

I decided to remove this parameter, as we ourselves recommend calculating preservation statistics with the NetRep algorithm.

  • Avoid code repetition in net_stats.

Modified accordingly.

  • Why not use a functional for the method argument in detect_communities?

Modified accordingly.

  • The same as above for layout in igraph2ggnetwork.

After replacing the layout argument with a functional, I decided it would be best to remove this internal function.

  • Avoid redundant code is_SE does the same as is(x, "SummarizedExperiment")

Modified accordingly.

Minor:

  • Include verbose=FALSE arguments to functions with lots of messages (e.g., PC_correction)

Included.

  • Avoid, where possible, numeric indices and use more robust named /character indices.

Numeric indices are used when the order of variables matters more than their names, such as in situations when the input object can have different variable names, but their variable order must be exactly as specified in the function's documentation. When this was not the case, I tried to use named indices.

tests

  • Looks good.

Thank you very much for your comments!

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1519c4713fd056da3136a30df44343645b4bda63

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ddf9a31d56adfdfbb4a54bb958f14022ee8c93d8

almeidasilvaf commented 3 years ago

Hi @LiNk-NY

I've just pushed a new version of BioNERO with minor fixes in the function gene_significance: I removed the option to save a data frame to file (I did it for all other functions, but missed this one) and changed the default color palette. The package is now ready for your appreciation.

Thank you very much.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Fabricio, @almeidasilvaf

  • Importing {ggplotify} from CRAN, convert the base plots to ggplot objects and return the ggplot objects as elements of the resulting list. This is the easiest solution, but it would require me to import another package. I worked hard to reduce the number of imports, and that's why I didn't do it in the first place.

This sounds to me like a hack more than a solution and may not belong in package code but I could be wrong :)

  • Separating these functions in plotting functions and analysis functions, as you suggested. This way, users could save their base plots as they wish (e.g., PDF, svg, png). However, as the analyses can take a long time to run, I would have to save the objects required for plotting in a list element (say plotting_objects), so they could be used in the plotting functions without having to rerun the analysis.

There should be a separation of analysis and visualization functions. These could even go in different packages if the functionality is big enough. The ideal solution is to create a formal class that has an associated plotting method. Otherwise you can use a list (as you mentioned) for the input of the plotting function.

Best, Marcel

almeidasilvaf commented 3 years ago

Hi Marcel @LiNk-NY

I've solved this issue the second way. See my response to your each of your comments above.

Thank you.

LiNk-NY commented 3 years ago

Hi Fabricio, @almeidasilvaf

  • For the most part, the package looks like it is in good shape. Though there is some functionality that you cover here already implemented in Bioconductor.

The goal of BioNERO is to facilitate network analysis pipelines by integrating all steps in a single package. Hence, although some functionalities are implemented in Bioconductor, our focus was to make different packages communicate in a common syntax. This way, users won't have to learn the syntaxes of many different packages for a single pipeline. Additionally, as some packages implement their own S3/S4 Methods and Classes, it is hard to intercommunicate between packages.

In a previous comment you said that you wanted to avoid additional dependencies but having 'all steps in a single package' goes against that. The point of working in the Bioconductor ecosystem is to have interoperability where no one package provides all the functionality. If you wanted to guide the user step-by-step through a pipeline, perhaps a workflow package would be more apt.

I have included "Software" now. All biocViews terms were already in their order of appearance in the hierarchy. Should I explicitly include terms such as "AssayDomain" and "BiologicalQuestion", or only their child terms are enough?

This doesn't look quite right. AFAIU, the package should land in only one bucket. For example, Software > AssayDomain > GeneExpression.

  • A high number of Imports can lead to fragility of the package. Please try to reduce the amount of imports if possible.

Please, see the first line of my responses. I have tried my best to reduce the number of imports. However, as the package aims to integrate the whole pipeline in a single package, it is reasonable to have a higher number of imports as compared to other packages. The alternative to this would be to reimplement existing functions from other packages, but reinventing the wheel is something Bioconductor does not encourage.

This is not what we encourage. See my comment about workflows above.

  • Do not write files / plot figures to the user's directory. These should be displayed as output from the function. The user can then save them in the format of their choosing which may not necessarily be PDF. What if they want a PNG?

For all functions originally saving PDF plots to the user's working directory, I have followed your suggestion and split them into analysis functions and plotting functions. Then, users can plot the figures with separate functions and decide how (and if) they want to save the plots.

Great!

  • Avoid, where possible, numeric indices and use more robust named /character indices.

Numeric indices are used when the order of variables matters more than their names, such as in situations when the input object can have different variable names, but their variable order must be exactly as specified in the function's documentation. When this was not the case, I tried to use named indices.

The robust way of doing this is to have the user specify the column names (with sensible defaults in the function definition) and order based on those values rather than indices.

Best, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f9e82a43cd483fb10589a34d7e8ddd076362ecbc

almeidasilvaf commented 3 years ago

Hi Marcel @LiNk-NY

The goal of BioNERO is to facilitate network analysis pipelines by integrating all steps in a single package. Hence, although some functionalities are implemented in Bioconductor, our focus was to make different packages communicate in a common syntax. This way, users won't have to learn the syntaxes of many different packages for a single pipeline. Additionally, as some packages implement their own S3/S4 Methods and Classes, it is hard to intercommunicate between packages.

In a previous comment you said that you wanted to avoid additional dependencies but having 'all steps in a single package' goes against that. The point of working in the Bioconductor ecosystem is to have interoperability where no one package provides all the functionality. If you wanted to guide the user step-by-step through a pipeline, perhaps a workflow package would be more apt.

This package does not simply offer a step-by-step through a pipeline, but it contains new solutions for existing problems in each of the steps. A workflow package wouldn't be appropriate, as several functionalities implemented here are not implemented anywhere. As the packages I have imported are very stable, I don't think the imports raise much concern.

I have included "Software" now. All biocViews terms were already in their order of appearance in the hierarchy. Should I explicitly include terms such as "AssayDomain" and "BiologicalQuestion", or only their child terms are enough?

This doesn't look quite right. AFAIU, the package should land in only one bucket. For example, Software > AssayDomain > GeneExpression.

Please, check other Bioconductor packages or let me know if this is a new Bioconductor practice. If it is, it should be documented somewhere. I have checked 5 randomly selected Bioc packages and all of them have biocViews terms from different categories. As an example, see the DESCRIPTION file for DESeq2: https://github.com/mikelove/DESeq2/blob/master/DESCRIPTION

Anyway, I removed some biocViews terms to reduce the number of terms.

  • A high number of Imports can lead to fragility of the package. Please try to reduce the amount of imports if possible.

Please, see the first line of my responses. I have tried my best to reduce the number of imports. However, as the package aims to integrate the whole pipeline in a single package, it is reasonable to have a higher number of imports as compared to other packages. The alternative to this would be to reimplement existing functions from other packages, but reinventing the wheel is something Bioconductor does not encourage.

This is not what we encourage. See my comment about workflows above.

See my response above. Besides, I ran checks in my local machine with --as-cran. CRAN checks throw NOTEs if packages import more packages than a given number (I don't remember which number it is). This was not the case for BioNERO, which means that it imports a reasonable number of packages. Further, as I mentioned above, I made sure the packages I was importing are stable and trustworthy. Thus, I don't it will be a problem.

  • Do not write files / plot figures to the user's directory. These should be displayed as output from the function. The user can then save them in the format of their choosing which may not necessarily be PDF. What if they want a PNG?

For all functions originally saving PDF plots to the user's working directory, I have followed your suggestion and split them into analysis functions and plotting functions. Then, users can plot the figures with separate functions and decide how (and if) they want to save the plots.

Great!

  • Avoid, where possible, numeric indices and use more robust named /character indices.

Numeric indices are used when the order of variables matters more than their names, such as in situations when the input object can have different variable names, but their variable order must be exactly as specified in the function's documentation. When this was not the case, I tried to use named indices.

The robust way of doing this is to have the user specify the column names (with sensible defaults in the function definition) and order based on those values rather than indices.

Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.

Thank you for your kind comments. Best, Fabricio

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Fabricio, @almeidasilvaf

Please, check other Bioconductor packages or let me know if this is a new Bioconductor practice. If it is, it should be documented somewhere. I have checked 5 randomly selected Bioc packages and all of them have biocViews terms from different categories. As an example, see the DESCRIPTION file for DESeq2: https://github.com/mikelove/DESeq2/blob/master/DESCRIPTION

I've confirmed with Lori that you can use multiple leaf nodes as long as they're from the same Software trunk in biocViews. We will update the package guidelines to make this more clear https://www.bioconductor.org/developers/package-guidelines/#description

See my response above. Besides, I ran checks in my local machine with --as-cran. CRAN checks throw NOTEs if packages import more packages than a given number (I don't remember which number it is). This was not the case for BioNERO, which means that it imports a reasonable number of packages. Further, as I mentioned above, I made sure the packages I was importing are stable and trustworthy. Thus, I don't it will be a problem.

It is always better to lean towards the minimal end. The risk would still non-zero as opposed to zero when you avoid the import. A high number of imports would increase that risk. We have gone through several deprecated packages where maintainers had this same idea and weren't keen to updating their packages because of large changes in other packages or heavy dependencies on other deprecated packages.

Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.

Perhaps some education is needed here. Using character indices is always more robust than numeric indices because the data column order can change. You never know what the user is going to provide as input. A function with robust code will work no matter the order of the columns. Your code currently will not if the order of the columns is changed or worse it will give the wrong answer. There is no special case here this is a general good practice. If a particular technology uses the term Tissue consistently in their output table, then you could use Tissue as your column default tissueCol = "Tissue" which would be independent of the index in the table when you do df[[tissueCol]]. Of course, this is not always convenient for the developer but robust code is not meant to be convenient though it can still be convenient for the user.

Best, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f01904407ae8d8e091270800dce6170fd8735a05

almeidasilvaf commented 3 years ago

Hi, Marcel @LiNk-NY

Please, check other Bioconductor packages or let me know if this is a new Bioconductor practice. If it is, it should be documented somewhere. I have checked 5 randomly selected Bioc packages and all of them have biocViews terms from different categories. As an example, see the DESCRIPTION file for DESeq2: https://github.com/mikelove/DESeq2/blob/master/DESCRIPTION

I've confirmed with Lori that you can use multiple leaf nodes as long as they're from the same Software trunk in biocViews. We will update the package guidelines to make this more clear https://www.bioconductor.org/developers/package-guidelines/#description

I am not sure if I got it right, but I have now only kept one term per leaf node. See if it is ok now. Indeed, it would be great to include this in the package guidelines.

See my response above. Besides, I ran checks in my local machine with --as-cran. CRAN checks throw NOTEs if packages import more packages than a given number (I don't remember which number it is). This was not the case for BioNERO, which means that it imports a reasonable number of packages. Further, as I mentioned above, I made sure the packages I was importing are stable and trustworthy. Thus, I don't it will be a problem.

It is always better to lean towards the minimal end. The risk would still non-zero as opposed to zero when you avoid the import. A high number of imports would increase that risk. We have gone through several deprecated packages where maintainers had this same idea and weren't keen to updating their packages because of large changes in other packages or heavy dependencies on other deprecated packages.

I totally understand your concerns, but I think we are all at some risk. Updating packages is one of the duties of developers, and I don't see why a developer would refuse to update his/her package if something breaks because of dependencies. It seems like abandoning a child. Besides, when we submit a package, we all commit to the long-term maintenance of our packages. IMO, not updating a package after saying you would is an ethical flaw. As I have committed to the maintenance of this package in the very beginning of this issue, I can guarantee that I will fix issues if there are any in the future.

Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.

Perhaps some education is needed here. Using character indices is always more robust than numeric indices because the data column order can change. You never know what the user is going to provide as input. A function with robust code will work no matter the order of the columns. Your code currently will not if the order of the columns is changed or worse it will give the wrong answer. There is no special case here this is a general good practice. If a particular technology uses the term Tissue consistently in their output table, then you could use Tissue as your column default tissueCol = "Tissue" which would be independent of the index in the table when you do df[[tissueCol]]. Of course, this is not always convenient for the developer but robust code is not meant to be convenient though it can still be convenient for the user.

I understand how named indices are more robust because they are not dependent on order, but if you see the functions' documentations, you will notice that they all make very clear to the user how the input should look like. The user should read the functions' docs. Please, see an example from the function plot_ppi():

' @param edgelist_int Data frame containing the edge list for the PPI network.

' First column is the protein 1 and second column is the protein 2.

' All other columns are interpreted as edge attributes.

As we always specify how the data should look like, users would not change the column orders inadvertently. Nevertheless, I have now transformed some numeric indices to named indices where I think they would be more suitable. Finally, as this indexing "rule" seems to be one of your requirements, would you consider adding it to the package guidelines, so that developers know beforehand how they are expected to write their code? https://www.bioconductor.org/developers/package-guidelines/#rcode This would be very helpful, as some of us have practices that do not conform to Bioc best practices (e.g., vapply() instead of sapply(), seq_len() instead of 1:n...).

Best, Fabricio

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/BioNERO to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Fabricio, @almeidasilvaf

It is always better to lean towards the minimal end. The risk would still non-zero as opposed to zero when you avoid the import. A high number of imports would increase that risk. We have gone through several deprecated packages where maintainers had this same idea and weren't keen to updating their packages because of large changes in other packages or heavy dependencies on other deprecated packages.

I totally understand your concerns, but I think we are all at some risk. Updating packages is one of the duties of developers, and I don't see why a developer would refuse to update his/her package if something breaks because of dependencies. It seems like abandoning a child. Besides, when we submit a package, we all commit to the long-term maintenance of our packages. IMO, not updating a package after saying you would is an ethical flaw. As I have committed to the maintenance of this package in the very beginning of this issue, I can guarantee that I will fix issues if there are any in the future.

Of course, other things can get in the way of maintaining a package. Therefore, we are left with packages that become deprecated http://bioconductor.org/about/removed-packages/. Thank you for committing to update your package.

Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.

Perhaps some education is needed here. Using character indices is always more robust than numeric indices because the data column order can change. You never know what the user is going to provide as input. A function with robust code will work no matter the order of the columns. Your code currently will not if the order of the columns is changed or worse it will give the wrong answer. There is no special case here this is a general good practice. If a particular technology uses the term Tissue consistently in their output table, then you could use Tissue as your column default tissueCol = "Tissue" which would be independent of the index in the table when you do df[[tissueCol]]. Of course, this is not always convenient for the developer but robust code is not meant to be convenient though it can still be convenient for the user.

I understand how named indices are more robust because they are not dependent on order, but if you see the functions' documentations, you will notice that they all make very clear to the user how the input should look like. The user should read the functions' docs. Please, see an example from the function plot_ppi():

' @param edgelist_int Data frame containing the edge list for the PPI network.

' First column is the protein 1 and second column is the protein 2.

' All other columns are interpreted as edge attributes.

As we always specify how the data should look like, users would not change the column orders inadvertently. Nevertheless, I have now transformed some numeric indices to named indices where I think they would be more suitable. Finally, as this indexing "rule" seems to be one of your requirements, would you consider adding it to the package guidelines, so that developers know beforehand how they are expected to write their code? https://www.bioconductor.org/developers/package-guidelines/#rcode This would be very helpful, as some of us have practices that do not conform to Bioc best practices (e.g., vapply() instead of sapply(), seq_len() instead of 1:n...).

Thanks for making those changes. Relying on users to have read the documentation is more of a challenge than implementing a robust fail safe option.

Your package has been accepted. Thank you for your submission.

Best regards, Marcel

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

vjcitn commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/almeidasilvaf.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("BioNERO"). The package 'landing page' will be created at

https://bioconductor.org/packages/BioNERO

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.