Closed almeidasilvaf closed 3 years ago
Hi @almeidasilvaf
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: BioNERO
Type: Package
Title: Biological Network Reconstruction Omnibus
Version: 0.99.0
Authors@R: c(person("Fabricio", "Almeida-Silva",
email="fabricio_almeidasilva@hotmail.com",
role=c("cre", "aut"),
comment = c(ORCID = "0000-0002-5314-2964")),
person("Thiago", "Venancio",
role="aut",
comment = c(ORCID = "0000-0002-2215-8082")))
Description: BioNERO aims to integrate all aspects of biological network inference
in a single package, including data preprocessing, exploratory analyses,
network inference, and analyses for biological interpretations. BioNERO
can be used to infer gene coexpression networks (GCNs) and gene
regulatory networks (GRNs) from gene expression data. Additionally,
it can be used to explore topological properties of protein-protein
interaction (PPI) networks. GCN inference relies on the popular WGCNA
algorithm. GRN inference is based on the "wisdom of the crowds" principle,
which consists in inferring GRNs with multiple algorithms (here, CLR,
GENIE3 and ARACNE) and calculating the average rank for each interaction
pair. As all steps of network analyses are included in this package,
BioNERO makes users avoid having to learn the syntaxes of several
packages and how to communicate between them. Finally, users can also
identify consensus modules across independent expression sets and
calculate intra and interspecies module preservation statistics
between different networks.
Depends: R (>= 4.1)
License: GPL-3
Encoding: UTF-8
LazyData: false
URL: https://github.com/almeidasilvaf/BioNERO
BugReports: https://github.com/almeidasilvaf/BioNERO/issues
biocViews:
GeneExpression,
Transcription,
GeneRegulation,
GeneSetEnrichment,
NetworkEnrichment,
NetworkInference,
SystemsBiology,
Transcriptomics,
ComparativeGenomics,
GraphAndNetwork,
Clustering,
Sequencing,
Microarray,
Preprocessing,
Normalization,
Visualization,
Network
Imports:
WGCNA,
dynamicTreeCut,
matrixStats,
DESeq2,
sva,
RColorBrewer,
ComplexHeatmap,
ggplot2,
reshape2,
igraph,
ggnetwork,
intergraph,
networkD3,
ggnewscale,
ggpubr,
NetRep,
stats,
grDevices,
graphics,
utils,
methods,
BiocParallel,
minet,
GENIE3,
SummarizedExperiment
RoxygenNote: 7.1.1
Suggests:
knitr,
rmarkdown,
testthat (>= 3.0.0),
BiocStyle
VignetteBuilder: knitr
Config/testthat/edition: 3
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 616bf02d7fca0e48a5b23ab21325995b6c27e05a
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 7f94f723191c7634a1a6159654c1e71d9c55eba2
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Fabricio, @almeidasilvaf Sorry for the delay in reviewing your package. We had to transfer the review due to unforeseen circumstances. I will get to your review as soon as possible. Thank you for your patience. Best, Marcel
Hi Fabricio, @almeidasilvaf
Please see the review below. Remember to respond the the review line by line.
Best regards, Marcel
BiocCheck
.biocViews
field based on the hierarchy on this
page: http://bioconductor.org/packages/release/BiocViews.html#___SoftwareImports
can lead to fragility of the package. Please try
to reduce the amount of imports if possible.Minor:
remove_na
is not only removing NA values but replacing them with zero.
Reconsider renaming this function and whether the user can carry out this
operation themselves without the use of the function.concensus_modules
, decide on the modular functionality of a function.
If it is a plotting function, then produce the plot otherwise produce results
but not both. Having arguments like reportPDF
makes the function impure
because the output depends on that argument. See point above for letting
the user decide the format.consensus_trait_cor
, and exp2gcn
.is_singleton
and is_duplicated
).
Perhaps the functions can be simplified to:
lengths(split(og.zma.osa[og.zma.osa$Gene %in% genes, "Gene"], genes)) == 1
BPPARAM
for module_enrichment
, get_edge_list
, and
grn_filter
so that the user can modify to their liking with the default
being SerialParam()
. The examples should be runnable.grl_*
into one function
that allows the user to input the method which would then correspond
to the function used on mi_mat
with some arguments. The rest of the code
looks the same across the functions.preservation
Rda file (if you must save the
file at all)in the location of their choice. It is good to have a FALSE
default in modPres_WGCNA
.net_stats
.method
argument in detect_communities
?igraph2ggnetwork
.is_SE
does the same as is(x, "SummarizedExperiment")
Minor:
verbose=FALSE
arguments to functions with lots of messages (e.g.,
PC_correction
)Hi, Marcel @LiNk-NY
Thank you for the insightful suggestions. I am already working on the changes, but I would like some advice from you on what to do for the functions with the reportPDF
argument.
All my plotting functions return ggplot objects. However, plots in functions with reportPDF
are generated by functions implemented in the WGCNA package, and they use the base graphics system. Thus, I cannot save them to objects. I came across this issue when I was developing the package and I thought saving the plots as PDF would be the best option.
Now, I thought of two solutions:
plotting_objects
), so they could be used in the plotting functions without having to rerun the analysis. What do you suggest?
Received a valid push on git.bioconductor.org; starting a build for commit id: 8d01c52af6f83bd2333a0bdd436e96dd6ed0f2a9
Hi, Marcel @LiNk-NY
Please find below my responses to each of your points.
Hi Fabricio, @almeidasilvaf
Please see the review below. Remember to respond the the review line by line.
Best regards, Marcel
BioNERO #1941
- For the most part, the package looks like it is in good shape. Though there is some functionality that you cover here already implemented in Bioconductor.
The goal of BioNERO is to facilitate network analysis pipelines by integrating all steps in a single package. Hence, although some functionalities are implemented in Bioconductor, our focus was to make different packages communicate in a common syntax. This way, users won't have to learn the syntaxes of many different packages for a single pipeline. Additionally, as some packages implement their own S3/S4 Methods and Classes, it is hard to intercommunicate between packages.
- Make sure that you are following Bioconductor practices and check the output of
BiocCheck
.
I tried my best to follow Bioc practices. Although there are some cases of longer-than-expected functions or lines of code with more than 80 columns, I tried to systematically adjust the code to match Bioc style.
DESCRIPTION
- Provide an appropriate
biocViews
field based on the hierarchy on this page: http://bioconductor.org/packages/release/BiocViews.html#___Software
I have included "Software" now. All biocViews terms were already in their order of appearance in the hierarchy. Should I explicitly include terms such as "AssayDomain" and "BiologicalQuestion", or only their child terms are enough?
- A high number of
Imports
can lead to fragility of the package. Please try to reduce the amount of imports if possible.
Please, see the first line of my responses. I have tried my best to reduce the number of imports. However, as the package aims to integrate the whole pipeline in a single package, it is reasonable to have a higher number of imports as compared to other packages. The alternative to this would be to reimplement existing functions from other packages, but reinventing the wheel is something Bioconductor does not encourage.
NAMESPACE
Minor:
- Use consistent case for abbreviated terms in your function names (sft vs SFT)
Modified accordingly.
vignettes
remove_na
is not only removing NA values but replacing them with zero. Reconsider renaming this function and whether the user can carry out this operation themselves without the use of the function.
I've changed the function name as recommended. It is now named replace_na()
.
- Keep the vignette character / column width to 80 for easier readability of the source file.
Modified accordingly.
- Do not write files / plot figures to the user's directory. These should be displayed as output from the function. The user can then save them in the format of their choosing which may not necessarily be PDF. What if they want a PNG?
For all functions originally saving PDF plots to the user's working directory, I have followed your suggestion and split them into analysis functions and plotting functions. Then, users can plot the figures with separate functions and decide how (and if) they want to save the plots.
R
- For
concensus_modules
, decide on the modular functionality of a function. If it is a plotting function, then produce the plot otherwise produce results but not both. Having arguments likereportPDF
makes the function impure because the output depends on that argument. See point above for letting the user decide the format.
See my response above. The reportPDF
arguments have all been deleted. Now, users must explicitly invoke new plotting functions to visualize and save such plots as they wish.
- The same as above for
consensus_trait_cor
, andexp2gcn
.
Modified the same way.
- Avoid duplicating functionality (
is_singleton
andis_duplicated
). Perhaps the functions can be simplified to:lengths(split(og.zma.osa[og.zma.osa$Gene %in% genes, "Gene"], genes)) == 1
I have removed is_duplicated
, as it is simply the opposite of is_singleton
, and simplified the function as suggested. Thank you.
- Allow input of
BPPARAM
formodule_enrichment
,get_edge_list
, andgrn_filter
so that the user can modify to their liking with the default beingSerialParam()
. The examples should be runnable.
I have included an option to input the parallel back-end, with the default being SerialParam()
. All examples are runnable. Some examples contain \donttest instances for package build time issues, but they are runnable and tested with testthat.
- To avoid repeating code, it is possible to combine
grl_*
into one function that allows the user to input the method which would then correspond to the function used onmi_mat
with some arguments. The rest of the code looks the same across the functions.
Thank you for your suggestion. Indeed, they could be merged. I combined them into a single function named grn_infer()
.
- Allow the user to save the
preservation
Rda file (if you must save the file at all)in the location of their choice. It is good to have aFALSE
default inmodPres_WGCNA
.
I decided to remove this parameter, as we ourselves recommend calculating preservation statistics with the NetRep algorithm.
- Avoid code repetition in
net_stats
.
Modified accordingly.
- Why not use a functional for the
method
argument indetect_communities
?
Modified accordingly.
- The same as above for layout in
igraph2ggnetwork
.
After replacing the layout argument with a functional, I decided it would be best to remove this internal function.
- Avoid redundant code
is_SE
does the same asis(x, "SummarizedExperiment")
Modified accordingly.
Minor:
- Include
verbose=FALSE
arguments to functions with lots of messages (e.g.,PC_correction
)
Included.
- Avoid, where possible, numeric indices and use more robust named /character indices.
Numeric indices are used when the order of variables matters more than their names, such as in situations when the input object can have different variable names, but their variable order must be exactly as specified in the function's documentation. When this was not the case, I tried to use named indices.
tests
- Looks good.
Thank you very much for your comments!
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 1519c4713fd056da3136a30df44343645b4bda63
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: ddf9a31d56adfdfbb4a54bb958f14022ee8c93d8
Hi @LiNk-NY
I've just pushed a new version of BioNERO with minor fixes in the function gene_significance
: I removed the option to save a data frame to file (I did it for all other functions, but missed this one) and changed the default color palette. The package is now ready for your appreciation.
Thank you very much.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Fabricio, @almeidasilvaf
- Importing {ggplotify} from CRAN, convert the base plots to ggplot objects and return the ggplot objects as elements of the resulting list. This is the easiest solution, but it would require me to import another package. I worked hard to reduce the number of imports, and that's why I didn't do it in the first place.
This sounds to me like a hack more than a solution and may not belong in package code but I could be wrong :)
- Separating these functions in plotting functions and analysis functions, as you suggested. This way, users could save their base plots as they wish (e.g., PDF, svg, png). However, as the analyses can take a long time to run, I would have to save the objects required for plotting in a list element (say
plotting_objects
), so they could be used in the plotting functions without having to rerun the analysis.
There should be a separation of analysis and visualization functions. These could even go in different packages if the functionality is big enough. The ideal solution is to create a formal class that has an associated plotting method. Otherwise you can use a list (as you mentioned) for the input of the plotting function.
Best, Marcel
Hi Marcel @LiNk-NY
I've solved this issue the second way. See my response to your each of your comments above.
Thank you.
Hi Fabricio, @almeidasilvaf
- For the most part, the package looks like it is in good shape. Though there is some functionality that you cover here already implemented in Bioconductor.
The goal of BioNERO is to facilitate network analysis pipelines by integrating all steps in a single package. Hence, although some functionalities are implemented in Bioconductor, our focus was to make different packages communicate in a common syntax. This way, users won't have to learn the syntaxes of many different packages for a single pipeline. Additionally, as some packages implement their own S3/S4 Methods and Classes, it is hard to intercommunicate between packages.
In a previous comment you said that you wanted to avoid additional dependencies but having 'all steps in a single package' goes against that. The point of working in the Bioconductor ecosystem is to have interoperability where no one package provides all the functionality. If you wanted to guide the user step-by-step through a pipeline, perhaps a workflow package would be more apt.
- Provide an appropriate
biocViews
field based on the hierarchy on this page: http://bioconductor.org/packages/release/BiocViews.html#___SoftwareI have included "Software" now. All biocViews terms were already in their order of appearance in the hierarchy. Should I explicitly include terms such as "AssayDomain" and "BiologicalQuestion", or only their child terms are enough?
This doesn't look quite right. AFAIU, the package should land in only one bucket. For example, Software > AssayDomain > GeneExpression.
- A high number of
Imports
can lead to fragility of the package. Please try to reduce the amount of imports if possible.Please, see the first line of my responses. I have tried my best to reduce the number of imports. However, as the package aims to integrate the whole pipeline in a single package, it is reasonable to have a higher number of imports as compared to other packages. The alternative to this would be to reimplement existing functions from other packages, but reinventing the wheel is something Bioconductor does not encourage.
This is not what we encourage. See my comment about workflows above.
- Do not write files / plot figures to the user's directory. These should be displayed as output from the function. The user can then save them in the format of their choosing which may not necessarily be PDF. What if they want a PNG?
For all functions originally saving PDF plots to the user's working directory, I have followed your suggestion and split them into analysis functions and plotting functions. Then, users can plot the figures with separate functions and decide how (and if) they want to save the plots.
Great!
- Avoid, where possible, numeric indices and use more robust named /character indices.
Numeric indices are used when the order of variables matters more than their names, such as in situations when the input object can have different variable names, but their variable order must be exactly as specified in the function's documentation. When this was not the case, I tried to use named indices.
The robust way of doing this is to have the user specify the column names (with sensible defaults in the function definition) and order based on those values rather than indices.
Best, Marcel
Received a valid push on git.bioconductor.org; starting a build for commit id: f9e82a43cd483fb10589a34d7e8ddd076362ecbc
Hi Marcel @LiNk-NY
The goal of BioNERO is to facilitate network analysis pipelines by integrating all steps in a single package. Hence, although some functionalities are implemented in Bioconductor, our focus was to make different packages communicate in a common syntax. This way, users won't have to learn the syntaxes of many different packages for a single pipeline. Additionally, as some packages implement their own S3/S4 Methods and Classes, it is hard to intercommunicate between packages.
In a previous comment you said that you wanted to avoid additional dependencies but having 'all steps in a single package' goes against that. The point of working in the Bioconductor ecosystem is to have interoperability where no one package provides all the functionality. If you wanted to guide the user step-by-step through a pipeline, perhaps a workflow package would be more apt.
This package does not simply offer a step-by-step through a pipeline, but it contains new solutions for existing problems in each of the steps. A workflow package wouldn't be appropriate, as several functionalities implemented here are not implemented anywhere. As the packages I have imported are very stable, I don't think the imports raise much concern.
- Provide an appropriate
biocViews
field based on the hierarchy on this page: http://bioconductor.org/packages/release/BiocViews.html#___SoftwareI have included "Software" now. All biocViews terms were already in their order of appearance in the hierarchy. Should I explicitly include terms such as "AssayDomain" and "BiologicalQuestion", or only their child terms are enough?
This doesn't look quite right. AFAIU, the package should land in only one bucket. For example, Software > AssayDomain > GeneExpression.
Please, check other Bioconductor packages or let me know if this is a new Bioconductor practice. If it is, it should be documented somewhere. I have checked 5 randomly selected Bioc packages and all of them have biocViews terms from different categories. As an example, see the DESCRIPTION file for DESeq2: https://github.com/mikelove/DESeq2/blob/master/DESCRIPTION
Anyway, I removed some biocViews terms to reduce the number of terms.
- A high number of
Imports
can lead to fragility of the package. Please try to reduce the amount of imports if possible.Please, see the first line of my responses. I have tried my best to reduce the number of imports. However, as the package aims to integrate the whole pipeline in a single package, it is reasonable to have a higher number of imports as compared to other packages. The alternative to this would be to reimplement existing functions from other packages, but reinventing the wheel is something Bioconductor does not encourage.
This is not what we encourage. See my comment about workflows above.
See my response above. Besides, I ran checks in my local machine with --as-cran. CRAN checks throw NOTEs if packages import more packages than a given number (I don't remember which number it is). This was not the case for BioNERO, which means that it imports a reasonable number of packages. Further, as I mentioned above, I made sure the packages I was importing are stable and trustworthy. Thus, I don't it will be a problem.
- Do not write files / plot figures to the user's directory. These should be displayed as output from the function. The user can then save them in the format of their choosing which may not necessarily be PDF. What if they want a PNG?
For all functions originally saving PDF plots to the user's working directory, I have followed your suggestion and split them into analysis functions and plotting functions. Then, users can plot the figures with separate functions and decide how (and if) they want to save the plots.
Great!
- Avoid, where possible, numeric indices and use more robust named /character indices.
Numeric indices are used when the order of variables matters more than their names, such as in situations when the input object can have different variable names, but their variable order must be exactly as specified in the function's documentation. When this was not the case, I tried to use named indices.
The robust way of doing this is to have the user specify the column names (with sensible defaults in the function definition) and order based on those values rather than indices.
Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.
Thank you for your kind comments. Best, Fabricio
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Fabricio, @almeidasilvaf
Please, check other Bioconductor packages or let me know if this is a new Bioconductor practice. If it is, it should be documented somewhere. I have checked 5 randomly selected Bioc packages and all of them have biocViews terms from different categories. As an example, see the DESCRIPTION file for DESeq2: https://github.com/mikelove/DESeq2/blob/master/DESCRIPTION
I've confirmed with Lori that you can use multiple leaf nodes as long as they're from the same Software
trunk in biocViews
.
We will update the package guidelines to make this more clear https://www.bioconductor.org/developers/package-guidelines/#description
See my response above. Besides, I ran checks in my local machine with --as-cran. CRAN checks throw NOTEs if packages import more packages than a given number (I don't remember which number it is). This was not the case for BioNERO, which means that it imports a reasonable number of packages. Further, as I mentioned above, I made sure the packages I was importing are stable and trustworthy. Thus, I don't it will be a problem.
It is always better to lean towards the minimal end. The risk would still non-zero as opposed to zero when you avoid the import. A high number of imports would increase that risk. We have gone through several deprecated packages where maintainers had this same idea and weren't keen to updating their packages because of large changes in other packages or heavy dependencies on other deprecated packages.
Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.
Perhaps some education is needed here. Using character indices is always more robust than numeric indices because the data column order can change. You never know what the user is going to provide as input. A function with robust code will work no matter the order of the columns. Your code currently will not if the order of the columns is changed or worse it will give the wrong answer. There is no special case here this is a general good practice. If a particular technology uses the term Tissue
consistently in their output table, then you could use Tissue
as your column default tissueCol = "Tissue"
which would be independent of the index in the table when you do df[[tissueCol]]
. Of course, this is not always convenient for the developer but robust code is not meant to be convenient though it can still be convenient for the user.
Best, Marcel
Received a valid push on git.bioconductor.org; starting a build for commit id: f01904407ae8d8e091270800dce6170fd8735a05
Hi, Marcel @LiNk-NY
Please, check other Bioconductor packages or let me know if this is a new Bioconductor practice. If it is, it should be documented somewhere. I have checked 5 randomly selected Bioc packages and all of them have biocViews terms from different categories. As an example, see the DESCRIPTION file for DESeq2: https://github.com/mikelove/DESeq2/blob/master/DESCRIPTION
I've confirmed with Lori that you can use multiple leaf nodes as long as they're from the same
Software
trunk inbiocViews
. We will update the package guidelines to make this more clear https://www.bioconductor.org/developers/package-guidelines/#description
I am not sure if I got it right, but I have now only kept one term per leaf node. See if it is ok now. Indeed, it would be great to include this in the package guidelines.
See my response above. Besides, I ran checks in my local machine with --as-cran. CRAN checks throw NOTEs if packages import more packages than a given number (I don't remember which number it is). This was not the case for BioNERO, which means that it imports a reasonable number of packages. Further, as I mentioned above, I made sure the packages I was importing are stable and trustworthy. Thus, I don't it will be a problem.
It is always better to lean towards the minimal end. The risk would still non-zero as opposed to zero when you avoid the import. A high number of imports would increase that risk. We have gone through several deprecated packages where maintainers had this same idea and weren't keen to updating their packages because of large changes in other packages or heavy dependencies on other deprecated packages.
I totally understand your concerns, but I think we are all at some risk. Updating packages is one of the duties of developers, and I don't see why a developer would refuse to update his/her package if something breaks because of dependencies. It seems like abandoning a child. Besides, when we submit a package, we all commit to the long-term maintenance of our packages. IMO, not updating a package after saying you would is an ethical flaw. As I have committed to the maintenance of this package in the very beginning of this issue, I can guarantee that I will fix issues if there are any in the future.
Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.
Perhaps some education is needed here. Using character indices is always more robust than numeric indices because the data column order can change. You never know what the user is going to provide as input. A function with robust code will work no matter the order of the columns. Your code currently will not if the order of the columns is changed or worse it will give the wrong answer. There is no special case here this is a general good practice. If a particular technology uses the term
Tissue
consistently in their output table, then you could useTissue
as your column defaulttissueCol = "Tissue"
which would be independent of the index in the table when you dodf[[tissueCol]]
. Of course, this is not always convenient for the developer but robust code is not meant to be convenient though it can still be convenient for the user.
I understand how named indices are more robust because they are not dependent on order, but if you see the functions' documentations, you will notice that they all make very clear to the user how the input should look like. The user should read the functions' docs. Please, see an example from the function plot_ppi()
:
As we always specify how the data should look like, users would not change the column orders inadvertently. Nevertheless, I have now transformed some numeric indices to named indices where I think they would be more suitable. Finally, as this indexing "rule" seems to be one of your requirements, would you consider adding it to the package guidelines, so that developers know beforehand how they are expected to write their code? https://www.bioconductor.org/developers/package-guidelines/#rcode This would be very helpful, as some of us have practices that do not conform to Bioc best practices (e.g., vapply() instead of sapply(), seq_len() instead of 1:n...).
Best, Fabricio
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/BioNERO
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Fabricio, @almeidasilvaf
It is always better to lean towards the minimal end. The risk would still non-zero as opposed to zero when you avoid the import. A high number of imports would increase that risk. We have gone through several deprecated packages where maintainers had this same idea and weren't keen to updating their packages because of large changes in other packages or heavy dependencies on other deprecated packages.
I totally understand your concerns, but I think we are all at some risk. Updating packages is one of the duties of developers, and I don't see why a developer would refuse to update his/her package if something breaks because of dependencies. It seems like abandoning a child. Besides, when we submit a package, we all commit to the long-term maintenance of our packages. IMO, not updating a package after saying you would is an ethical flaw. As I have committed to the maintenance of this package in the very beginning of this issue, I can guarantee that I will fix issues if there are any in the future.
Of course, other things can get in the way of maintaining a package. Therefore, we are left with packages that become deprecated http://bioconductor.org/about/removed-packages/. Thank you for committing to update your package.
Honestly, I don't see how this can be better. Speaking for myself, I don't see why I would have to specify column names just for indexing when using a function from a package. It is way easier and more practical to let users name their columns as they want and just pick the column indices. For instance, if a user names his column containing sample metadata as "Tissue" and other user names it "Condition", it doesn't make any difference. If you found a special case where named indices would be better than numeric ones, please let me know. I have scanned my code looking for such examples, but I didn't find any.
Perhaps some education is needed here. Using character indices is always more robust than numeric indices because the data column order can change. You never know what the user is going to provide as input. A function with robust code will work no matter the order of the columns. Your code currently will not if the order of the columns is changed or worse it will give the wrong answer. There is no special case here this is a general good practice. If a particular technology uses the term
Tissue
consistently in their output table, then you could useTissue
as your column defaulttissueCol = "Tissue"
which would be independent of the index in the table when you dodf[[tissueCol]]
. Of course, this is not always convenient for the developer but robust code is not meant to be convenient though it can still be convenient for the user.I understand how named indices are more robust because they are not dependent on order, but if you see the functions' documentations, you will notice that they all make very clear to the user how the input should look like. The user should read the functions' docs. Please, see an example from the function
plot_ppi()
:' @param edgelist_int Data frame containing the edge list for the PPI network.
' First column is the protein 1 and second column is the protein 2.
' All other columns are interpreted as edge attributes.
As we always specify how the data should look like, users would not change the column orders inadvertently. Nevertheless, I have now transformed some numeric indices to named indices where I think they would be more suitable. Finally, as this indexing "rule" seems to be one of your requirements, would you consider adding it to the package guidelines, so that developers know beforehand how they are expected to write their code? https://www.bioconductor.org/developers/package-guidelines/#rcode This would be very helpful, as some of us have practices that do not conform to Bioc best practices (e.g., vapply() instead of sapply(), seq_len() instead of 1:n...).
Thanks for making those changes. Relying on users to have read the documentation is more of a challenge than implementing a robust fail safe option.
Your package has been accepted. Thank you for your submission.
Best regards, Marcel
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/almeidasilvaf.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("BioNERO")
. The package 'landing page' will be created at
https://bioconductor.org/packages/BioNERO
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.