Closed bschilder closed 2 years ago
@NathanSkene some of these checks might fail, actually, bc we need to add some variables to GitHub Secrets for this repo. It's really quick and easy, so I can walk you through it via Slack on Monday
Update DESCRIPTION Version to 1.3.1 to reflect latest release
.github/workflows/check-bioc-docker.yml:
run_docker: 'false'
- Is this parameter in use? It is pushing to docker with a commit now correct? Also can you test to ensure a push to docker won't occur if a check is failed (either in EWCE or orthogene)
This feature is not being used at the moment. ideally, it would be most efficient to build the Docker container once, test it, and push to DockerHub all in one workflow, but I couldn't get this working smoothly and opted to just use the separate dockerhub.yml workflow based on scFlow.
Alan and I discussed this. While naming this update 2.0 might be useful for remembering which version had major changes, this isnt allowed by Bioconductor (they dictate the version changes according to their devel/release schedule). Instead, I'll document these updates in the NEWS, README, and vignettes.
"@param defaultBin Which bin to assign when there's only one non-zero quantile."
- I'm not sure I understand what this parameter is doing from the description, can you elaborate a little?
If you request 40 bins but there's only 1 unique value in the vector, how do you assign it? 1? 40? defaultBin
sets this value as the median bin number.#' @param bg List of gene symbols containing the background gene list
#' (including hit genes). If \code{bg=NULL},
#' an appropriate gene background will be created automatically.
#' if \code{geneSizeControl=TRUE}.
@param genelistSpecies Species that \code{hits} genes came from
#' (no longer limited to just "mouse" and "human").
Now points users to new function list_species
#' @param geneSizeControl Whether you want to control for
#' GC content and transcript length. Recommended if the gene list originates
#' from genetic studies (\emph{Default: FALSE}).
#' If set to \code{TRUE}, then \code{hits} must be from humans.
#' should be used rather than mouse.
Can still only use human since gene length will differ between species, and we only have a human reference for this info. Would be nice to extend to other species eventually tho.
"#### Guards against issues with DelayedArray"
"#### Guards against issues with DelayedArray"
I just tried to break up the code by putting it into smaller functions. You'll have to ask @NathanSkene regarding what this does exactly.
Same as above. @NathanSkene
#' @param DGE_method: Which method to use for the Differential Gene Expression (DGE) step.
- list out the options for the user.#' @param input_species Which species the gene names in \code{exp} come from. #' @param output_species
Which species' genes names to convert \code{exp} to` - again give the users a function to list all available species. adj_pval_thresh = 0.00001
, - this seems odd to me that's it's so low, why not just 0.05? I get you may have some FP's but surely EWCE will deal with these? Has an analysis been done comparing the different DEG approaches against a benchmark dataset? I think this is important to do since this is a key part of your expansion on EWCE. I imagine you will publish this version of EWCE right? This would form a fundamental part of that paper.Not sure why this parameter was originally chosen. @NathanSkene ?
no_cores = 1
, - again parallel capacity should be in vignette/NEWS file#' @param tt` Differential expression table.
#' Can be output of \link[limma]{topTable} function.
#' Minimum requirement is that one column stores a metric of ...
Only limma is currently supported for this, as the output columns are tool-specific.
\link[EWCE]{bootstrap_enrichment_test} or
#' \link[EWCE]{ewce_expression_data} functions.
#' Multiple results tables can be
#' merged into one results table, as long as the 'list' column is set to
#' distinguish them.
From original EWCE. Ask @NathanSkene
The package.R file is special, it give users a description of the package when they run ?EWCE
.
From the original EWCE. Ask @NathanSkene
From the original EWCE. Ask @NathanSkene
They're works in progress.
[x] LRT is the correct default as it has been proven to be better than Wald but I think you should give the user the option to choose Wald too.
Added new arg dge_test
[ ] More generally, I think you should add in edgeR as an DEG option, in our analysis of scRNA-Seq methods, edgeR performs better than DESeq2. See this benchmark paper for the same results (fig 1(c)). Also you should offer the two test types of LRT and QLF but again LRT should be the default
Would be a great add-on but I'd ask that you implement any additional DGE methods you'd like to use.
Correct, this will be updated when this branch is merged
Not sure what you mean. Feel free to edit the README further after merging. These are the current instructions.
if (!require("BiocManager")){install.packages("BiocManager")}
BiocManager::install("EWCE")
cortex_mrna$exp_scT_normed <- EWCE::sct_normalize(cortex_mrna$exp)
can you block the output of this R chunk? It takes up too much of the vignetteAll vignettes are accessible via the "Articles" table of the docs website. Will add link in README as well.
devtools::build_site()
before pushing? In your new GHA and website push approach is this needed? I think it is still necessary as the website is built from the files in docs/articles/The new GHA workflow automatically rebuilds the whole docs website with pkdown and pushes to the gh-pages branch. So manually rebuilding is no longer necessary.
[ ] devtools check gives note which you will need to sort (remove git history of png's .RData objects should do it):
> checking installed package size ... NOTE
installed size is 6.8Mb
sub-directories of 1Mb or more:
doc 6.3Mb
I went through and tried to prune the git history multiple times using all the strategies you suggested, as well as reduced the vignette file size (ie what populates doc/ during building). I'll give it another pass, but if not maybe you could try to get it down further @Al-Murphy ?
[x] Code coverage check with devtools::test_coverage()
gave ~70%, Brian I think you need to add more tests to cover the new functionality. We should aim for 80% coverage but more importantly make sure all new functionality is thoroughly tested.
Code coverage is now >84%. All CRAN/Bioc checks are passing locally, but it does take a while. We may have to do some further optimization with tests/examples to keep it below the 15m limit.
In the description, you have now changed it to 1.3.2 since you made additional changes. However since these first batch of changes weren't pushed to bioconductor it should still be 1.3.1. Can you revert?
In the description, you have now changed it to 1.3.2 since you made additional changes. However since these first batch of changes weren't pushed to bioconductor it should still be 1.3.1. Can you revert?
Oh i see, didn't realize that bioc doesn't let you skip versions. Np, changing it back.
Notes:
calculate_specificity_for_level
, bootstrap_plot
, cell_list_dist
, check_annotLevels
, check_args_for_bootstrap_plot_generation
,check_bootstrap_args
, check_controlled_args
, check_ewce_expression_data_args
, check_full_results
, check_generate_controlled_bootstrap_geneset
, check_species
, check_nas
, check_group_name
, check_numeric
, get_exp_data_for_bootstrapped_genes
, get_graph_theme
, ``delayedarray_normalize.R
- Is there a unit test to check this gives the same result as original (not delayed matrix) approach? If not, can you add one? sce_merge_comparable_levels.R
, this is all commented out, can it be removed?R>=4.1
and Bioconductor>=1.4
" - Should be Bioconductor>=3.14
── R CMD check results ────────────────────────────────────────────────────────────────────────────────────── EWCE 1.3.1 ────
Duration: 16m 13.7s
checking installed package size ... NOTE installed size is 6.9Mb sub-directories of 1Mb or more: doc 6.3Mb
checking top-level files ... NOTE Non-standard file/directory found at top level: ‘doc’
Add documentation (description and parameters) to:
calculate_specificity_for_level
bootstrap_plot
cell_list_dist
check_annotLevels
check_args_for_bootstrap_plot_generation
check_bootstrap_args
check_controlled_args
check_ewce_expression_data_args
check_full_results
check_generate_controlled_bootstrap_geneset
check_species
check_nas
check_group_name
check_numeric
get_exp_data_for_bootstrapped_genes
get_graph_theme
I'll try to do some of these, but as I mentioned before many of them have parameters that were named by @NathanSkene and I'm unsure what they are. I think this is why we wanted to have the meeting first.
In the meantime, the best I can do for some parameters is simply repeating what the argument name is. I am marking all of these with (#fix) so we can know where they are.
@param hit.exp hit.exp (#fix)
Which original normalization function are you referring to? sct_normalize
?
I wouldn't expect delayedarray_normalize
to be exactly the same since it's a different procedure to what SCT does. However, i can at least confirm that DelayedArrays do indeed work as input to sct_normalize
.
I would not expect the results to be exactly the same, since EWCE now uses gene backgrounds generated by orthogene
. The goal of this was to improve the accuracy of results EWCE produces (due to more comprehensive ortholog data), in addition to expanding EWCE's applicability to other species.
However, the tests I do have ensure that at least some of the top cell-types are still enriched (see test-bootstrap_enrichment_test.R
)
Two CRAN check notes that still need to be sorted:
I first tried preventing all code chunks from running in the extended vignette. But this barely reduced the total package size (6.2 --> 6 Mb). What made a huge difference was merging the vignettes. I guess you were right @Al-Murphy , there really is ton of overhead per vignette!
By merging all 5 vignettes into just 2, I've now managed to get the whole package under 5Mb.
> checking installed package size ... NOTE
installed size is 6.9Mb
sub-directories of 1Mb or more:
doc 6.3Mb
> checking top-level files ... NOTE
Non-standard file/directory found at top level:
‘doc’
Avoided this by adding ^doc$
to the .Rbuildignore file.
Yay!!! 🍾
Great work, very happy to see this, getting EWCE working with big datasets was basically the first thing I wanted done back when the lab started, thanks for getting it here!
Merry christmas!
From: Brian M. Schilder @.> Sent: 23 December 2021 11:27 To: NathanSkene/EWCE @.> Cc: Skene, Nathan G @.>; Mention @.> Subject: Re: [NathanSkene/EWCE] EWCE 2.0 (PR #47)
This email from @.*** originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders listhttps://spam.ic.ac.uk/SpamConsole/Senders.aspx to disable email stamping for this address.
Yay!!! 🍾
— Reply to this email directly, view it on GitHubhttps://github.com/NathanSkene/EWCE/pull/47#issuecomment-1000239244, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AH5ZPE2J6QLTVIUITU4IB2DUSMBSZANCNFSM5GSRXSVQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you were mentioned.Message ID: @.***>
Note on branches
I previously made all all my changes to the forked repo bschilder/EWCE on the DelayedArray branch. However, something got screwed up with the git history in that fork, and the only way to push to the main repo was to clone the main repo (NathanSkene/EWCE), create a new branch called
bschilder_dev
, copy and paste all the files with myDelayedArray
branch's edits into this repo, and then make a Pull Request from there. This means you won't have the incremental commits I made as I upgraded EWCE, but at least everything will be harmonized moving forward. I'll delete my old forked repo and make any future edits on this NathanSkene/EWCE@bschilder_dev branch.Upgrades & new features
orthogene
.standardise_ctd
.DelayedArray
object class.plot_ctd
).EWCE::example_bootstrap_results()
.To do