NathanSkene / EWCE

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.
https://nathanskene.github.io/EWCE/index.html

Give error if generate_celltype_data is called with ensembl ID's as gene names #9

Closed by NathanSkene 2 years ago

bschilder commented 2 years ago

standardise_ctd

standardise_ctd is now called within bootstrap_enrichment_test. https://github.com/bschilder/EWCE/blob/8e9ca154c98f2afb4c01a0a0e1ade92a690a2cce/R/bootstrap_enrichment_test.R#L134

If CTD is from non-human

standardise_ctd will use orthogene to convert non-human genes to 1:1 human orthologs (can be any species, not just mouse).

If CTD is from human

standardise_ctd can detect Ensembl IDs and convert them to HGNC symbols using orthogene. Since there are often more Ensembl IDs than HGNC symbols, orthogene will aggregate the matrix by summing read counts across the IDs that map to the same symbol.

https://github.com/bschilder/EWCE/blob/8e9ca154c98f2afb4c01a0a0e1ade92a690a2cce/R/extract_matrix.R#L74
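
To illustrate what "aggregate by summing read counts" means here, a minimal Python sketch follows. It is not the orthogene or EWCE implementation (those are R). The function name `aggregate_by_symbol`, the placeholder IDs, and the choice to drop unmapped IDs are all assumptions made for this example.

```python
def aggregate_by_symbol(counts, id_to_symbol):
    """Sum per-cell read counts of all IDs that map to the same symbol.

    counts:       dict of gene ID -> list of read counts (one entry per cell)
    id_to_symbol: dict of gene ID -> gene symbol (hypothetical mapping)
    IDs with no mapping are dropped (an assumption of this sketch).
    """
    aggregated = {}
    for gene_id, row in counts.items():
        symbol = id_to_symbol.get(gene_id)
        if symbol is None:
            continue  # no symbol known for this ID
        if symbol in aggregated:
            # Element-wise sum with the row already stored for this symbol
            aggregated[symbol] = [a + b for a, b in zip(aggregated[symbol], row)]
        else:
            aggregated[symbol] = list(row)
    return aggregated


# Placeholder IDs, not real Ensembl identifiers
counts = {
    "ENSG_A1": [1, 0, 3],
    "ENSG_A2": [2, 5, 0],
    "ENSG_B1": [4, 4, 4],
    "ENSG_X": [9, 9, 9],  # has no symbol in the mapping below
}
id_to_symbol = {"ENSG_A1": "GENE_A", "ENSG_A2": "GENE_A", "ENSG_B1": "GENE_B"}

result = aggregate_by_symbol(counts, id_to_symbol)
print(result)
```

The two IDs that share `GENE_A` collapse into one row, so the resulting matrix has one row per symbol.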

generate_celltype_data

In addition, I've embedded standardise_ctd directly within generate_celltype_data. It can now convert to 1:1 human HGNC symbols. https://github.com/bschilder/EWCE/blob/8e9ca154c98f2afb4c01a0a0e1ade92a690a2cce/R/generate_celltype_data.r#L110

However, when the input to generate_celltype_data uses Ensembl IDs (human or non-human), it would be better to aggregate them to gene symbols within the species first (as above), and only then drop non-1:1 orthologs.
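
A rough sketch of that proposed order of operations, in Python for illustration only (EWCE and orthogene are R packages and their internals may differ). The helper names `sum_rows` and `keep_one_to_one`, the placeholder gene IDs, and the ortholog table are hypothetical.

```python
from collections import Counter


def sum_rows(counts, mapping):
    """Step 1: collapse rows within a species by summing counts per target name."""
    out = {}
    for key, row in counts.items():
        target = mapping.get(key)
        if target is None:
            continue  # unmapped IDs are dropped (assumption of this sketch)
        prev = out.get(target)
        out[target] = list(row) if prev is None else [a + b for a, b in zip(prev, row)]
    return out


def keep_one_to_one(counts, ortholog_pairs):
    """Step 2: keep only genes whose ortholog relation is 1:1 in both directions.

    ortholog_pairs: list of (species_symbol, human_symbol) tuples.
    """
    n_species = Counter(s for s, _ in ortholog_pairs)
    n_human = Counter(h for _, h in ortholog_pairs)
    out = {}
    for s, h in ortholog_pairs:
        if n_species[s] == 1 and n_human[h] == 1 and s in counts:
            out[h] = counts[s]
    return out


# Placeholder data: five species-level IDs, four species-level symbols
raw = {"ID1": [1, 2], "ID2": [3, 4], "ID3": [5, 5], "ID4": [1, 1], "ID5": [2, 2]}
id_to_symbol = {"ID1": "Gene1", "ID2": "Gene1", "ID3": "Gene2",
                "ID4": "Gene3", "ID5": "Gene4"}
orthologs = [
    ("Gene1", "GENE1"),                         # 1:1, kept
    ("Gene2", "GENE2A"), ("Gene2", "GENE2B"),   # 1:many, dropped
    ("Gene3", "GENE34"), ("Gene4", "GENE34"),   # many:1, dropped
]

within_species = sum_rows(raw, id_to_symbol)
human = keep_one_to_one(within_species, orthologs)
print(human)
```

Aggregating first means the counts from `ID1` and `ID2` are both retained under `Gene1` before the ortholog filter runs, instead of being handled at the ID level.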