Issue with background gene lists and orthogene

Al-Murphy commented 1 year ago

1. Bug description

Gene lists aren't filtered against the reference scRNA-Seq dataset. This caused an error in a recent run where only one background gene was left after filtering to the SCT dataset.

Console output

Error in apply(temp, 2, sum) : dim(X) must have a positive length
Calls: <Anonymous> ... <Anonymous> -> lapply -> FUN -> cell_list_dist -> apply
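
A minimal sketch of how this error can arise in base R, assuming (not confirmed from the EWCE source) that `temp` is produced by subsetting an expression matrix by gene name. With a single remaining gene, the subset is dropped to a plain vector, which has no `dim`, so `apply()` fails. The matrix and gene names here are made up for illustration.

```r
# Toy expression matrix: 3 genes x 2 cell types (illustrative values only)
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("ctA", "ctB")))

# Only one background gene survives filtering
bg <- "g2"

# Default subsetting drops the matrix to a named vector (dim is NULL)
temp <- expr[bg, ]
no_dim <- is.null(dim(temp))

# apply() needs a dim attribute, so this reproduces the reported error
err <- tryCatch(apply(temp, 2, sum),
                error = function(e) conditionMessage(e))

# Keeping the matrix shape with drop = FALSE avoids the failure
temp_ok <- expr[bg, , drop = FALSE]
totals <- apply(temp_ok, 2, sum)
```

Using `drop = FALSE` only avoids the crash; a background of one gene is still not a meaningful control, so the underlying filtering problem remains.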

Expected behaviour

I would expect gene lists to be filtered by the SCT dataset early on to avoid downstream issues like this. For example, background gene lists are generated to match the gene length of the hits list, so if any of these genes are removed after that step, it will massively affect this gene length control.
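
A rough sketch of what such an early filter could look like in base R. `filter_gene_lists()` and its arguments are hypothetical names for illustration, not part of the EWCE API, and the minimum background size of 2 is an arbitrary assumption.

```r
# Hypothetical helper (not an EWCE function): restrict both lists to genes
# present in the reference dataset before any length-matched controls are built.
filter_gene_lists <- function(hits, bg, ref_genes, min_bg = 2) {
  hits_f <- intersect(hits, ref_genes)
  bg_f <- intersect(bg, ref_genes)
  if (length(bg_f) < min_bg) {
    stop("Too few background genes remain after filtering to the reference: ",
         length(bg_f))
  }
  list(hits = hits_f, bg = bg_f)
}

# Illustrative gene names only
ref_genes <- c("g1", "g2", "g3", "g4")
hits <- c("g1", "g3", "g9")
bg <- c("g1", "g2", "g3", "g4", "g8")

res <- filter_gene_lists(hits, bg, ref_genes)

# A background that barely overlaps the reference fails early, with a clear message
early_err <- tryCatch(filter_gene_lists(hits, c("g2", "g8"), ref_genes),
                      error = function(e) conditionMessage(e))
```

Filtering before the gene-length matching step means any controls are built only from genes that will actually be used downstream.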

Al-Murphy commented 1 year ago

@bschilder just so you are aware

I was wrong, the issue actually only occurs when the user passes a background list and the reference dataset species is the same as the gene list species. The issue is with orthogene and is described here: https://github.com/neurogenomics/orthogene/issues/22

In short, orthogene has been replacing the background gene list with one generated from all known genes, leading to the issues described above. I have made a fix in version 1.5.7 and have made a note for this to be fixed at source in orthogene.

bschilder commented 1 year ago

Ah, thanks for fixing this! Will implement directly in orthogene soon.

NathanSkene / EWCE