Novartis / cellxgene-gateway

Cellxgene Gateway allows you to use the Cellxgene Server provided by the Chan Zuckerberg Initiative (https://github.com/chanzuckerberg/cellxgene) with multiple datasets.
Apache License 2.0

Gene-sets not loading from file #81

Closed: george-hall-ucl closed this issue 1 year ago

george-hall-ucl commented 1 year ago

Hi, thanks for the nice tool!

I'm running cellxgene-gateway v0.3.10 and cellxgene v1.1.2 locally on a MacBook Pro.

If I run "export GATEWAY_ENABLE_ANNOTATIONS=1" and create a new annotations file from filecrawl then it is saved to the csv file and is displayed in filecrawl as expected. However, when I try to reload the dataset by clicking on the annotation file's name, no gene sets are displayed. If I create a new gene set, then a new csv is created with its name appended to the existing file's name (e.g. if the first csv is called "test1-gene-sets-R64TJAID.csv" then this new one is "test1-gene-sets-R64TJAID-gene-sets-R64TJAID.csv"). I am terminating the app by CTRL-C'ing in the terminal.

Am I misunderstanding how this should work, or is this a bug?

Many thanks in advance!

george-hall-ucl commented 1 year ago

I have done some more digging, and it looks like the real problem is that gene sets, rather than annotations, aren't loading from file.

The command run by cellxgene-gateway is: cellxgene launch --annotations-file /path/to/test1-gene-sets-NS3OKLZ5.csv dataset.h5ad

Changing the command to: cellxgene launch --gene-sets-file /path/to/test1-gene-sets-NS3OKLZ5.csv dataset.h5ad loads the gene sets as desired.

So, I guess this is the command that I want cellxgene-gateway to execute. Is there any way to make it do this?

alokito commented 1 year ago

Hi @george-hall-ucl ! You are indeed misunderstanding how this currently works, and in particular the difference between annotations and gene sets. Annotations are like "cell sets" rather than gene sets. When annotations are enabled, you can click the "Create new category" button to add a new "Category" and then add "Labels" within the categories and assign cells to the labels. I have attached a screenshot of this process.

[Screenshot: categoryLabelAnnotation]

Assuming that you are naming your annotations "test1", the command that cellxgene gateway runs should be

cellxgene launch --annotations-file dataset_annotations/test1.csv dataset.h5ad

This made sense originally because there were no gene sets. I think the best way to enable what you want would be to add support for a new environment variable GATEWAY_ENABLE_GENE_SETS that will additionally set the --gene-sets-file parameter as follows:

Case 1: GATEWAY_ENABLE_GENE_SETS alone is set

cellxgene launch --gene-sets-file dataset_annotations/test1-gene-sets.csv dataset.h5ad

Case 2: GATEWAY_ENABLE_GENE_SETS and GATEWAY_ENABLE_ANNOTATIONS are set

cellxgene launch --annotations-file dataset_annotations/test1.csv --gene-sets-file dataset_annotations/test1-gene-sets.csv dataset.h5ad

This should let annotations and gene sets play nicely and independently of each other. I'll test this when I get a chance and push it on a branch... let me know what you think. Are you savvy enough to be able to run the code from a branch? I could also try to figure out how to publish a "pre-release" version to PyPI.
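The two cases above can be sketched as a small function that builds the launch arguments. This is only an illustration of the proposal, not cellxgene-gateway's actual code: `build_launch_args` and its parameters are hypothetical names, the file naming simply follows the examples in this comment, and treating any non-empty value as "set" is an assumption.

```python
import os


def build_launch_args(dataset, annotations_dir, name, env=None):
    """Build a `cellxgene launch` argument list from the two proposed flags.

    Hypothetical helper: the name, parameters, and path layout are
    assumptions for illustration, not cellxgene-gateway's real API.
    """
    env = os.environ if env is None else env
    args = ["cellxgene", "launch"]
    # Assumption: any non-empty value counts as "set".
    if env.get("GATEWAY_ENABLE_ANNOTATIONS"):
        args += ["--annotations-file", f"{annotations_dir}/{name}.csv"]
    if env.get("GATEWAY_ENABLE_GENE_SETS"):
        args += ["--gene-sets-file", f"{annotations_dir}/{name}-gene-sets.csv"]
    # The dataset path is appended exactly once, after all options.
    args.append(dataset)
    return args
```

For example, `build_launch_args("dataset.h5ad", "dataset_annotations", "test1")` with only GATEWAY_ENABLE_GENE_SETS exported would yield the Case 1 command as a list of arguments.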

george-hall-ucl commented 1 year ago

Hi @alokito!

Many thanks for your response. Sounds like a good solution to me. I will code it up today and send a pull request.

If anyone is reading this before the fix has been implemented and has the same problem, my current workaround is to set CELLXGENE_LOCATION to a script that adds --gene-sets-file (and the corresponding file) to the correct place in the call to cellxgene (see here). The proper fix will be much more stable, though!
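For readers who want a rough idea of what such a wrapper might look like, here is a minimal sketch. It is not the script linked above: the function names, the REAL_CELLXGENE variable, and the rule for deriving the gene-sets path (appending "-gene-sets" to the annotations file name, as in the naming proposed earlier in this thread) are all assumptions. It only handles the form where the option and its path are separate arguments.

```python
#!/usr/bin/env python3
import os
import sys

# Assumption: the real cellxgene executable is found via a hypothetical
# REAL_CELLXGENE variable (not a gateway setting), defaulting to PATH lookup.
REAL_CELLXGENE = os.environ.get("REAL_CELLXGENE", "cellxgene")


def add_gene_sets_flag(argv):
    """Return a copy of argv with --gene-sets-file added after --annotations-file.

    The gene-sets path is derived from the annotations path
    (test1.csv -> test1-gene-sets.csv); this naming rule is an assumption.
    """
    out = list(argv)
    if "--gene-sets-file" in out or "--annotations-file" not in out:
        return out  # nothing to add
    i = out.index("--annotations-file")
    if i + 1 >= len(out):
        return out  # option given without a path; leave untouched
    root, ext = os.path.splitext(out[i + 1])
    out[i + 2:i + 2] = ["--gene-sets-file", f"{root}-gene-sets{ext}"]
    return out


def main(argv):
    """Replace this process with the real cellxgene, using the adjusted arguments."""
    os.execvp(REAL_CELLXGENE, [REAL_CELLXGENE] + add_gene_sets_flag(argv))
```

To use this as the wrapper, the script would end with a call to `main(sys.argv[1:])`, be made executable, and CELLXGENE_LOCATION would point at it.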

george-hall-ucl commented 1 year ago

@alokito I have now implemented a GATEWAY_ENABLE_GENE_SETS flag: please see my pull request.

As I explain in the PR, I have implemented it in a simple way that meets my needs, but it may need more consideration before actual release! Hopefully this is a useful start, at least.

alokito commented 1 year ago

@george-hall-ucl After thinking about this some more, I'm not sure that there's a use case for setting GATEWAY_ENABLE_ANNOTATIONS without GATEWAY_ENABLE_GENE_SETS. I'm thinking it would be simpler to just have GATEWAY_ENABLE_ANNOTATIONS enable both. The fact that you opened this ticket is good evidence that the distinction between annotations and gene sets is confusing to people, and most likely a historical artifact. I just pushed a few commits and will make a PR... will hopefully have time to merge and cut a release this weekend.

george-hall-ucl commented 1 year ago

Yes, sounds sensible. Thank you for your help with this!

alokito commented 1 year ago

This is deployed to PyPI. Please open a new ticket if there are any remaining issues.