Closed george-hall-ucl closed 1 year ago
I have done some more digging, and it looks like what I actually mean is that gene sets, rather than annotations, aren't loading from file.
The command run by cellxgene-gateway is:
cellxgene launch --annotations-file /path/to/test1-gene-sets-NS3OKLZ5.csv dataset.h5ad
Changing the command to:
cellxgene launch --gene-sets-file /path/to/test1-gene-sets-NS3OKLZ5.csv dataset.h5ad
loads the gene sets as desired.
So, I guess this is the command that I want cellxgene-gateway to execute. Is there any way to make it do this?
Hi @george-hall-ucl ! You are indeed misunderstanding how this currently works, and in particular the difference between annotations and gene sets. Annotations are like "cell sets" rather than gene sets. When annotations are enabled, you can click the "Create new category" button to add a new "Category" and then add "Labels" within the categories and assign cells to the labels. I have attached a screenshot of this process.
Assuming that you are naming your annotations "test1", the command that cellxgene-gateway runs should be:
cellxgene launch --annotations-file dataset_annotations/test1.csv dataset.h5ad
This made sense originally because there were no gene sets. I think the best way to enable what you want would be to add support for a new environment variable GATEWAY_ENABLE_GENE_SETS that will additionally set the --gene-sets-file parameter as follows:
Case 1: GATEWAY_ENABLE_GENE_SETS alone is set
cellxgene launch --gene-sets-file dataset_annotations/test1-gene-sets.csv dataset.h5ad
Case 2: GATEWAY_ENABLE_GENE_SETS and GATEWAY_ENABLE_ANNOTATIONS are set
cellxgene launch --annotations-file dataset_annotations/test1.csv --gene-sets-file dataset_annotations/test1-gene-sets.csv dataset.h5ad
This should let annotations and gene sets play nicely and independently of each other. I'll test this when I get a chance and push it on a branch... let me know what you think. Are you savvy enough to be able to run the code from a branch? I could also try to figure out how to publish a "pre-release" version to PyPI.
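A minimal sketch of the two cases above, for illustration only. The function name `build_launch_args`, its parameters, and the rule that any non-empty value counts as "set" are assumptions; this is not the actual cellxgene-gateway code.

```python
import os


def build_launch_args(dataset, annotation_name, env=None,
                      annotations_dir="dataset_annotations"):
    """Build the cellxgene command line for the proposed flags.

    Hypothetical helper: the name, the parameters, and the
    "any non-empty value means enabled" rule are assumptions.
    """
    env = os.environ if env is None else env
    args = ["cellxgene", "launch"]

    # Annotations ("cell sets") live in <name>.csv
    if env.get("GATEWAY_ENABLE_ANNOTATIONS"):
        args += ["--annotations-file",
                 f"{annotations_dir}/{annotation_name}.csv"]

    # Gene sets live in a separate <name>-gene-sets.csv
    if env.get("GATEWAY_ENABLE_GENE_SETS"):
        args += ["--gene-sets-file",
                 f"{annotations_dir}/{annotation_name}-gene-sets.csv"]

    # The dataset path comes last, exactly once
    args.append(dataset)
    return args
```

With only GATEWAY_ENABLE_GENE_SETS set, this yields the Case 1 command; with both variables set, it yields the Case 2 command, where each file gets its own flag and the dataset appears once at the end.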
Hi @alokito!
Many thanks for your response. Sounds like a good solution to me. I will code it up today and send a pull request.
If anyone is reading this before this fix has been implemented and has the same problem, my current workaround is to set CELLXGENE_LOCATION to a script that adds --gene-sets-file (and the corresponding file) to the correct place in the call to cellxgene (see here). This fix will be much more stable, though!
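For anyone wanting to try the same idea, here is a rough sketch of what such a wrapper could look like. It is not the script linked above. The `REAL_CELLXGENE` variable, the function names, and the rule for deriving the gene-sets path from the annotations path are all assumptions, so adjust them to your own file layout.

```python
#!/usr/bin/env python3
import os
import sys

# Hypothetical variable naming the real cellxgene executable (an assumption).
REAL_CELLXGENE = os.environ.get("REAL_CELLXGENE", "cellxgene")


def add_gene_sets_arg(argv):
    """Return argv with --gene-sets-file inserted after the annotations file.

    Assumed naming rule: foo.csv -> foo-gene-sets.csv. Arguments are
    returned unchanged if this is not a launch call, if a gene-sets file
    is already given, or if no annotations file is present.
    """
    args = list(argv)
    if ("launch" not in args
            or "--gene-sets-file" in args
            or "--annotations-file" not in args):
        return args

    idx = args.index("--annotations-file")
    if idx + 1 >= len(args):
        # Flag present but no path after it: leave the call alone
        return args

    stem, ext = os.path.splitext(args[idx + 1])
    gene_sets_path = f"{stem}-gene-sets{ext}"
    return args[:idx + 2] + ["--gene-sets-file", gene_sets_path] + args[idx + 2:]


def main():
    # Replace this process with the real cellxgene, passing the patched args
    new_args = add_gene_sets_arg(sys.argv[1:])
    os.execvp(REAL_CELLXGENE, [REAL_CELLXGENE] + new_args)

# In an actual wrapper script, end the file with a call to main().
```

To use it, the script would need to be made executable and CELLXGENE_LOCATION pointed at it, so that the gateway calls the wrapper instead of cellxgene directly.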
@alokito I have now implemented a GATEWAY_ENABLE_GENE_SETS flag: please see my pull request.
As I explain in the PR, I have implemented it in a simple way that meets my needs, but it may need more consideration before actual release! Hopefully this is a useful start, at least.
@george-hall-ucl After thinking about this some more, I'm not sure that there's a use case for setting GATEWAY_ENABLE_ANNOTATIONS without GATEWAY_ENABLE_GENE_SETS. I'm thinking it would be simpler to just have GATEWAY_ENABLE_ANNOTATIONS enable both. The fact that you opened this ticket is good evidence that the distinction between annotations and gene sets is confusing to people, and most likely a historical artifact. I just pushed a few commits and will make a PR... will hopefully have time to merge and cut a release this weekend.
Yes, sounds sensible. Thank you for your help with this!
This is deployed to PyPI; please open a new ticket if there are any remaining issues.
Hi, thanks for the nice tool!
I'm running cellxgene-gateway v0.3.10 and cellxgene v1.1.2 locally on a MacBook Pro.
If I run "export GATEWAY_ENABLE_ANNOTATIONS=1" and create a new annotations file from filecrawl then it is saved to the csv file and is displayed in filecrawl as expected. However, when I try to reload the dataset by clicking on the annotation file's name, no gene sets are displayed. If I create a new gene set, then a new csv is created with its name appended to the existing file's name (e.g. if the first csv is called "test1-gene-sets-R64TJAID.csv" then this new one is "test1-gene-sets-R64TJAID-gene-sets-R64TJAID.csv"). I am terminating the app by CTRL-C'ing in the terminal.
Am I misunderstanding how this should work, or is this a bug?
Many thanks in advance!