ebi-gene-expression-group / scxa-workflows

Higher level repo for aggregating all of Atlas workflow logic for Single Cell
Apache License 2.0

Add symbol to MTX genes file before reading #22

Closed: pinin4fjords closed this 3 years ago

pinin4fjords commented 3 years ago

This PR addresses an issue reported by @pcm32 whereby the objects produced by our production workflows had gene IDs in the 'gene symbols' column. Gene symbols were still present under 'gene name', but this may not have been obvious to users.

The cause is Scanpy's 10X-reading functionality, which uses the second column of genes.tsv to populate 'gene symbols'. I've used @nomadscientist's solution from https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/droplet-quantification-preprocessing/tutorial.html to add gene symbols to genes.tsv before Scanpy reads the file.
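The change itself lives in the Galaxy workflow, but the idea can be sketched outside Galaxy. Below is a minimal, hypothetical Python illustration (not the workflow's actual tooling): it rewrites a 10X-style genes file so that column 2 holds a gene symbol instead of a repeated ID. The helper name `add_symbols_to_genes_tsv`, the ID-to-symbol mapping (which in practice would come from an annotation source such as a GTF), and the fallback of using the ID when no symbol is known are all assumptions made for this sketch.

```python
import csv
import io
from typing import Dict, List, TextIO


def add_symbols_to_genes_tsv(
    genes_in: TextIO, id_to_symbol: Dict[str, str], genes_out: TextIO
) -> List[List[str]]:
    """Hypothetical helper: write a genes.tsv whose column 2 is a gene symbol.

    Assumes column 1 is the gene ID. Any columns after the second (e.g. a
    feature-type column) are kept unchanged. IDs missing from the mapping
    fall back to the ID itself, so every row still has a second column.
    """
    reader = csv.reader(genes_in, delimiter="\t")
    writer = csv.writer(genes_out, delimiter="\t", lineterminator="\n")
    rows: List[List[str]] = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        gene_id = fields[0]
        symbol = id_to_symbol.get(gene_id, gene_id)
        row = [gene_id, symbol] + fields[2:]
        writer.writerow(row)
        rows.append(row)
    return rows


# Example input: the first row repeats the ID in column 2 (the symptom in this PR);
# "UNKNOWN_ID" is a made-up placeholder with no symbol in the mapping.
genes_tsv = io.StringIO("ENSG00000141510\tENSG00000141510\nUNKNOWN_ID\n")
mapping = {"ENSG00000141510": "TP53"}  # assumed annotation lookup
out = io.StringIO()
rows = add_symbols_to_genes_tsv(genes_tsv, mapping, out)
print(out.getvalue(), end="")
```

The rewritten file could then be read with Scanpy, for example via `scanpy.read_10x_mtx(path, var_names='gene_ids')`, which for the legacy genes.tsv layout keeps column 1 as `var_names` and stores column 2 in `adata.var['gene_symbols']`; check this against the Scanpy version in use.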

See the new workflow variant at http://galaxy-gxa-001:8089/u/jmanning/w/scanpy-prod-160-clustering-with-harmony-batch-adjustment-h5ad---groupby-as-file-imported-from-uploaded-file for a live view of these changes.

pcm32 commented 3 years ago

Sorry, please add a version change to the workflow name, or somewhere else! Thanks!

pinin4fjords commented 3 years ago

This is not really "reviewable", I guess, so ITYWFI ;-).

Well, I did add the link so you can review the workflow.

pinin4fjords commented 3 years ago

Thanks @pcm32, merging now.