galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

New tutorial - single cell data import and format conversion #4590

Closed: wee-snufkin closed this 3 months ago

wee-snufkin commented 5 months ago

Tutorial describing the most common single-cell datatypes, how to import data using EBI SCXA and HCA tools, and how to convert between the formats.

Can leave it as one big tutorial or split into separate smaller ones.

@nomadscientist

UPDATES AFTER THE REVIEW:

nomadscientist commented 5 months ago

Wooo! Testing this TODAY!

wee-snufkin commented 5 months ago

Will try to fix the workflow tests later today...

nomadscientist commented 5 months ago

I think the 'downsampling' section should go AFTER the Conversion section, as it will be less common that a user needs to do it

nomadscientist commented 5 months ago

The tutorial takes a long time to run, so I wonder if you should use a downsampled AnnData object for the conversions rather than the full mito-anndata?

nomadscientist commented 5 months ago

About the AnnData --> CDS section:

You could just move that entire section from the Monocle tutorial to this new one. Then you can just reference/link to it in the Monocle tutorial, and start people off there with the data already in the right format.

That's nice because then courses can decide how much data manipulation vs analysis to include, AND it is consistent with the rest of the tutorials

wee-snufkin commented 5 months ago

@nomadscientist your suggestions applied, good to go from my end!

@hexylena does linting fail because of linking the subsection? What else is wrong? Lmk if/how I can fix the checks

@mtekman I tested the AnnData object in .h5ad (successfully converted from Seurat by the new SCEasy Converter) using Inspect AnnData tool, but it failed with the following error:

Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/mulled-v1-f1891eb62511084a03c08c08b82a6b785abdf465b18ed9ac4a36c19217d84a96/lib/python3.9/site-packages/anndata/_io/utils.py", line 156, in func_wrapper
    return func(elem, *args, **kwargs)
  File "/usr/local/tools/_conda/envs/mulled-v1-f1891eb62511084a03c08c08b82a6b785abdf465b18ed9ac4a36c19217d84a96/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 532, in read_group
    EncodingVersions[encoding_type].check(
  File "/usr/local/tools/_conda/envs/mulled-v1-f1891eb62511084a03c08c08b82a6b785abdf465b18ed9ac4a36c19217d84a96/lib/python3.9/enum.py", line 408, in __getitem__
    return cls._member_map_[name]
KeyError: 'dict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/jwd02f/main/065/717/65717642/configs/tmpxnq1iw99", line 11, in <module>
    adata = ad.read('/data/dnb09/galaxy_db/files/b/e/a/dataset_bea25832-2b41-4b14-8c17-91383d1812bf.dat')
  File "/usr/local/tools/_conda/envs/mulled-v1-f1891eb62511084a03c08c08b82a6b785abdf465b18ed9ac4a36c19217d84a96/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 426, in read_h5ad
    d[k] = read_attribute(f[k])
  File "/usr/local/tools/_conda/envs/mulled-v1-f1891eb62511084a03c08c08b82a6b785abdf465b18ed9ac4a36c19217d84a96/lib/python3.9/functools.py", line 877, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/tools/_conda/envs/mulled-v1-f1891eb62511084a03c08c08b82a6b785abdf465b18ed9ac4a36c19217d84a96/lib/python3.9/site-packages/anndata/_io/utils.py", line 162, in func_wrapper
    raise AnnDataReadError(
anndata._io.utils.AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.

That was for extracting observations, but extracting variables and general information also failed. Here is the history if you need it. I'm pinging you just in case you have any idea how to fix SCEasy again to avoid this issue.

mtekman commented 5 months ago

@wee-snufkin I spoke with @nomadscientist and the error seems to occur only on the singlecell portal, because the AnnData tools there are a bit outdated (v0.7.5)

[Screenshot: Galaxy, 2023-12-18 at 10-46-02]

I ran the same Inspect tool on usegalaxy.eu but with the modern update from @pavanvidem and everything just worked for that same dataset (v0.10.3):

[Screenshot: Galaxy, 2023-12-18 at 10-45-41]

So I guess we just need to update the tools on singlecell.usegalaxy.eu

mtekman commented 5 months ago

Oh, I think Pavan already updated the tools!

@wee-snufkin You just need to use a newer version of the inspect tool (v0.10.x)

pavanvidem commented 5 months ago

There is an incompatibility of anndata between versions before 0.8 and after 0.8. Please use either everything before v0.8 or everything after v0.8.
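A rough way to check a set of tools against this rule is sketched below. It is only an illustration: the tool labels are made up, the two version numbers are the ones quoted earlier in this thread, and treating 0.8 itself as part of the newer side is an assumption (the comment above only says "before" and "after").

```python
def anndata_era(version: str) -> str:
    """Classify an anndata version string as 'pre-0.8' or '0.8+'."""
    # Compare numerically: as plain strings, "0.10.3" would sort before "0.8".
    major, minor = (int(part) for part in version.split(".")[:2])
    # Assumption: 0.8 itself counts as the newer side of the boundary.
    return "pre-0.8" if (major, minor) < (0, 8) else "0.8+"


def mixed_eras(tool_versions: dict) -> bool:
    """Return True if the tools span both sides of the 0.8 boundary."""
    return len({anndata_era(v) for v in tool_versions.values()}) > 1


# Hypothetical tool labels; the versions are the two mentioned in this thread.
tools = {"inspect_old": "0.7.5", "inspect_new": "0.10.3"}
print(mixed_eras(tools))
```

If `mixed_eras` reports a mix, a file written by one tool may not be readable by another, which matches the `KeyError: 'dict'` failure shown above.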

wee-snufkin commented 5 months ago

> There is an incompatibility of anndata between versions before 0.8 and after 0.8. Please use either everything before v0.8 or everything after v0.8.

Ok, thanks! How about AnnData Operations and Scanpy FilterCells tools? They both keep on failing on the converted object, no matter the version...

mtekman commented 5 months ago

> > There is an incompatibility of anndata between versions before 0.8 and after 0.8. Please use either everything before v0.8 or everything after v0.8.

> Ok, thanks! How about AnnData Operations and Scanpy FilterCells tools? They both keep on failing on the converted object, no matter the version...

Filter seems to work for me


AnnData Operations I couldn't test because I don't know what it does :grin:

wee-snufkin commented 5 months ago

> Filter seems to work for me

> AnnData Operations I couldn't test because I don't know what it does :grin:

I actually meant this Scanpy FilterCells tool, which I used as an alternative to AnnData Operations to flag mito genes (I wanted to use AnnData Operations in the first place to change field names in var and flag mito genes)
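For context, "flagging mito genes" boils down to a prefix test on the gene symbols. Here is a minimal plain-Python sketch of that step. The function name and the `mt-` / `MT-` prefixes are assumptions for illustration (this thread does not state the species or the exact naming used in the tutorial), and it is not what the Galaxy tools run internally.

```python
def flag_mito(gene_symbols, prefixes=("mt-", "MT-")):
    """Return one boolean per gene symbol: True if it looks mitochondrial.

    Assumption: mitochondrial genes are recognisable by a symbol prefix.
    """
    # str.startswith accepts a tuple and matches if any prefix matches.
    return [symbol.startswith(prefixes) for symbol in gene_symbols]


# Made-up example symbols, not taken from the tutorial data.
symbols = ["mt-Co1", "Actb", "MT-ND1", "Mtor"]
print(flag_mito(symbols))
```

In an AnnData object the resulting list would typically be stored as a boolean column in `var`, so that QC tools can compute the mitochondrial fraction per cell.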

pavanvidem commented 4 months ago

> > Filter seems to work for me

> > AnnData Operations I couldn't test because I don't know what it does :grin:

> I actually meant this Scanpy FilterCells tool, which I used as an alternative to AnnData Operations to flag mito genes (I wanted to use AnnData Operations in the first place to change field names in var and flag mito genes)

@wee-snufkin Do you have a history with the failing tools? I guess the problem is that the EBI tools still use the old version of Scanpy. Maybe an old Scanpy tool somehow worked on the latest version of anndata and produced an older version of anndata. I know it's confusing, but it is just a guess. Updating all Scanpy- and anndata-based tools to the latest versions will hopefully fix the issue.

wee-snufkin commented 4 months ago

> @wee-snufkin Do you have a history with the failing tools? I guess the problem is that the EBI tools still use the old version of Scanpy. Maybe an old Scanpy tool somehow worked on the latest version of anndata and produced an older version of anndata. I know it's confusing, but it is just a guess. Updating all Scanpy- and anndata-based tools to the latest versions will hopefully fix the issue.

Here is the history. If I'm not mistaken, the newest version of AnnData Operations uses scanpy 1.1.6, but feel free to check the error in the history

pavanvidem commented 4 months ago

The latest SCEasy Converter writes the latest version of the anndata object. That output cannot be used with any of the EBI SC tools, because they only support older versions of anndata. The easiest solution is to use the SCEasy Convert tool from the EBI tools, which generates anndata compatible with the EBI single-cell tools.

wee-snufkin commented 4 months ago

> The latest SCEasy Converter writes the latest version of the anndata object. That output cannot be used with any of the EBI SC tools, because they only support older versions of anndata. The easiest solution is to use the SCEasy Convert tool from the EBI tools, which generates anndata compatible with the EBI single-cell tools.

@pavanvidem got it, many thanks! Now all works perfectly and I included some explanation for the users regarding the use of the two SCEasy tools.

shiltemann commented 4 months ago

Hi @wee-snufkin, thanks for your tutorial, this all looks good from a technical point of view, so please go ahead and merge when you are happy with it @nomadscientist or @pavanvidem :)

@wee-snufkin: if you would like to write a news post announcing your new tutorials or other updates, please feel free to always include that in the PR as well (but up to you to decide if that makes sense)

pavanvidem commented 4 months ago

I still have to test this. I will try to allocate some time today.

wee-snufkin commented 4 months ago

> @wee-snufkin: if you would like to write a news post announcing your new tutorials or other updates, please feel free to always include that in the PR as well (but up to you to decide if that makes sense)

Thanks @shiltemann! I've added a wee news post to this PR and updated some of the contributions to tutorials :)

wee-snufkin commented 3 months ago

Apologies for updating this PR for so long, but I think I'm done! All the tests pass for the tutorial part (5debf55); linting only fails for the cyoa bit.