OmicsML / dance

DANCE: a deep learning library and benchmark platform for single-cell analysis
https://pydance.readthedocs.io
BSD 2-Clause "Simplified" License

Some questions about general wrapper for datasets #38

Open HelloWorldLTY opened 2 years ago

HelloWorldLTY commented 2 years ago

Hi, I intend to apply this model to datasets other than the competition datasets, and I wonder whether you have a general data-loading structure for public datasets. Also, if I have already processed a given dataset, is it possible to use a lighter structure than the jointembedding one? Thanks.

RemyLau commented 2 years ago

Hi @HelloWorldLTY, thanks for your interest in the dance package! We are currently doing some heavy refactoring to clean up several interfaces, including the dataset interfaces, and to make them more user-friendly, e.g., so that users can apply methods to their own datasets and benchmark their own methods on the datasets provided by the package. For now, there isn't an easy way to work with custom datasets. We expect to fix this within the next month or so.

Is your primary interest in using your custom dataset on joint-embedding tasks? If so, I can make that a priority so that you can play with the models soon.

HelloWorldLTY commented 2 years ago

Thanks a lot. I am currently working with JAE, and because my dataset is very large, this tool is not very efficient.

RemyLau commented 2 years ago

@HelloWorldLTY Currently, most datasets are loaded from an AnnData object, which is one of the standard data objects for single-cell omics data. So long as your processed data structure can be interfaced with AnnData easily, it shouldn't be a big deal.

Could you briefly describe the type of data structure you are working with and what libraries you currently use to process them? We can also consider adding interfaces for this particular type of data structure if it is somewhat standard as well.

HelloWorldLTY commented 2 years ago

Hi, I prefer the AnnData object as used by scanpy, and that is the type of data I am currently using.

RemyLau commented 2 years ago

Ok, sounds good! This should be supported natively soon. I'll keep you posted on that.

RemyLau commented 1 year ago

This is related to an ongoing refactoring task #49

gabumon0 commented 1 year ago

> This is related to an ongoing refactoring task #49

Yeah, I also ran into trouble when trying to apply the joint-embedding scmogcn model to my own GEX+ATAC data. My data is stored as AnnData; is there a tutorial I can learn from?

htumlc commented 10 months ago

Where can the datasets be downloaded?