Curious about generating samples and labels from existing data

pallavisurana1 commented 3 years ago

Hi,

Great work on the SCADEN paper. I was curious whether there is a way to generate my own custom samples with labels, like those in the example file 'data6k500.txt'. Or is prediction only possible on the data attached here? https://figshare.com/articles/code/Publication_Figures/8234030?file=17855789

Thanks!

KevinMenden commented 3 years ago

Hi,

of course that's possible - I'm just not quite sure I fully understand what you want to do. Do you want to simulate data and then use that for testing? Or do you want to generate new training data from a scRNA-seq dataset?

Prediction is possible on all kinds of datasets :)

Cheers, Kevin

pallavisurana1 commented 3 years ago

Hi, I was planning to use the Seurat v4 PBMC data. I want to generate new training data from this dataset, basically to test whether it works well with SCADEN. Is it possible to test the method with processed h5ad training data? Thanks

KevinMenden commented 3 years ago

Okay so you want to generate new training data - of course that's possible. There's documentation about this here: https://scaden.readthedocs.io/en/latest/usage/#scrna-seq-data-processing

I always use Scanpy, but it can of course also be done with Seurat.

I'm still not sure what you want to test. If you have PBMC bulk data and just want to test Scaden, there's a dataset available that you can download, prepare with scaden process and use for training & prediction (see Documentation link). For your own PBMC scRNA-seq data you'll first have to do the pre-processing (cell type annotation etc.), and then simulate training data (with scaden simulate).
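As a rough illustration of the hand-off between annotation and simulation, here is a minimal Python sketch that writes an annotated count matrix into a pair of tab-separated files. It assumes the layout described in the linked documentation as I read it: a cells x genes count table named `<prefix>_counts.txt` and a label table named `<prefix>_celltypes.txt` with a single `Celltype` column. The function name `write_scaden_inputs`, the `pbmc` prefix, and the toy data are hypothetical. With Scanpy, the two inputs would typically come from something like `adata.to_df()` and a cell type column in `adata.obs`. Please check the file naming against the docs for your Scaden version.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def write_scaden_inputs(counts, celltypes, out_dir, prefix):
    """Write a counts table and a matching cell type table as tab-separated text.

    counts    : DataFrame, rows = cells, columns = genes
    celltypes : one label per cell, in the same order as the rows of `counts`
    The file names and the 'Celltype' column are assumptions based on the docs.
    """
    labels = list(celltypes)
    if len(labels) != len(counts):
        raise ValueError("need exactly one cell type label per cell")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts_path = out / f"{prefix}_counts.txt"
    labels_path = out / f"{prefix}_celltypes.txt"

    # Counts keep the cell barcodes as the index column.
    counts.to_csv(counts_path, sep="\t")
    # Labels are written as a single 'Celltype' column without an index.
    pd.DataFrame({"Celltype": labels}).to_csv(labels_path, sep="\t", index=False)
    return counts_path, labels_path


# Toy stand-in for an annotated scRNA-seq dataset (4 cells, 3 genes).
toy_counts = pd.DataFrame(
    np.array([[5, 0, 2], [1, 3, 0], [0, 7, 1], [4, 4, 4]]),
    index=["cell1", "cell2", "cell3", "cell4"],
    columns=["GENE_A", "GENE_B", "GENE_C"],
)
toy_labels = ["T cell", "B cell", "Monocyte", "T cell"]

counts_path, labels_path = write_scaden_inputs(
    toy_counts, toy_labels, tempfile.mkdtemp(), prefix="pbmc"
)
```

The directory containing these two files is what you would then point scaden simulate at; see the documentation link above for the exact options.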

Let me know which part is unclear.
