AllenCellModeling / geneselection

only the best genes
MIT License

data loaders for gene data #1

Open gregjohnso opened 5 years ago

donovanr commented 5 years ago

PR here: https://github.com/AllenCellModeling/geneselection/pull/2

donovanr commented 5 years ago

So this minimal example works for me:

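```python
# scanpy.api was the entry point at the time; current scanpy exposes the same functions as `scanpy`
import scanpy as sc
import geneselection.datasets.dataset as gds
from torch.utils.data import DataLoader

# Guard needed when DataLoader workers are spawned (default on Windows and macOS)
if __name__ == "__main__":
    adata = sc.datasets.krumsiek11()
    adata.obs_names_make_unique()

    # Wrap the AnnData object in the geneselection Dataset from the PR
    gsds = gds.GSDataset(adata, with_obj_label=False)
    dataloader = DataLoader(gsds, batch_size=99, shuffle=True, num_workers=4)

    # Print the index, shape and mean of each minibatch
    for i, mb in enumerate(dataloader):
        print(i, mb.size(), mb.mean().item())
```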
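`GSDataset` itself is defined in the PR and isn't reproduced here. As a minimal sketch of the same idea, assuming a dense cells × genes expression matrix, a map-style dataset only needs `__len__` and `__getitem__` for `DataLoader` to sample and batch it (subclassing `torch.utils.data.Dataset` is optional). `DenseExprDataset` is a hypothetical name, not geneselection's class:

```python
import numpy as np


class DenseExprDataset:
    """Hypothetical map-style dataset over a dense cells x genes matrix.

    DataLoader only calls len(dataset) and dataset[index] on map-style
    datasets, so this plain class can be passed to it directly.
    """

    def __init__(self, X):
        # Assumes a dense 2-D array; float32 is the usual model input dtype.
        self.X = np.asarray(X, dtype=np.float32)
        if self.X.ndim != 2:
            raise ValueError("expected a 2-D cells x genes matrix")

    def __len__(self):
        # One sample per cell (row).
        return self.X.shape[0]

    def __getitem__(self, index):
        # Expression profile of a single cell; the default collate
        # function converts and stacks these into a (batch, n_genes) tensor.
        return self.X[index]
```

With a dense `adata.X`, something like `DataLoader(DenseExprDataset(adata.X), batch_size=99, shuffle=True)` should give minibatches comparable to the example above; a sparse matrix would need densifying first (e.g. `.toarray()`).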
donovanr commented 5 years ago

problems:

potential problems / things to think about:

donovanr commented 5 years ago

Pushed a small change to the js_dataset branch:

```diff
-        keys = self.data.obsm[index]
+        keys = self.data.obs.iloc[index,:]
```

This change gets the obs (i.e. cell-wise) annotations working. It only works with the DataLoader if obs has a single column, though: the PyTorch DataLoader (torch.utils.data) can't batch pandas DataFrames, but it can implicitly convert a single column to one of the types it handles (tensors, lists, tuples, etc.).
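One possible way around the single-column limitation, sketched here under the assumption that each sample is returned as an `(expression, obs_row)` pair with the obs row already turned into a plain dict (e.g. via pandas `Series.to_dict()`), is to pass the DataLoader a custom `collate_fn`. The function name `collate_expr_with_obs` is hypothetical, not part of geneselection:

```python
import numpy as np


def collate_expr_with_obs(batch):
    """Hypothetical collate_fn for a list of (expression, obs_dict) samples.

    Expression vectors are stacked into one (batch, n_genes) float32 array;
    obs annotations become a dict mapping each column name to a list of
    values, so any number of obs columns (numeric or string) survives.
    """
    exprs, obs_rows = zip(*batch)
    x = np.stack([np.asarray(e, dtype=np.float32) for e in exprs])
    # Assumes every sample has the same obs keys as the first one.
    obs = {key: [row[key] for row in obs_rows] for key in obs_rows[0]}
    return x, obs
```

Used as `DataLoader(dataset, batch_size=99, collate_fn=collate_expr_with_obs)`, this yields numpy arrays; wrap `x` with `torch.from_numpy(x)` if tensors are needed. A lighter alternative is to return the obs row as a dict from `__getitem__` and keep the default collation, which batches dicts key by key.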