Open gregjohnso opened 5 years ago
So this minimal example works for me:
import scanpy.api as sc
import geneselection.datasets.dataset as gds
from torch.utils.data import DataLoader
adata = sc.datasets.krumsiek11()
adata.obs_names_make_unique()
gsds = gds.GSDataset(adata, with_obj_label=False)
dataloader = DataLoader(gsds, batch_size=99, shuffle=True, num_workers=4)
for i, mb in enumerate(dataloader):
    print(i, mb.size(), mb.mean().item())
problems:
with_obj_label=True
returning a tuple breaks the PyTorch DataLoader
potential problems / things to think about:
pushed a small change to the js_dataset branch
- keys = self.data.obsm[index]
+ keys = self.data.obs.iloc[index,:]
which gets the obs (e.g. cell-wise) annotations to work. It only works with the DataLoader if there's only one column in obs, though, since the PyTorch DataLoader can't handle pandas DataFrames but can implicitly convert a single column to one of the types it can handle (tensors, lists, tuples, etc.).
PR here: https://github.com/AllenCellModeling/geneselection/pull/2
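One possible way around the single-column limitation, sketched here only as an illustration and not as what the PR does: give the DataLoader a custom `collate_fn` that gathers the obs annotations column by column. The function name `collate_with_obs` is hypothetical. The sketch assumes each sample is a `(features, obs_row)` pair in which `obs_row` is a plain mapping from column name to value (e.g. a pandas row converted with `.to_dict()`); it uses plain Python stand-in samples so it runs without scanpy or torch.

```python
from collections import defaultdict


def collate_with_obs(batch):
    """Collate (features, obs_row) samples into (features, obs_columns).

    Hypothetical helper: obs_row is assumed to be a mapping
    {column name: value}, not a pandas DataFrame.
    """
    features = []
    obs_columns = defaultdict(list)
    for x, obs_row in batch:
        features.append(x)
        # Gather each annotation column into its own list.
        for column, value in obs_row.items():
            obs_columns[column].append(value)
    return features, dict(obs_columns)


# Stand-in samples with two obs columns (made-up values for illustration).
samples = [
    ([0.1, 0.2], {"cell_type": "Ery", "batch": 0}),
    ([0.3, 0.4], {"cell_type": "Mo", "batch": 1}),
]
features, obs = collate_with_obs(samples)
print(obs)
```

If this approach fits, the function could be passed as `DataLoader(gsds, batch_size=99, collate_fn=collate_with_obs)`. Note that as written it leaves the features as a list of per-sample items; stacking them into a single tensor (for example with `torch.stack`) would have to be added inside the function.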