Exploring TileDB's dataloaders

BiocPy / cellarr

Store collections of experimental data based on TileDB

https://biocpy.github.io/cellarr/

MIT License


Status: open. jkanche opened this issue 3 months ago.

jkanche commented 3 months ago

I came across TileDB's ML repo (https://github.com/TileDB-Inc/TileDB-ML/blob/master/tiledb/ml/readers/pytorch.py), which seems to implement a PyTorch-based dataloader. It looks like we could speed up our dataloaders without having to set the number of threads to 1.

This is mostly exploration, followed by figuring out whether we can adopt the same logic in our implementations.

tony-kuo commented 3 months ago

Setting the multiprocessing context to "spawn" removes the need to set threads to 1.
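A minimal sketch of what the "spawn" approach could look like, assuming the dataset opens its storage handle lazily so that each worker process creates its own instead of inheriting one from the parent. `LazyArrayDataset` and `open_fn` are hypothetical names used for illustration; they are not part of cellarr or TileDB-ML. With PyTorch, the context would be passed to `DataLoader` through its `multiprocessing_context` argument.

```python
import multiprocessing


class LazyArrayDataset:
    """Hypothetical dataset that opens its storage handle once per process.

    `open_fn` is an assumed stand-in for something like opening a TileDB
    array; it is not an actual cellarr or TileDB-ML function.
    """

    def __init__(self, uri, open_fn, length):
        self.uri = uri
        self.open_fn = open_fn
        self.length = length
        self._handle = None  # created on first access, inside each process

    def _get_handle(self):
        if self._handle is None:
            self._handle = self.open_fn(self.uri)
        return self._handle

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return self._get_handle()[idx]

    def __getstate__(self):
        # When the dataset is pickled to a spawned worker, drop the open
        # handle so that every worker opens a fresh one of its own.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


# "spawn" starts each worker in a fresh interpreter instead of fork()ing the
# parent, so thread pools and open handles are not inherited.
# With PyTorch this would be used as, e.g.:
#   DataLoader(dataset, num_workers=4, multiprocessing_context=SPAWN_CTX)
SPAWN_CTX = multiprocessing.get_context("spawn")
```

One trade-off worth noting: spawned workers start more slowly than forked ones, because each worker re-imports the modules it needs and receives a pickled copy of the dataset.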

I'm going to refactor the dataloader to reduce the number of database accesses to one per mini-batch.
I'm going to move to a sample-based epoch rather than a multi-set one. I think it makes more sense.
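Both planned changes can be sketched together, assuming "sample-based epoch" means one pass over every sample exactly once in shuffled order, and that the backing store can fetch an arbitrary list of rows in a single query. `CountingStore` and `read_rows` are hypothetical stand-ins for the TileDB-backed store, used only to count accesses; they are not the actual cellarr API.

```python
import random


def sample_based_batches(n_samples, batch_size, seed=0):
    """Yield one epoch as shuffled mini-batches of sample indices.

    Every sample appears exactly once per epoch; the last batch may be
    smaller than batch_size.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    for start in range(0, n_samples, batch_size):
        # Sorting within a batch may make the read more local; it does not
        # change which samples end up in the batch.
        yield sorted(order[start:start + batch_size])


class CountingStore:
    """Hypothetical stand-in for a TileDB-backed store that counts reads."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def read_rows(self, indices):
        # One call is treated as one database access, whatever len(indices) is.
        self.reads += 1
        return [self.rows[i] for i in indices]


def iterate_epoch(store, batch_size, seed=0):
    """Yield the rows for each mini-batch using a single read per batch."""
    for batch_idx in sample_based_batches(len(store.rows), batch_size, seed):
        yield store.read_rows(batch_idx)
```

For example, iterating over an epoch of 10 samples with a batch size of 4 yields batches of 4, 4, and 2 samples and issues exactly 3 reads, instead of one read per sample.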