-
### Feature request
Add support for prefetching the next n batches through `IterableDataset` to reduce the batch-loading bottleneck in the training loop.
### Motivation
The primary motivation behind th…
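A minimal sketch of the general idea rather than the `datasets` API: wrap any batch iterator in a background thread that keeps up to `n` batches queued while the training step runs. The class name and the `n` parameter are illustrative, not part of any library.

```python
import queue
import threading

class PrefetchIterator:
    """Single-use iterator that prefetches up to `n` items in a background thread."""

    _SENTINEL = object()  # marks the end of the wrapped iterable

    def __init__(self, iterable, n=2):
        self._queue = queue.Queue(maxsize=n)  # holds at most n prefetched batches
        self._thread = threading.Thread(
            target=self._fill, args=(iter(iterable),), daemon=True
        )
        self._thread.start()

    def _fill(self, it):
        for item in it:
            self._queue.put(item)  # blocks once n batches are already queued
        self._queue.put(self._SENTINEL)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            yield item

# e.g. for batch in PrefetchIterator(batch_iterator, n=4): ...
```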
-
### Feature request
`to_parquet` currently saves the dataset as one massive, monolithic parquet file, rather than as several small parquet files. It should shard large datasets automatically.
##…
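Until something like this is built in, a possible workaround, sketched under the assumption that this is the Hugging Face `datasets` library (whose `Dataset.shard` and `to_parquet` methods already exist): write each shard to its own file. The shard count and file-name pattern are arbitrary choices, not library defaults.

```python
import os

def to_sharded_parquet(dataset, out_dir, num_shards=16):
    """Write `dataset` as `num_shards` parquet files instead of one monolithic file."""
    os.makedirs(out_dir, exist_ok=True)
    for index in range(num_shards):
        shard = dataset.shard(num_shards=num_shards, index=index)
        shard.to_parquet(
            os.path.join(out_dir, f"data-{index:05d}-of-{num_shards:05d}.parquet")
        )
```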
-
Spun out of https://github.com/kedro-org/kedro/issues/1691#issuecomment-1176679924... Let's collect ideas here on what the current problems with `io` are. To me it feels like we've neglected it and it's r…
-
Hi.
Here is a summary of some explorations I did on the 3DSlicer platform to see whether it would be worth using as software for massive segmentations across different datasets (instead of fsleyes, for instance…
-
If it is possible to reduce the size of some datasets without changing performance too much, it would help the benchmark run faster.
I am especially thinking of ScaLA, Da Pol…
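One low-effort way to get there, as a sketch: subsample with a fixed seed so the reduced split stays reproducible across runs. This assumes the Hugging Face `datasets` API; the dataset name and target size are placeholders.

```python
from datasets import load_dataset

dataset = load_dataset("glue", "sst2", split="validation")  # placeholder dataset
# A fixed seed keeps the reduced benchmark split identical across runs.
reduced = dataset.shuffle(seed=42).select(range(min(1_000, len(dataset))))
```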
-
The package fails _**hard**_ when trying to work with large datasets (~20 million or so elements in my case). Trying to compute the entire distance matrix isn't possible without massive amounts of mem…
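For what it's worth, a sketch of the usual way around the full matrix, assuming nearest-neighbour distances are what is actually needed (the data and `k` here are illustrative, not this package's API): a k-d tree answers neighbour queries without ever materialising the n × n matrix.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((1_000_000, 3))  # stand-in for a large point set

tree = cKDTree(points)
# k=2 because each point's closest hit is itself (distance 0).
dists, _ = tree.query(points, k=2)
nearest = dists[:, 1]  # distance from each point to its true nearest neighbour
```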
-
Hi,
Thank you so much for sharing all the datasets from your publication: Multi-omics analyses of the ulcerative colitis gut microbiome link Bacteroides vulgatus proteases with disease severity! That …
-
It would be great to add the ability to access datasets stored in the cloud (e.g., S3, OMERO), especially for some of these massive datasets that users cannot realistically download locally first…
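A sketch of what this could look like with `fsspec`/`s3fs` (an assumption about the stack, not this project's API; the bucket and key are hypothetical): open the remote object lazily so only a small part of the file is transferred, not the whole object.

```python
import fsspec

# Anonymous access to a (hypothetical) public bucket; requires s3fs installed.
with fsspec.open("s3://example-bucket/volume.tif", mode="rb", anon=True) as f:
    header = f.read(1024)  # only the needed byte ranges are downloaded
```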
-
> zoomerjoin is an R package that empowers you to fuzzy-join massive datasets rapidly, and with little memory consumption
-- https://beniaminogreen.github.io/zoomerjoin/
See also https://github.…
-
I'm trying to generate reports on some metabolomics datasets with 40K dimensions, and I'm noticing that qurro is able to load these massive datasets. Not sure if others have had experience with this, …