-
### Feature request
Add support for prefetching the next n batches through `IterableDataset` to reduce the batch-loading bottleneck in the training loop.
### Motivation
The primary motivation behind th…
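A minimal sketch of the general idea rather than the `datasets` API: wrap any batch iterator in a background thread that keeps up to `n` batches queued while the training step runs. The class name and the `n` parameter are illustrative, not part of any library.

```python
import queue
import threading

class PrefetchIterator:
    """Single-use iterator that prefetches up to `n` items in a background thread."""

    _SENTINEL = object()  # marks the end of the wrapped iterable

    def __init__(self, iterable, n=2):
        self._queue = queue.Queue(maxsize=n)  # holds at most n prefetched batches
        self._thread = threading.Thread(
            target=self._fill, args=(iter(iterable),), daemon=True
        )
        self._thread.start()

    def _fill(self, it):
        for item in it:
            self._queue.put(item)  # blocks once n batches are already queued
        self._queue.put(self._SENTINEL)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            yield item

# e.g. for batch in PrefetchIterator(batch_iterator, n=4): ...
```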
-
### Feature request
`to_parquet` currently saves the dataset as one massive, monolithic parquet file, rather than as several small parquet files. It should shard large datasets automatically.
##…
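Until something like this is built in, a possible workaround, sketched under the assumption that this is the Hugging Face `datasets` library (whose `Dataset.shard` and `to_parquet` methods already exist): write each shard to its own file. The shard count and file-name pattern are arbitrary choices, not library defaults.

```python
import os

def to_sharded_parquet(dataset, out_dir, num_shards=16):
    """Write `dataset` as `num_shards` parquet files instead of one monolithic file."""
    os.makedirs(out_dir, exist_ok=True)
    for index in range(num_shards):
        shard = dataset.shard(num_shards=num_shards, index=index)
        shard.to_parquet(
            os.path.join(out_dir, f"data-{index:05d}-of-{num_shards:05d}.parquet")
        )
```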
-
Spun out of https://github.com/kedro-org/kedro/issues/1691#issuecomment-1176679924... Let's collect ideas here on what the current problems with `io` are. To me it feels like we've neglected it and it's r…
-
Hi.
Here is a summary of some explorations I did on the 3DSlicer platform to see whether it would be worth using as software for massive segmentations across different datasets (instead of fsleyes, for instance…
-
If it is possible to reduce the size of some datasets without changing performance too much, it would help the benchmark run faster.
I am especially thinking of ScaLA, Da Pol…
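One low-effort way to get there, as a sketch: subsample with a fixed seed so the reduced split stays reproducible across runs. This assumes the Hugging Face `datasets` API; the dataset name and target size are placeholders.

```python
from datasets import load_dataset

dataset = load_dataset("glue", "sst2", split="validation")  # placeholder dataset
# A fixed seed keeps the reduced benchmark split identical across runs.
reduced = dataset.shuffle(seed=42).select(range(min(1_000, len(dataset))))
```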
-
The package fails _**hard**_ when trying to work with large datasets (~20 million or so elements in my case). Trying to compute the entire distance matrix isn't possible without massive amounts of mem…
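For what it's worth, a sketch of the usual way around the full matrix, assuming nearest-neighbour distances are what is actually needed (the data and `k` here are illustrative, not this package's API): a k-d tree answers neighbour queries without ever materialising the n × n matrix.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((1_000_000, 3))  # stand-in for a large point set

tree = cKDTree(points)
# k=2 because each point's closest hit is itself (distance 0).
dists, _ = tree.query(points, k=2)
nearest = dists[:, 1]  # distance from each point to its true nearest neighbour
```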
-
Hi,
Thank you so much for sharing all the datasets from your publication: Multi-omics analyses of the ulcerative colitis gut microbiome link Bacteroides vulgatus proteases with disease severity! That …
-
It would be great to add the ability to access datasets stored in the cloud (e.g., S3, OMERO), especially for some of these massive datasets that users cannot realistically download locally first…
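A sketch of what this could look like with `fsspec`/`s3fs` (an assumption about the stack, not this project's API; the bucket and key are hypothetical): open the remote object lazily so only a small part of the file is transferred, not the whole object.

```python
import fsspec

# Anonymous access to a (hypothetical) public bucket; requires s3fs installed.
with fsspec.open("s3://example-bucket/volume.tif", mode="rb", anon=True) as f:
    header = f.read(1024)  # only the needed byte ranges are downloaded
```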
-
> zoomerjoin is an R package that empowers you to fuzzy-join massive datasets rapidly, and with little memory consumption
-- https://beniaminogreen.github.io/zoomerjoin/
See also https://github.…
-
I'm trying to generate reports on some metabolomics datasets with 40K dimensions, and I'm noticing that qurro is able to load these massive datasets. Not sure if others have had experience with this, …