-
Plan discussed at meeting
We will use the following environments:
- stage
- ccebikes (large: 86k trips, 108 users)
- smart commute (medium, long-term: 18k trips, 21 users)
- STM community (smal…
-
The C++ af-dist implementation added in #157 (and notes in #160) is surprisingly slow. In some informal tests it looks like it's about 3X slower than Python Zarr, just retrieving the chunks. There's …
-
I want to pretrain an LLM on 2T tokens using llm-foundry, but the data processing step before training takes too long. Is there any way to accelerate it?
-
### Is your feature request related to a problem? Please describe.
When you tell the NextCloud app to AutoUpload a camera dir with >3000 images in it, the app needs to process each of the images befo…
-
### -> submitter ORCID (or name)
0000-0002-1270-4377
### -> slug
polanco-2024-deltas
### -> license
CC-BY-4.0
### -> alternative license URL
_No response_
### -> model category…
-
Some notes (in no particular order) about speeding up reading many chunks of data from cloud storage.
## General info about cloud storage buckets
In general, cloud storage buckets are highly dis…
-
**Description:**
Currently, we get two string[] for some CSV rows when we access the file through a byte[] stream.
**Describe your problem(s)**
We need to access a CSV file through a byte[] stream and convert…
-
It would be nice if matplotlib were not so tightly integrated into the plot.py functionality, so that we can exercise tighter control over the output graphs. Doing this would also result in faster adop…
-
The recent addition of V2 DPI awareness introduced a few issues:
- [x] Toolbar (fixed)
- [x] Searchbox (fixed)
- [x] Menu Icons (fixed)
- [x] Tray Icons (fixed)
- [x] Process tree icons corrupt…
-
I am using the Python [dedupe](https://github.com/dedupeio/dedupe-examples) library for a large dataset. Initially we tested with a 3K dataset and it could run with "AWS Sagemaker instance -> ml.r5.4xlarge 1…