-
Hi, a quick question: does meteor-file-collection have built-in data deduplication based on hash comparisons? In other words, is it a content-addressable store? Did you consider choosing a differen…
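To make the question concrete, here is a minimal sketch of what hash-based deduplication in a content-addressable store could look like. It is not meteor-file-collection's API: the `ContentAddressableStore` class and its methods are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory store that keys blobs by the SHA-256 of their content."""

    def __init__(self):
        self._blobs = {}   # digest -> bytes (each unique content stored once)
        self._names = {}   # file name -> digest

    def put(self, name: str, data: bytes) -> str:
        # The content hash is the storage address, so identical content
        # uploaded under different names maps to the same blob.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._names[name]]

    def blob_count(self) -> int:
        return len(self._blobs)


store = ContentAddressableStore()
first = store.put("a.txt", b"hello")
second = store.put("b.txt", b"hello")   # duplicate content, new name
third = store.put("c.txt", b"world")
```

Here `first` and `second` are the same digest, and the store holds two blobs for three file names.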
-
Currently, deduplication in the visualization workflow starts _after_ the input data has been staged and tiled. If deduplication is set to occur at any step in the workflow (staging, rasterization, an…
-
Description:
- We have scraped creative writing data consisting of letters, articles, poems, announcements, and memos.
- We need to create a separate Hugging Face dataset for each tag.
For Tibetan poe…
-
The version of HKCanCor published on [HuggingFace](https://huggingface.co/datasets/nanyang-technological-university-singapore/hkcancor/tree/main) by NTU is different from the version offered by this l…
-
Combine erasure coding with data deduplication to simultaneously reduce the overall redundancy in data while increasing the redundancy of unique data. Deduplication also requires less network transfer…
-
To avoid the serialization cost when accessing data stored in the `PartitionStore`, we could add an LRU cache for frequently accessed data. Prime candidates could be the deduplication table and the `I…
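As a rough illustration of the idea, the sketch below puts an LRU cache in front of a store that holds serialized values. The `CachedStore` class, the dict-backed storage, and the use of `pickle` are assumptions for illustration only and do not reflect the actual `PartitionStore` interface.

```python
import pickle
from collections import OrderedDict


class CachedStore:
    """Hypothetical wrapper that caches deserialized values with LRU eviction."""

    def __init__(self, backing: dict, capacity: int = 2):
        self._backing = backing        # key -> serialized bytes (assumed layout)
        self._cache = OrderedDict()    # key -> deserialized value, oldest first
        self._capacity = capacity
        self.deserializations = 0      # counts how often the slow path runs

    def get(self, key):
        if key in self._cache:
            # Cache hit: mark as most recently used and skip deserialization.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: pay the deserialization cost once, then remember the value.
        value = pickle.loads(self._backing[key])
        self.deserializations += 1
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value


backing = {k: pickle.dumps(v) for k, v in {"a": [1], "b": [2], "c": [3]}.items()}
store = CachedStore(backing, capacity=2)
store.get("a")
store.get("a")  # served from the cache, no second deserialization
```

Whether such a cache pays off depends on how skewed the access pattern is, so the capacity would need to be tuned against real workloads.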
-
`gen_blockchain_data` deploys many contracts. It now contains many duplicated uses of deploy, get_block_number, blocks.insert, and deployments.insert.
It would be great to extract functions to prevent code du…
-
# Issue
The current implementation of budget validation needs refactoring. On large sites, Purchase Orders with 100 items time out on submission.
# Bottlenecks
## Unavoidable nested-loop
P…
-
**Description**
We want to collect all the open-source Tibetan word-segmented data and save it in a standard format.
The format should be:
```
[
{
'source': 'བོད་ཀྱི་གླུ་གར་རོལ་དབྱངས་ལ་གཞི་རྩའི་ཐོག་ནས་དབྱ…
-
For finite element codes (ultimately based on T8code.jl) we would like to try incorporating t8code.
Naturally, the results would be point/vertex-based rather than element-based in most cases.
The solu…