-
### Description:
Create a series of scripts to validate word segmentation by ensuring that each word in the target field:
- Exists in the unique word list.
- Is neither oversegmented (unnecessari…
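
A minimal sketch of the first check only (membership in the unique word list), assuming whitespace-separated words. The function name `find_unknown_words` and the sample data are hypothetical. The over/undersegmentation checks are not covered because the description is cut off before defining them.

```python
def find_unknown_words(target_text, unique_words):
    """Return the words in target_text that are absent from unique_words."""
    known = set(unique_words)  # set lookup keeps each membership test O(1)
    return [word for word in target_text.split() if word not in known]


if __name__ == "__main__":
    # Hypothetical data: "brwn" is not in the word list, so it is reported.
    word_list = ["the", "quick", "brown", "fox"]
    print(find_unknown_words("the quick brwn fox", word_list))
```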
-
Hi,
First, I want to thank you for your incredible work on CD-ViTO. I’m really excited to explore its capabilities for few-shot object detection! I’ve successfully gone through the initial setup an…
-
### Describe the bug
When I load a dataset from a number of arrow files, as in:
```
random_dataset = load_dataset(
"arrow",
data_files={split: shard_filepaths},
streaming=True,
…
-
Hi, first of all, thank you for the inspirational paper.
I am currently trying to re-implement the training process, but I am having some trouble.
It would be great if you could give me some guidance.
fir…
-
Hello team
I would like to split my dataset by condition and annotate the cell types using SingleR. Is there a way to visualize or compare the differences in cell type annotations between the two con…
-
https://github.com/casper-hansen/AutoAWQ/blob/79547665bdb27768a9b392ef375776b020acbf0c/awq/utils/calib_data.py#L59
Why do you concatenate all samples and split according to max sequence length rath…
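
For readers unfamiliar with the pattern being asked about, here is a rough sketch of "concatenate, then split by max sequence length" on plain token lists. It is not AutoAWQ's code. The function names are hypothetical, and per-sample truncation is shown only as one possible alternative for comparison, since the question is cut off before naming the alternative it has in mind.

```python
def concat_and_split(samples, max_seq_len):
    """Join all token lists into one stream, then cut it into full-length blocks."""
    flat = [tok for sample in samples for tok in sample]
    n_blocks = len(flat) // max_seq_len  # a trailing partial block is dropped
    return [flat[i * max_seq_len:(i + 1) * max_seq_len] for i in range(n_blocks)]


def truncate_each(samples, max_seq_len):
    """Keep each sample separate and cut it to at most max_seq_len tokens."""
    return [sample[:max_seq_len] for sample in samples]


if __name__ == "__main__":
    # Hypothetical token ids standing in for tokenized calibration text.
    samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
    print(concat_and_split(samples, 4))
    print(truncate_each(samples, 4))
```

With concatenation every block has exactly `max_seq_len` tokens, but blocks can span sample boundaries. With per-sample truncation the boundaries are preserved, but lengths vary and tokens past the limit are discarded.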
-
When trying to add metadata to an index, either using a list of metadata dicts or a mapping of uid to metadata dict (shown below), it always produces a key error.
Example:
```
RAG = RAGMultiMod…
-
### Is your feature request related to a problem?
The CDAT codebase supports XML files to open up multiple time series datasets. The `cdat-migration-fy24` branch uses `xc.open_dataset()` for single…
-
**Is your feature request related to a problem? Please describe.**
I have a dataset with uncommon words that I cannot expect Whisper or any ASR model to be able to transcribe accurately. The dataset is …
-
Could you please upload the datasets/mean_emb_split.pickle and datasets/std_emb_split.pickle files?