-
Thank you for your wonderful project!
Could you provide the train/test split JSON files for the MSR-VTT caption dataset? I am unable to access the following files:
• datasets/annotations_all/ms…
-
In main.py, it says "parser.add_argument('--corpus_file', type=str, default='phase2b_corpus.jsonl'". However, I cannot find phase2b_corpus.jsonl in the files. Which file can I use in its place? Thanks!
-
I'm starting to work a lot with JSONL files (JSON Lines).
I know that the idea behind a JSONL file is that each line in the file is a separate JSON "object", but that makes it very hard to read. I…
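
For the readability problem described above, one option is to re-serialize each line with indentation purely for viewing. This is a minimal sketch using only the standard library; the function name `pretty_jsonl` and the sample records are hypothetical and not taken from any project mentioned here.

```python
import io
import json
from typing import Iterator, TextIO


def pretty_jsonl(stream: TextIO, indent: int = 2) -> Iterator[str]:
    """Yield each JSONL record re-serialized with indentation.

    Hypothetical helper for illustration: blank lines are skipped and
    a malformed line raises ValueError that names the line number.
    """
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
        yield json.dumps(record, indent=indent, ensure_ascii=False)


if __name__ == "__main__":
    # Made-up sample data standing in for a real .jsonl file.
    sample = io.StringIO('{"id": 1, "tags": ["a", "b"]}\n{"id": 2, "tags": []}\n')
    for block in pretty_jsonl(sample):
        print(block)
        print("---")
```

The indented output is for reading only. It is no longer valid JSONL, since each record now spans several lines, so it should not be written back over the original file.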
-
### Question
I looked at the eval script
```
CONV="conv_template"
CKPT_NAME="your_ckpt_name"
CKPT="checkpoints/${CKPT_NAME}"
EVAL="eval"
deepspeed moellava/eval/model_vqa_loader.py \
--m…
-
I have created a dataset using a custom feature extractor and saved the cutset to a file with the extension `.jsonl.gz`.
I can load the manifest in version `1.24.1` perfectly fine. But versions `1.2…
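
To check whether a `.jsonl.gz` file is itself intact, independently of any library version, it can be read with the Python standard library alone. This is only a sketch: `iter_jsonl_gz` is a hypothetical helper, the path in the usage line is a placeholder, and it deliberately does not use the library's own manifest loader.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_jsonl_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of a gzipped JSONL file.

    Hypothetical helper: it assumes UTF-8 text and one JSON value per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield json.loads(line)


# Usage (placeholder path, not a file from the report above):
# first = next(iter_jsonl_gz("cuts.jsonl.gz"))
# print(sorted(first.keys()))
```

If every line parses here, the file is well-formed, and the failure more likely lies in how a given version interprets the fields than in the file itself.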
-
Continuing the discussion from #41, I think we need to specify something.
When I upload it to a browser it says it's `application/octet-stream`.
There doesn't appear to be a consensus in https://git…
-
Hey,
I was testing this out and ran into an issue. I have a field in airtable that is a formula and returns a number. It looks like it's mad it's not a string (and was not cast to a string):
…
-
I'm trying to run [process_common_crawl_dump.py](https://github.com/huggingface/datatrove/blob/main/examples/process_common_crawl_dump.py) to dedupe an 80GB megawarc I have, and the jsonl loader is ta…
-
Hi!
I seem to be running into this issue when processing a file with datatrove (on latest HEAD [2da6f22](https://github.com/huggingface/datatrove/commit/2da6f22cddf46617510144d1f5f259d806107c84)). …
-
Are the GPT4 results evaluated on a different set of `longbook_qa_eng`? The 'ground_truth' fields in [results/gpt4/preds_longbook_qa_eng.jsonl](https://github.com/OpenBMB/InfiniteBench/blob/main/resul…