-
### Context / Scenario
With Qdrant v1.8.0, saving embeddings throws an exception.
### What happened?
With Qdrant v1.8.0 (a fresh installation, so no collections at all), when trying to s…
-
## User story
1. As a user
2. I want to start the data pipeline manually and select which files should be processed
3. So that I can process data later instead of deciding during upload
## Acceptance…
-
Hey, I just came back to try your package after a while (from Azure AI). I switched like this:
```cs
//memoryBuilder.WithAzureAISearchMemoryDb(configuration["Azure:AISearch:Endpoint"]!, configuratio…
```
-
Hello,
I'm using the WebVid-10M dataset, a large video dataset with 10 million videos.
Each tar file is about 2 GB and contains roughly 1,000 videos.
I'm using the…
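For scale, the figures quoted above (10 million videos, about 1,000 videos and 2 GB per tar shard) imply roughly 10,000 shards and about 20 TB of data. A small sizing helper, using only those quoted numbers; `shard_stats` is a hypothetical name:

```python
# Back-of-the-envelope sizing for WebVid-10M stored as tar shards,
# using only the figures quoted in the post above.
TOTAL_VIDEOS = 10_000_000
VIDEOS_PER_SHARD = 1_000
GB_PER_SHARD = 2.0


def shard_stats(total_videos: int, videos_per_shard: int, gb_per_shard: float) -> tuple[int, float]:
    """Return (number of shards, total size in GB), rounding the shard count up."""
    num_shards = -(-total_videos // videos_per_shard)  # ceiling division
    return num_shards, num_shards * gb_per_shard


num_shards, total_gb = shard_stats(TOTAL_VIDEOS, VIDEOS_PER_SHARD, GB_PER_SHARD)
print(f"{num_shards} shards, ~{total_gb / 1000:.0f} TB total")  # 10000 shards, ~20 TB total
```

At this size the full dataset rarely fits on local disk, which is why streaming the shards is attractive.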
-
I wanted to understand how to use WebDataset for large-scale training when my data is in the cloud.
I found that there are two ways:
1. Load sample-by-sample from the cloud, i.e. I just init `dataset =…
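WebDataset shard lists are commonly written in brace notation (for example `shard-{000000..000999}.tar`), which the library expands into individual shard URLs before streaming them; for cloud storage, each URL can also be a `pipe:` command such as `pipe:curl -s -L <url>` whose output is read as a tar stream. Below is a minimal stdlib stand-in for the brace expansion step, assuming a single zero-padded numeric range; the `example.com` path and `expand_shard_range` are hypothetical:

```python
import re


def expand_shard_range(pattern: str) -> list[str]:
    """Expand one numeric brace range such as 'data-{000..003}.tar'.

    Minimal stand-in for the brace expansion WebDataset applies to shard URLs:
    it handles exactly one {start..end} range and keeps the zero padding.
    """
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = match.group(1), match.group(2)
    width = len(start)  # preserve zero padding, e.g. 000000
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(start), int(end) + 1)]


# Hypothetical bucket layout, for illustration only.
urls = expand_shard_range("https://example.com/webvid/shard-{000000..000002}.tar")
print(urls)
```

Each expanded URL is then opened as one tar stream, so samples arrive shard by shard rather than file by file.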
-
I implemented a simple data pipeline for loading a caption dataset:
```python
import webdataset as wds

pipeline_wds_dataset = wds.DataPipeline(
    wds.ResampledShards(url),
    wds.tarfile_to_samples(),
    wds.decode("pil"),
    …
```
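In a pipeline like this, `wds.tarfile_to_samples()` turns the stream of tar members into samples by grouping consecutive files that share a key (the file name up to the first dot of its basename), so `000.jpg` and `000.txt` become one sample with `jpg` and `txt` fields; `wds.decode("pil")` then converts the image bytes. The following is a minimal stdlib sketch of that grouping rule only, not WebDataset's own code; `tar_to_samples` and `make_demo_tar` are hypothetical names:

```python
import io
import re
import tarfile
from typing import Iterator

# Key/extension split: everything up to the first dot in the basename is the
# sample key, the remainder is the field name (e.g. "000.jpg" -> "000", "jpg").
KEY_EXT = re.compile(r"^((?:.*/|)[^.]+)[.]([^/]*)$")


def tar_to_samples(fileobj) -> Iterator[dict]:
    """Stream a tar file and group consecutive members sharing a key into one dict."""
    current = None
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:  # streaming read, like a network shard
        for member in tar:
            if not member.isfile():
                continue
            m = KEY_EXT.match(member.name)
            if m is None:
                continue  # names without an extension are skipped in this sketch
            key, ext = m.groups()
            data = tar.extractfile(member).read()  # must read before advancing the stream
            if current is not None and current["__key__"] != key:
                yield current  # key changed: the previous sample is complete
                current = None
            if current is None:
                current = {"__key__": key}
            current[ext] = data
    if current is not None:
        yield current


def make_demo_tar() -> io.BytesIO:
    """Build a tiny in-memory shard with two image/caption pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("000.jpg", b"img0"), ("000.txt", b"a cat"),
                              ("001.jpg", b"img1"), ("001.txt", b"a dog")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


for sample in tar_to_samples(make_demo_tar()):
    print(sample["__key__"], sorted(sample))
```

Because grouping relies on adjacent members, files belonging to one sample must be stored next to each other in the tar.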
-
```
Using the cached tokenizer of seamless_streaming_unity. Set `force` to `True` to download again.
Using the cached tokenizer of seamless_streaming_unity. Set `force` to `True` to download again.
2024…
```
-
The data pipeline is already set up, but it can still be improved by:
1. Adding more/better documentation
2. Making the pipeline more modular / easier to reuse for other tasks
3. (Optional) Improve read…
-
Hi! I installed AlphaFold following the [non_docker option](https://github.com/kalininalab/alphafold_non_docker) using the reduced version of the databases (reduced_dbs mode), and I get this error:
…
-
This is a tracking issue to maintain a list of packages known to compile, taken from a recent Stackage snapshot (`lts-16.27`).
```
/root/.asterius/.stack-work/install/x86_64-linux-tinfo6/9ac86446ee106c…
```