-
I would like to add custom metadata to chunks when saving them to Pinecone with `Pipeline.from_configs`.
Following the 'Custom meta data extraction ...' notebook on [this page](https://docs.unstructured.io…
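The notebook's exact approach is not visible in this snippet. As a library-agnostic sketch, assuming each chunk is a plain dict with `text` and `metadata` keys (as in Unstructured's JSON output), custom fields can be merged into `metadata` before the chunks reach the Pinecone destination. `add_custom_metadata` and `word_count_extractor` are hypothetical helpers, not part of unstructured-ingest, and how they would hook into `Pipeline.from_configs` is not shown here.

```python
from typing import Callable, Dict, List


def add_custom_metadata(
    elements: List[Dict],
    extractor: Callable[[Dict], Dict],
) -> List[Dict]:
    """Return copies of chunk dicts with extra keys merged into `metadata`.

    Hypothetical helper: assumes each chunk has "text" and "metadata" keys.
    """
    enriched = []
    for element in elements:
        # Copy the element and its metadata so the caller's data is not mutated.
        new_element = dict(element)
        metadata = dict(element.get("metadata", {}))
        metadata.update(extractor(element))
        new_element["metadata"] = metadata
        enriched.append(new_element)
    return enriched


def word_count_extractor(element: Dict) -> Dict:
    # Example extractor: derive a custom field from the chunk text.
    return {"word_count": len(element.get("text", "").split())}


chunks = [
    {"text": "Chunking splits documents.", "metadata": {"filename": "a.pdf"}},
    {"text": "Metadata travels with each chunk.", "metadata": {"filename": "a.pdf"}},
]
enriched = add_custom_metadata(chunks, word_count_extractor)
print(enriched[0]["metadata"])  # {'filename': 'a.pdf', 'word_count': 3}
```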
-
It would be useful to be able to remap a SafeTensorWrapper and write it into a single safetensors file or into multiple ones.
With parameters:
`parts` OR `max_file_size`
If neither is defined, write to …
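A minimal sketch of how `parts` or `max_file_size` could drive shard planning, assuming the size in bytes of every tensor is known up front. It only decides which tensor goes into which file; `plan_shards` is a hypothetical helper, it does not use SafeTensorWrapper, and it ignores per-file header overhead. The original sentence about the default is truncated, so the single-file fallback below is an assumption.

```python
from typing import Dict, List, Optional


def plan_shards(
    sizes: Dict[str, int],
    parts: Optional[int] = None,
    max_file_size: Optional[int] = None,
) -> List[List[str]]:
    """Group tensor names (in insertion order) into shards.

    Hypothetical helper: `sizes` maps tensor name -> size in bytes.
    """
    if parts is not None and max_file_size is not None:
        raise ValueError("pass either parts or max_file_size, not both")
    names = list(sizes)

    if max_file_size is not None:
        # Greedy packing: start a new shard when the next tensor would overflow.
        # A tensor larger than max_file_size still gets a shard of its own.
        shards: List[List[str]] = []
        current: List[str] = []
        current_size = 0
        for name in names:
            if current and current_size + sizes[name] > max_file_size:
                shards.append(current)
                current, current_size = [], 0
            current.append(name)
            current_size += sizes[name]
        if current:
            shards.append(current)
        return shards

    if parts is not None:
        if parts < 1:
            raise ValueError("parts must be at least 1")
        total = sum(sizes.values())
        buckets: List[List[str]] = [[] for _ in range(parts)]
        cumulative = 0
        for name in names:
            # Assign each tensor by where its midpoint falls in the byte range,
            # which keeps shards contiguous and roughly equal in size.
            midpoint = cumulative + sizes[name] / 2
            index = min(int(midpoint * parts / total), parts - 1) if total else 0
            buckets[index].append(name)
            cumulative += sizes[name]
        # With fewer tensors than parts, some buckets stay empty and are dropped.
        return [bucket for bucket in buckets if bucket]

    # Assumption: with neither option set, everything goes into one file.
    return [names]


sizes = {"a": 40, "b": 40, "c": 30, "d": 10, "e": 80}
print(plan_shards(sizes, max_file_size=100))  # [['a', 'b'], ['c', 'd'], ['e']]
print(plan_shards(sizes, parts=2))            # [['a', 'b', 'c'], ['d', 'e']]
```

Each planned group could then be written with `safetensors.torch.save_file`, which takes a dict of tensors and a filename.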
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Branch name
main
### Commit ID
last
### Other environment information
_No response_
### Actual be…
-
When loading multiple Zarr stores with xarray, I have noticed that it often changes the chunk sizes, even though the stores were saved to disk with the same chunking. Often it doubles the chunk size. …
dfulu updated
2 months ago
-
Inspiration: the 5 levels of chunking
[https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb](https://github.com/FullSt…
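The simplest of those levels is fixed-size character splitting. A minimal sketch, assuming a window of `chunk_size` characters that advances by `chunk_size - chunk_overlap`; the function name and parameters are illustrative, not taken from the notebook.

```python
from typing import List


def split_by_characters(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Fixed-size character splitting with an optional overlap between chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(split_by_characters("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap repeats a few characters across neighbouring chunks so that text cut at a boundary still appears whole in at least one chunk; the higher levels replace the fixed window with structure-aware or semantic boundaries.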
-
Hi @masoudmoghani
I have read your recent paper and learned that you used Action Chunking for imitation learning with Orbit Surgical. It would be great if you could provide some steps to do that w…
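This is not the Orbit Surgical code. As a minimal sketch of the idea, action chunking with temporal ensembling (the execution scheme described for ACT, Action Chunking with Transformers) queries the policy at every step for the next k actions and executes a weighted average of all predictions that cover the current step, with weights w_i = exp(-m * i) where i = 0 is the oldest prediction. The `policy` and `observe` callables and the scalar actions are hypothetical stand-ins.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List


def run_with_temporal_ensembling(
    policy: Callable[[float], List[float]],
    observe: Callable[[int], float],
    steps: int,
    chunk_size: int,
    m: float = 0.01,
) -> List[float]:
    """Query the policy every step and blend overlapping action chunks.

    Hypothetical interface: observe(t) returns an observation and policy(obs)
    returns chunk_size scalar actions for timesteps t .. t + chunk_size - 1.
    """
    # predictions[t] collects every action predicted for timestep t, oldest first.
    predictions: Dict[int, List[float]] = defaultdict(list)
    executed: List[float] = []
    for t in range(steps):
        chunk = policy(observe(t))
        if len(chunk) != chunk_size:
            raise ValueError("policy must return chunk_size actions")
        for offset, action in enumerate(chunk):
            predictions[t + offset].append(action)
        candidates = predictions.pop(t)
        # w_i = exp(-m * i): index 0 is the oldest prediction and gets the largest weight.
        weights = [math.exp(-m * i) for i in range(len(candidates))]
        blended = sum(w * a for w, a in zip(weights, candidates)) / sum(weights)
        executed.append(blended)
    return executed


# Usage with a stand-in policy that repeats the observation k times.
actions = run_with_temporal_ensembling(
    policy=lambda obs: [obs] * 3,
    observe=lambda t: float(t),
    steps=3,
    chunk_size=3,
    m=0.0,
)
```

With `m=0.0` all overlapping predictions are weighted equally, so `actions` is `[0.0, 0.5, 1.0]`; a larger `m` makes the executed action lean more heavily on older predictions.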
-
fastCDC https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf
fastCDC is a set of optimizations on top of the very simple GEAR rolling hash.
old ticket about performance: https:/…
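A simplified sketch of Gear-based content-defined chunking with two of the FastCDC ideas: hashing is skipped until `min_size` (cut-point skipping), and normalized chunking uses a stricter mask before the average size and a looser one after it. The defaults, the seeded random Gear table, and testing the top hash bits instead of the paper's own mask constants are assumptions for illustration; this is not a drop-in for any existing implementation.

```python
import random
from typing import List, Optional

MASK64 = (1 << 64) - 1


def make_gear_table(seed: int = 0) -> List[int]:
    """One pseudo-random 64-bit value per byte value (the Gear table)."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(256)]


def top_bits_mask(n: int) -> int:
    # With the left-shifting Gear hash, the high bits depend on the last 64
    # bytes, while the low bits only see the last few bytes.
    return ((1 << n) - 1) << (64 - n)


def fastcdc_chunks(
    data: bytes,
    min_size: int = 2048,
    avg_size: int = 8192,
    max_size: int = 65536,
    gear: Optional[List[int]] = None,
) -> List[bytes]:
    if not 0 < min_size < avg_size < max_size:
        raise ValueError("need 0 < min_size < avg_size < max_size")
    gear = gear if gear is not None else make_gear_table()
    bits = avg_size.bit_length() - 1       # log2(avg_size); assumes a power of two
    mask_strict = top_bits_mask(bits + 1)  # fewer cuts before avg_size
    mask_loose = top_bits_mask(bits - 1)   # more cuts after avg_size

    chunks = []
    start = 0
    while start < len(data):
        remaining = len(data) - start
        end = start + min(remaining, max_size)
        cut = end  # forced cut at max_size or at the end of the data
        if remaining > min_size:
            normal = start + avg_size
            h = 0
            for i in range(start + min_size, end):  # skip hashing below min_size
                h = ((h << 1) + gear[data[i]]) & MASK64
                mask = mask_strict if i < normal else mask_loose
                if h & mask == 0:
                    cut = i + 1
                    break
        chunks.append(data[start:cut])
        start = cut
    return chunks


def cut_offsets(chunks: List[bytes]) -> List[int]:
    """End offset of every chunk except the last, which always ends at len(data)."""
    offsets, total = [], 0
    for chunk in chunks[:-1]:
        total += len(chunk)
        offsets.append(total)
    return offsets


# Usage: inserting 3 bytes at the front should only disturb the first boundaries.
rng = random.Random(1)
data = bytes(rng.getrandbits(8) for _ in range(150_000))
original = fastcdc_chunks(data)
edited = fastcdc_chunks(b"xyz" + data)
realigned = set(cut_offsets(original)) & {o - 3 for o in cut_offsets(edited)}
```

Because the hash restarts at each chunk start and cut points depend only on nearby content, boundaries after an edit typically fall back onto the same content positions, which is what makes CDC useful for deduplication.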
-
### Description
Currently, the Inference API runs chunking only when it is called as part of ingesting a large document into an index with an inference field. This change would allow users to run chunking w…
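The chunking strategies the Inference API uses are not described in this snippet. As a hypothetical illustration of what a standalone chunking call might compute, the sketch below packs whole sentences into chunks under a word budget, optionally repeating trailing sentences as overlap. `chunk_by_sentences`, `max_words`, and `overlap_sentences` are assumed names, not the Inference API, and the sentence splitter is deliberately naive.

```python
import re
from typing import List

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def chunk_by_sentences(text: str, max_words: int, overlap_sentences: int = 0) -> List[str]:
    """Pack whole sentences into chunks of at most `max_words` words.

    The last `overlap_sentences` sentences of a chunk are repeated at the start
    of the next one when they fit. A single sentence longer than `max_words`
    still becomes its own (oversized) chunk.
    """
    if max_words <= 0 or overlap_sentences < 0:
        raise ValueError("max_words must be positive and overlap_sentences non-negative")
    sentences = [s for s in SENTENCE_BREAK.split(text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        n_words = len(sentence.split())
        current_words = sum(len(s.split()) for s in current)
        if current and current_words + n_words > max_words:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
            # Drop carried-over sentences that would push the new chunk past the limit.
            while current and sum(len(s.split()) for s in current) + n_words > max_words:
                current.pop(0)
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five. Six seven eight nine. Ten."
print(chunk_by_sentences(text, max_words=6, overlap_sentences=1))
```

Splitting on sentence boundaries rather than a fixed word window keeps each chunk grammatically complete, at the cost of chunks of uneven length.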
-
I am using **unstructured-ingest** version **0.3.0** with the following code:
```python
from unstructured_ingest.v2.interfaces import ProcessorConfig
from unstructured_ingest.v2.pipeline.pi…