IBM/data-prep-kit
Open-source data preparation project for LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0
305 stars, 134 forks
Issues (sorted by newest)
#821 Update README.md (ian-cho, opened 3 hours ago, 0 comments)
#820 [Feature] Add Discord link on front page (README.md) (sujee, opened 1 day ago, 0 comments)
#819 [Feature] Some subdirs not cleaning up venv on make clean (daw3rd, opened 1 day ago, 0 comments)
#818 [Bug] make venv failing in noop/python (daw3rd, closed 1 day ago, 2 comments)
#817 Fix for inability to read some parquet files (issue #816) (daw3rd, opened 1 day ago, 0 comments)
#816 [Bug] Parquet files with columns containing large lists of byte arrays cannot be read by pyarrow (daw3rd, opened 1 day ago, 0 comments)
#815 Html2Parquet Updated README and Added Sample Notebook (sungeunan-ibm, opened 2 days ago, 0 comments)
#814 Update to actor list limit to fix issue #803 (blublinsky, opened 2 days ago, 1 comment)
#813 First cut at refactoring dpk_pdf2parquet (touma-I, opened 2 days ago, 0 comments)
#812 [Bug] pdf2parquet: identical PDF files have different `contents` (sujee, opened 3 days ago, 1 comment)
#811 [Feature] Extend `doc_quality` to include stop words annotation (Harmedox, opened 3 days ago, 0 comments)
#810 A few changes in the root README (shahrokhDaijavad, closed 3 days ago, 0 comments)
#809 Restructure Html2Parquet with its own dpk_ namespace (touma-I, opened 4 days ago, 3 comments)
#808 Update KubeRay API server version in requirements.env (revit13, closed 5 days ago, 0 comments)
#807 [Bug] Ray clusters are created in CI/CD with `imagePullPolicy: Always` (revit13, closed 3 days ago, 1 comment)
#806 Fix set_s3_env_vars_to_component in KFP v2 (revit13, closed 4 days ago, 0 comments)
#805 Update README.md (Padarn, closed 3 days ago, 0 comments)
#804 Html2parquet example (touma-I, opened 6 days ago, 1 comment)
#803 [Bug] Cannot run KFP pipeline for fuzzy dedup with more than 100 actors (cmadam, opened 6 days ago, 1 comment)
#802 [Feature] Create a 'User Feedback' section in discussions (sujee, opened 1 week ago, 0 comments)
#800 Update README docs for language transforms (dolfim-ibm, opened 1 week ago, 0 comments)
#799 Update doc_chunk md results (dolfim-ibm, closed 1 week ago, 0 comments)
#798 Use str as document_hash (dolfim-ibm, closed 1 week ago, 0 comments)
#797 Crawler transform (touma-I, closed 6 days ago, 0 comments)
#795 [Feature] RAG: when saving DPK-processed data into a vector database, optionally save it in llama-index format (sujee, opened 1 week ago, 0 comments)
#794 [Bug] Error while running doc_chunk transform (touma-I, closed 1 week ago, 2 comments)
#793 Fix uint64 hash to pyarrow (dolfim-ibm, closed 1 week ago, 1 comment)
#792 [Feature] Modify pdf2parquet to accept a parquet file with the payload in the content column (touma-I, opened 1 week ago, 0 comments)
#791 Encoded data detection filter for code (sapthasurendran, opened 1 week ago, 0 comments)
#790 Update README (dtsuzuku-ibm, opened 1 week ago, 19 comments)
#789 Add new talks to resources.md (dtsuzuku-ibm, closed 2 weeks ago, 0 comments)
#788 [Feature] Add an example of html2pq in the documentation (sujee, opened 2 weeks ago, 17 comments)
#787 Bump certifi from 2024.6.2 to 2024.7.4 in /transforms/code/code_profiler/python (dependabot[bot], closed 1 week ago, 0 comments)
#786 DPK integration with LlamaIndex (Bytes-Explorer, opened 2 weeks ago, 1 comment)
#785 Build a demo with India Govt data (Bytes-Explorer, opened 2 weeks ago, 0 comments)
#784 DPK integration with LangChain (Bytes-Explorer, opened 2 weeks ago, 1 comment)
#783 Enable DPK on native Windows and then add info to README (Bytes-Explorer, opened 2 weeks ago, 0 comments)
#782 Rename the "Intro" notebooks to call out the specific functionality they support (PDF to Embeddings) (Bytes-Explorer, opened 2 weeks ago, 10 comments)
#781 Fix License select kfp (revit13, closed 2 weeks ago, 0 comments)
#780 [Bug] DPK-connector should save files with the correct MIME type extension (sujee, closed 1 week ago, 9 comments)
#779 [Feature] Improve parameters for crawl function for DPK-Connector (sujee, closed 1 week ago, 5 comments)
#778 [Bug] dpk-connector silently fails if download destination directory does not exist (sujee, closed 1 week ago, 5 comments)
#777 [Bug] dpk-connector doesn't crawl https://thealliance.ai/ (sujee, closed 1 week ago, 11 comments)
#776 Pass parameters to modules in a way familiar to Python users/developers (shahrokhDaijavad, opened 2 weeks ago, 2 comments)
#775 Bump tornado from 6.4 to 6.4.1 in /transforms/code/code_profiler/python (dependabot[bot], closed 1 week ago, 0 comments)
#774 [Feature] Restructure transforms as their own modules (touma-I, opened 2 weeks ago, 1 comment)
#773 Modify superpipeline params type (revit13, closed 2 weeks ago, 2 comments)
#772 [Feature] Move KFP super pipelines to /examples/kfp (roytman, closed 2 weeks ago, 1 comment)
#771 Small fixes (roytman, closed 2 weeks ago, 1 comment)
#770 [Bug] Python launcher error when the child process dies (sujee, opened 2 weeks ago, 2 comments)