IBM/data-prep-kit
Open source project for data preparation for LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0
171 stars, 111 forks
Issues
#609 [Feature] Capability to specify the paths where multiple output tables will be saved (cmadam, opened 1 week ago, 7 comments)
#608 [Feature] Capability to distribute a large binary object (e.g. a table) to all transform instances during initialization (cmadam, opened 1 week ago, 3 comments)
#607 [Bug] header_cleanser intermittently failing ci/cd when building python venv (daw3rd, opened 1 week ago, 0 comments)
#606 [Feature] Base spark image build is very slow and impacting ci/cd (daw3rd, opened 1 week ago, 3 comments)
#605 [Bug] pdf2parquet must calculate hash and size on the file (sujee, opened 1 week ago, 2 comments)
#604 Disable ci/cd spark image build when transform does not implement spark (daw3rd, closed 1 week ago, 0 comments)
#603 Disable ci/cd spark image build when transform does not implement spark (daw3rd, closed 1 week ago, 0 comments)
#602 pipeline transform (blublinsky, opened 1 week ago, 4 comments)
#601 Add a link to the Google Colab Tips file from the root README (shahrokhDaijavad, closed 1 week ago, 0 comments)
#600 Update sample-notebook.ipynb (Bytes-Explorer, closed 1 week ago, 0 comments)
#599 Change some release documentation (daw3rd, closed 2 weeks ago, 0 comments)
#598 doc_id and source_doc_id params in doc_chunk (dolfim-ibm, closed 2 weeks ago, 0 comments)
#597 [Feature] Explore agentic workflow capabilities for the creation of DPK workflows (roytman, closed 2 weeks ago, 1 comment)
#596 fix: pin all docling deps for more stability (dolfim-ibm, closed 2 weeks ago, 0 comments)
#595 Start triggering testing at finer granularity in the repo (daw3rd, closed 1 week ago, 0 comments)
#594 Minor fix to workflow-manual-run.yml (revit13, closed 2 weeks ago, 0 comments)
#593 Update root README in order to try DPK faster (shahrokhDaijavad, closed 2 weeks ago, 1 comment)
#592 refactoring of data access code (blublinsky, closed 2 weeks ago, 0 comments)
#591 doc_chunk updates and new parameters (dolfim-ibm, closed 2 weeks ago, 0 comments)
#590 [Bug] chunking fails on PDFs with one line text (sujee, closed 1 week ago, 6 comments)
#589 disable test workflow when no code files change (daw3rd, closed 2 weeks ago, 0 comments)
#588 [Bug] test bug for project workflow (daw3rd, closed 2 weeks ago, 1 comment)
#587 tips for running on google colab (sujee, closed 2 weeks ago, 7 comments)
#586 [Feature] Enable pure python transforms in new spark runtime (daw3rd, opened 2 weeks ago, 0 comments)
#585 [Bug] possible regression in ededup code in release dev3 (sujee, closed 1 week ago, 9 comments)
#584 update docling dependencies to newer versions (dolfim-ibm, closed 3 weeks ago, 1 comment)
#583 [Bug] Testing RAG notebook with latest release of pdf2Parquet, eDedup and DocID (touma-I, opened 3 weeks ago, 6 comments)
#582 [Bug] issues running ray transformations on Google colab (sujee, opened 3 weeks ago, 2 comments)
#581 fixed paths in README (sujee, closed 3 weeks ago, 0 comments)
#580 kfp enhancement with new parameters (blublinsky, closed 3 weeks ago, 2 comments)
#579 [Bug] Chunking is missing some text from bullet section (sujee, closed 2 weeks ago, 1 comment)
#578 [Feature] Need better documentation of fuzzy dedupe (sujee, opened 3 weeks ago, 0 comments)
#577 Custom column validator for pdf2parquet (dolfim-ibm, closed 3 weeks ago, 0 comments)
#576 [Bug] Ededup doesn't load in release 0.2.1.dev2 because of missing 'SnapshotUtils' in 'data_processing.data_access' (sujee, closed 2 weeks ago, 1 comment)
#575 [Feature] need an example of using doc_quality plugin with installed pypi packages (sujee, opened 3 weeks ago, 1 comment)
#574 [Bug] Intermittent doc_id test-src failures in ci/cd (daw3rd, opened 3 weeks ago, 1 comment)
#573 [Bug] improve performance of pdf2parquet (sujee, opened 3 weeks ago, 0 comments)
#572 Getting started 2: Added a colab notebook, updated for local data (sujee, closed 3 weeks ago, 0 comments)
#571 [Bug] test/publish-image targets are disabled for pii_redactor/ray due to OSError (daw3rd, opened 4 weeks ago, 0 comments)
#570 disable publish-image rule for pii_redactor to allow merge to pass (daw3rd, closed 3 weeks ago, 0 comments)
#569 [Bug] PR merge failing in pii_redactor (daw3rd, closed 2 weeks ago, 1 comment)
#568 [Feature] Remove or merge older examples from examples/notebooks/archive (daw3rd, opened 4 weeks ago, 0 comments)
#567 [Feature] Remove test and test-data from published wheels (daw3rd, closed 4 weeks ago, 1 comment)
#566 Getting started instructions and code tweak (sujee, closed 4 weeks ago, 1 comment)
#565 The allowed-code-languages.txt file is badly formatted, which causes unexpected results (vincent-pli, opened 4 weeks ago, 1 comment)
#564 [Feature] Allow selected columns to be ignored in non-launcher tests of transforms that generate parquet files (daw3rd, opened 4 weeks ago, 0 comments)
#563 Organise examples by use cases (Bytes-Explorer, closed 4 weeks ago, 3 comments)
#562 Fix docs and mkdocs documentation (shivdeep-singh-ibm, closed 4 weeks ago, 0 comments)
#561 Make it easier to get started (Bytes-Explorer, closed 4 weeks ago, 0 comments)
#560 add KFP_BLACK_LIST (roytman, closed 4 weeks ago, 0 comments)