DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License

Analyzing PDF files is too slow #346

Closed: langzichai closed this issue 4 days ago

langzichai commented 1 week ago

Question

... Right now I just need to extract the content of a PDF, but analyzing the file is too slow: a 4 MB file took 34 seconds, and a 63 MB file took more than 1 hour. Is there a way to improve the speed? Also, can you confirm whether the GPU is used by default? As far as I can tell, the GPU is not being used.
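As a rough sanity check on the figures above, the sketch below extrapolates the 4 MB timing linearly to the 63 MB file. It assumes file size is a usable proxy for the amount of work, which it may not be (size says nothing about page count, for example), so treat the ratio as a hint rather than a diagnosis. The "more than 1 hour" is taken as a 3600 s lower bound.

```python
# Figures reported in the question (sizes in MB, times in seconds).
small_mb, small_s = 4, 34
large_mb, large_s_min = 63, 3600  # "more than 1 hour", so 3600 s is only a lower bound

# Throughput observed on the small file, in MB/s.
rate = small_mb / small_s

# Time the large file would take if cost grew linearly with file size
# (assumption: size is proportional to work). Written as a product/quotient
# of integers so the result is exact.
expected_large_s = large_mb * small_s / small_mb

# How many times slower the large file was than the linear estimate (at least).
slowdown = large_s_min / expected_large_s

print(f"small-file throughput: {rate:.3f} MB/s")
print(f"linear estimate for {large_mb} MB: {expected_large_s:.0f} s (~{expected_large_s / 60:.1f} min)")
print(f"observed time is at least {slowdown:.1f}x the linear estimate")
```

A linear extrapolation predicts roughly 9 minutes for the 63 MB file, so an observed time above one hour suggests the large file costs disproportionately more per MB. That is worth mentioning when reporting the problem.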

PeterStaar-IBM commented 6 days ago

@langzichai If you can, please share the PDF files so we can test. If you cannot share them, please let us know what your current compute setup is (CPU/GPU specs, RAM, etc.).
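One way to collect the requested setup details is the minimal sketch below, which uses only the standard library plus an optional PyTorch check. Checking the GPU through `torch.cuda.is_available()` assumes that docling's models run on PyTorch. This check only shows whether a CUDA device is visible to PyTorch. It does not show whether docling actually placed any work on it. RAM is read via POSIX `sysconf`, so it is reported as `None` on platforms without it (e.g. Windows).

```python
import os
import platform
import sys


def physical_ram_gb():
    """Total physical RAM in GB via POSIX sysconf, or None where unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (Windows); ValueError/OSError: name unsupported.
        return None


def gpu_info():
    """Report whether PyTorch can see a CUDA device (assumes torch is the relevant backend)."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available to torch"


def compute_setup():
    """Collect the details asked for in the issue into a plain dict."""
    ram = physical_ram_gb()
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        "ram_gb": None if ram is None else round(ram, 1),
        "gpu": gpu_info(),
    }


if __name__ == "__main__":
    for key, value in compute_setup().items():
        print(f"{key}: {value}")
```

You can paste the printed lines directly into the issue. If the GPU line reports that CUDA is not available, that alone would explain why no GPU usage was observed.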

We will soon publish a technical report with reference timings, so you can validate your observations.
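To compare your own observations against published reference timings, it helps to measure them consistently. The sketch below is a small, generic timing harness. `convert` is a hypothetical stand-in for whatever conversion call you are benchmarking. The optional per-page normalisation assumes the reference timings are expressed per page, which is an assumption here, not something stated in this thread.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def time_conversions(
    convert: Callable[[Path], object],
    paths: Iterable[Path],
    repeats: int = 1,
    pages: Optional[Dict[Path, int]] = None,
) -> Dict[Path, dict]:
    """Time `convert(path)` for each file and return best-of-`repeats` wall-clock seconds.

    `pages` (hypothetical, optional) maps a path to its page count so that
    seconds-per-page can be reported alongside the total.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    results = {}
    for path in paths:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            convert(path)
            best = min(best, time.perf_counter() - start)
        entry = {"seconds": best}
        if pages and pages.get(path):
            entry["seconds_per_page"] = best / pages[path]
        results[path] = entry
    return results
```

Taking the best of several runs reduces the influence of one-off costs on the measurement. Reporting the first run separately can still be useful if start-up time is part of what feels slow. In practice `convert` would wrap the docling call being measured, for example the `convert` method of a docling `DocumentConverter` instance.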