documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.io/docsplit/

Fix deprecated method File.exists? to File.exist? #159

Closed: tuttiq closed this 1 month ago

tuttiq commented 1 year ago

Fix for this issue: https://github.com/documentcloud/docsplit/issues/158

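Fixes compatibility with Ruby 3.2, which removed the long-deprecated File.exists? alias. File.exist? works on older Ruby versions as well.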
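A minimal illustration of the kind of change this PR makes. The helper name file_present? is hypothetical and only stands in for any call site that used File.exists?; the actual diff is in the PR itself.

```ruby
require "tempfile"
require "tmpdir"

# File.exist? is available on every Ruby version; its plural alias
# File.exists? was deprecated long ago and removed in Ruby 3.2.
# file_present? is a hypothetical helper used only for illustration.
def file_present?(path)
  File.exist?(path) # was: File.exists?(path), which raises NoMethodError on Ruby >= 3.2
end

Tempfile.create("docsplit-demo") do |file|
  puts file_present?(file.path) # prints "true": the temp file exists inside the block
end

# Prints "false" unless a file with this name happens to exist.
puts file_present?(File.join(Dir.tmpdir, "no-such-docsplit-file.pdf"))

# false on Ruby 3.2 and later, true on older versions.
puts File.respond_to?(:exists?)
```

Because File.exist? predates the removal, the singular form is safe on both sides of the 3.2 boundary.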

tuttiq commented 1 year ago

@knowtheory @jashkenas Any chance we get this merged?

tmaier commented 1 year ago

Hi, I just ran into the same issue when trying to upgrade to Ruby 3.2. It would be great if this could get merged. It also shouldn't break compatibility with Ruby versions before 3.2.

tsotne-m commented 7 months ago

@tuttiq Any news on this? I just ran into this issue as well. If you switched to an alternative gem, could you tell me which one?

tuttiq commented 7 months ago

@tsotne-m (cc @tmaier) I ended up pointing the gem's source (in my project's Gemfile) to my forked version: https://github.com/tuttiq/docsplit

Not great, but I figured this repository is no longer being maintained 🤷‍♀️ I don't plan on maintaining my fork either (since I'm not working on that project anymore), so I recommend you maintain your own forks if you need this gem long term.
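For reference, pointing Bundler at a fork looks roughly like this in a Gemfile. This is a sketch: it assumes the File.exist? fix lives on the fork's default branch, and Bundler's branch: or ref: options can pin a different branch or commit.

```ruby
# Gemfile
source "https://rubygems.org"

# Install docsplit from a Git fork instead of RubyGems.
# Assumes the fix is on the fork's default branch; add branch: "..."
# or ref: "..." to pin something else.
gem "docsplit", git: "https://github.com/tuttiq/docsplit"
```

Run bundle install afterwards so Gemfile.lock records the fork's commit.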

tsotne-m commented 7 months ago

@tuttiq Thanks a lot for the response!

krystof-k commented 7 months ago

I'm also working on getting rid of Docsplit. It depends on your use case, but in mine (extracting text from word-processing documents), the way to go looks like converting the document to PDF with something like libreconv (or LibreOffice directly) and then extracting the text with pdf-reader.
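A rough Ruby sketch of that pipeline, calling LibreOffice directly instead of going through libreconv. It assumes the soffice binary is on PATH and the pdf-reader gem is installed; the method names are hypothetical, not part of either tool.

```ruby
require "open3"
require "tmpdir"

# Command line for a headless LibreOffice PDF conversion.
def soffice_pdf_command(doc_path, outdir, soffice: "soffice")
  [soffice, "--headless", "--convert-to", "pdf", "--outdir", outdir, doc_path]
end

# LibreOffice names the output after the input, with the extension swapped to .pdf.
def converted_pdf_path(doc_path, outdir)
  File.join(outdir, File.basename(doc_path, ".*") + ".pdf")
end

# Convert doc_path to PDF in a temporary directory, then read the PDF's text.
def extract_text(doc_path)
  require "pdf/reader" # loaded lazily; only needed once a PDF exists

  Dir.mktmpdir do |outdir|
    _out, err, status = Open3.capture3(*soffice_pdf_command(doc_path, outdir))
    raise "LibreOffice conversion failed: #{err}" unless status.success?

    pdf_path = converted_pdf_path(doc_path, outdir)
    raise "expected PDF not found: #{pdf_path}" unless File.exist?(pdf_path)

    # One string per page, joined with newlines.
    PDF::Reader.new(pdf_path).pages.map(&:text).join("\n")
  end
end
```

Usage would be something like puts extract_text("report.docx"), where report.docx is a placeholder path. The temporary directory is removed when the block returns, so the intermediate PDF never lingers.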

tmaier commented 7 months ago

I'm considering Apache Tika for the future, specifically running Tika as a microservice.

It has a simple REST API to extract text. See https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-GettheTextofaDocument
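A minimal Ruby sketch of calling that API with only the standard library. It assumes a Tika Server listening on localhost at its default port 9998 and the PUT /tika endpoint with an Accept: text/plain header, as described on the linked page; the method names are hypothetical.

```ruby
require "net/http"
require "uri"

# Assumed endpoint: a local Tika Server on its default port.
TIKA_ENDPOINT = URI("http://localhost:9998/tika")

# Build a PUT request whose body is the raw document and which asks for plain text back.
def build_tika_request(uri, document_bytes)
  request = Net::HTTP::Put.new(uri)
  request["Accept"] = "text/plain"
  request.body = document_bytes
  request
end

# Send a local file to Tika and return the extracted text.
def tika_extract_text(path, uri: TIKA_ENDPOINT)
  request = build_tika_request(uri, File.binread(path))
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "Tika returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  # Assumes Tika answers in UTF-8.
  response.body.force_encoding(Encoding::UTF_8)
end
```

With a server running, puts tika_extract_text("contract.pdf") (a placeholder path) would print the document's text. Keeping the request construction in its own method makes it easy to swap in a different endpoint or add headers later.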

abratashov commented 1 month ago

Why hasn't this PR been merged? This is a common issue for all Ruby 3.2+ projects... CC: @jashkenas @anujaware

abratashov commented 1 month ago

:heart: