Open nruth opened 8 years ago
:+1: Are we going ahead with this, or is it already implemented?
I didn't make a PR. I worked around the problem by putting the document into its own temporary subdirectory then using ls. I do think it's something that can be fixed, as it's just a forgot-to-think-about-the-return-value problem. But the PR backlog is growing.
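For anyone needing the same workaround, here is a minimal sketch of the "own temporary subdirectory, then list it" approach. The helper name `extract_to_isolated_dir` is hypothetical and not part of Docsplit. It assumes the extractor writes `.txt` files into the directory it is given, and it takes the extraction step as a block so the helper itself does not depend on Docsplit's return value.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Docsplit): runs the given extraction
# block against a fresh, empty output directory, then returns the paths
# of the .txt files found there afterwards.
def extract_to_isolated_dir(source_path)
  out_dir = Dir.mktmpdir('docsplit-text') # new empty dir, so any file in it is ours
  yield source_path, out_dir              # caller performs the actual extraction
  Dir.glob(File.join(out_dir, '*.txt')).sort
end

# Intended usage with Docsplit (assumes the gem is installed and that
# the :output option controls where the text files are written):
#
#   paths = extract_to_isolated_dir('report.pdf') do |src, dir|
#     Docsplit.extract_text(src, output: dir)
#   end
#   paths.each { |p| puts File.read(p) }
```

The caller is responsible for removing the temporary directory once the text has been read, e.g. with `FileUtils.remove_entry`.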
Related to https://github.com/documentcloud/docsplit/issues/42
After extracting the text from a PDF or Doc file I need to do something with it. I understand not loading the string into Ruby (it could be huge), but it'd be helpful to get the output file path as a return value. Otherwise we have to use different output dirs or try to reconstruct the path from other information, which feels wrong.
Currently `Docsplit::TextExtractor#extract_text` returns the source file paths. For transparent doc(x) file conversion it returns the intermediary tempfile PDF. E.g. when I map over an array with a PDF and a doc in my project's tmp dir, I get back the source PDF path and the intermediary tempfile PDF path.

Instead, I'd like to be given the paths of the output text files, so I can open them.
Would this be a good PR, or is there a deliberate reason to return these other file paths that could be documented?