Closed SinaRanjkeshzade closed 1 month ago
Hi @SinaRanjkeshzade, such a feature set would be outside the scope of the unstructured library.
In general it is not possible to reconstruct an original document from the document elements we extract from it. The document elements are purposely focused solely on the content of interest to downstream NLP processes.
But mostly it's just not part of the purpose and intended use of the library.
Is your feature request related to a problem? Please describe.
In some use cases, we need to read files via Unstructured, process them to generate new text, and write the results back. Since the input file formats can vary, a 'write' functionality would be very helpful. Specifically, if Unstructured could use the metadata of each partition to save the text in the same format, it would enhance usability. For example, if a piece of text is centered or was extracted from an image, writing it back in the same form would be beneficial.
Describe the solution you'd like
I would like to have a functionality that writes the partitions back to the same file format while maintaining the original structure of the content.
Describe alternatives you've considered
I don't have any alternatives for preserving the structure, but it would be feasible to implement different file writers, each supporting a specific file format for writing text.
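To make the "one writer per file format" idea concrete, here is a minimal sketch in plain Python. It does not use or extend the unstructured API: `Element`, `WRITERS`, `register`, and `write_elements` are hypothetical names invented for illustration, and `Element` is only a stand-in for an extracted element with a category and its text. It covers only `.txt` and `.md` output and makes no attempt to restore the original layout.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Element:
    """Hypothetical stand-in for an extracted element (not the library's class)."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


# Maps a file extension to a function that renders elements as a string.
Writer = Callable[[List[Element]], str]
WRITERS: Dict[str, Writer] = {}


def register(ext: str) -> Callable[[Writer], Writer]:
    """Decorator that registers a renderer for one file extension."""
    def decorator(fn: Writer) -> Writer:
        WRITERS[ext] = fn
        return fn
    return decorator


@register(".txt")
def render_txt(elements: List[Element]) -> str:
    # Plain text: one blank line between elements, no structure kept.
    return "\n\n".join(e.text for e in elements) + "\n"


@register(".md")
def render_md(elements: List[Element]) -> str:
    # Markdown: titles become level-1 headings, everything else is a paragraph.
    parts = [f"# {e.text}" if e.category == "Title" else e.text for e in elements]
    return "\n\n".join(parts) + "\n"


def write_elements(elements: List[Element], path: str) -> None:
    """Pick a renderer from the target extension and write the file."""
    target = Path(path)
    render = WRITERS.get(target.suffix.lower())
    if render is None:
        raise ValueError(f"no writer registered for {target.suffix!r}")
    target.write_text(render(elements), encoding="utf-8")
```

With this layout, supporting another format would mean registering one more renderer. Each renderer only sees the element category and text, so anything not captured there (alignment, fonts, image placement) cannot be written back.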