Open ImadSaddik opened 1 week ago
Hello @ImadSaddik, did you find a way to extract information from Excel files? Is it possible to convert them to HTML or PDF for processing?
Hi @ViCtOr-dev13, so far docling does not support Excel files. If you want, you can use LangChain
to load and parse the Excel docs, but I don't have a lot of experience with that.
@ViCtOr-dev13, there are multiple options available. I'm not sure about your specific use case, but you could consider using LangChain's document loaders or LlamaIndex's readers like DocxReader (https://docs.llamaindex.ai/en/stable/api_reference/readers/file/#llama_index.readers.file.DocxReader).
We need to leverage the openpyxl library.
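For illustration, here is a minimal sketch of what reading a workbook with openpyxl and turning each sheet into a Markdown table might look like. It is not docling's backend or the code from any PR; the helper names `rows_to_markdown` and `excel_to_markdown` are hypothetical, and the sketch assumes the first row of each sheet is a header and ignores merged cells, formulas without cached values, and multiple tables per sheet.

```python
from typing import Any, Iterable, Iterator, Sequence, Tuple


def rows_to_markdown(rows: Iterable[Sequence[Any]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    # Empty cells come back as None; escape pipes so they do not split a column.
    cleaned = [
        ["" if cell is None else str(cell).replace("|", "\\|") for cell in row]
        for row in rows
    ]
    if not cleaned:
        return ""
    # Pad short rows so every line has the same number of columns.
    width = max(len(row) for row in cleaned)
    padded = [row + [""] * (width - len(row)) for row in cleaned]
    lines = [
        "| " + " | ".join(padded[0]) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in padded[1:]]
    return "\n".join(lines)


def excel_to_markdown(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (sheet title, Markdown table) for each worksheet in the workbook."""
    # Imported here so rows_to_markdown stays usable without openpyxl installed.
    from openpyxl import load_workbook

    # read_only streams rows; data_only returns cached values instead of formulas.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        for sheet in workbook.worksheets:
            yield sheet.title, rows_to_markdown(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()
```

Calling `excel_to_markdown("report.xlsx")` (a placeholder path) would yield one table per sheet. Keeping the formatting step separate from the openpyxl step makes it easy to swap in a different output format later.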
Indeed, it will be challenging to cover all cases, but if we can have something that improves over time, that is going to be good 😊
@ImadSaddik Feel free to start with the implementation. I could also start with a simple backend and then we collaborate.
Sounds good, let's do it 👍🏻
@ImadSaddik I started something in this PR: https://github.com/DS4SD/docling/pull/334
Thank you @PeterStaar-IBM for letting me know. I have been busy with work lately; I will look into it once I get the time.
@ImadSaddik Just waiting for a review now on PR #334; it should be in sometime next week!
FYI: @dolfim-ibm @cau-git
@PeterStaar-IBM, I will test what you did and provide feedback.
Hello,
First of all, thank you for open-sourcing this fantastic project. It already offers a lot in its current state. I have a feature request: would it be possible to add support for Excel files in the near future?
I believe this would make the library even more complete. While there are some areas that could use improvement, I’m confident things will keep getting better over time. I’d love to hear your thoughts on this, and perhaps you're already considering Excel file support.
Thanks again,
SADDIK Imad