Status: Open. jiarongkoh opened this issue 6 months ago.
Hi @jiarongkoh - thanks for the issue! We haven't supported image extraction from HTML in the past because images in HTML are linked rather than embedded directly in the document. We'll revisit this internally, though, and follow up.
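To make the maintainer's point concrete: an <img> tag normally stores only a URL in its src attribute, so the image bytes are not part of the HTML file. The exception is a data: URI, which carries the bytes inline. The following is a minimal sketch using only Python's standard html.parser, not anything from unstructured; the names ImgCollector and list_images are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Hypothetical helper: collect src/alt of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing <img/>
        # tags also arrive here via the default handle_startendtag.
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        self.images.append(
            {
                "src": src,
                "alt": attr_map.get("alt") or "",
                # data: URIs hold the image bytes inline; anything else is a
                # reference that has to be fetched separately.
                "embedded": src.startswith("data:"),
            }
        )


def list_images(html_text):
    collector = ImgCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images


if __name__ == "__main__":
    sample = (
        "<p>Quarterly report</p>"
        '<img src="charts/revenue.png" alt="Revenue chart">'
        '<img src="data:image/png;base64,iVBORw0KGgo=" alt="Logo">'
    )
    for img in list_images(sample):
        print(img)
    # {'src': 'charts/revenue.png', 'alt': 'Revenue chart', 'embedded': False}
    # {'src': 'data:image/png;base64,iVBORw0KGgo=', 'alt': 'Logo', 'embedded': True}
```

Only the second image could be recovered from the HTML alone; the first exists only as a path.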
@MthwRobinson, do you plan to implement this feature anytime soon?
Is your feature request related to a problem? Please describe.
I process HTML files using the partition_html function. However, I noticed that this function can extract tables as Table elements, but it does not extract images as Image elements.
Describe the solution you'd like
I would like partition_html to be able to extract images, in the same way that shared.PartitionParameters can for other file types.
Describe alternatives you've considered
I tried passing the same HTML file through shared.PartitionParameters, but that does not extract images either. Another alternative I explored was converting the HTML file to PDF first. While this might work, there is no guarantee that the conversion preserves the original content, so the output may differ from what I expect.
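One further workaround, not proposed in the thread and offered here only as a sketch under stated assumptions, is to leave partition_html as-is and handle image sources separately: resolve relative src values against the page URL so they can be downloaded, and decode data: URIs (RFC 2397) directly. The helper resolve_image_source below is hypothetical, uses only the standard library, and is not part of unstructured or unstructured-client.

```python
import base64
from urllib.parse import unquote_to_bytes, urljoin


def resolve_image_source(src, base_url):
    """Hypothetical helper: classify an <img> src value.

    Returns ("embedded", media_type, raw_bytes) for data: URIs, whose bytes
    live inside the HTML, or ("linked", absolute_url) for everything else,
    which still has to be downloaded separately.
    """
    if src.startswith("data:"):
        header, _, payload = src[len("data:"):].partition(",")
        params = header.split(";")
        # RFC 2397: an omitted media type defaults to text/plain
        media_type = params[0] or "text/plain"
        if params[-1] == "base64":
            data = base64.b64decode(payload)
        else:
            data = unquote_to_bytes(payload)
        return ("embedded", media_type, data)
    # Relative paths such as "charts/a.png" only make sense against the page URL
    return ("linked", urljoin(base_url, src))


if __name__ == "__main__":
    page = "https://example.com/reports/q3.html"
    print(resolve_image_source("charts/revenue.png", page))
    # ('linked', 'https://example.com/reports/charts/revenue.png')
    print(resolve_image_source("data:image/png;base64,aGk=", page))
    # ('embedded', 'image/png', b'hi')
```

Downloading the linked URLs (for example with urllib.request) is deliberately left out so the sketch stays offline.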
Additional context
None.