Closed bitscompagnie closed 6 years ago
If they are duplicate images with different names, then it makes sense that they would require multiple files. And I don't think we would want to change that.
The images are different and are from two different sources.
Here is a sample document that I tested, and it produces duplicate images.
Thanks for your reply.
<?xml version="1.0"?>
...
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image2.png"/>
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
...
</Relationships>
The document you linked has two images displayed in the file and the rels file has paths to two images. I assume two files are being written. They are both different and they have two different names. I don't see how duplicates can be happening. Am I missing something?
Four files are being written, and that's why I posted the question.
The same image is being written twice. Is there something wrong with my override of the get_image_tag function?
Probably. I can't say without seeing how image_name is defined, or without knowing what files are being written and which ones are duplicates of which.
Here are the two functions I am using:
import os
import random
import string

# IMAGE_LOCATION (the output folder) is defined elsewhere in my script.

# Function to generate a random string of digits for insertion in the filename
def random_key(length):
    return ''.join(random.choice(string.digits) for _ in range(length))

# Function to generate random image names
def image_name():
    """Return a new random image path under IMAGE_LOCATION."""
    return os.path.join(IMAGE_LOCATION, random_key(4))
Since image_name is called every single time the parser finds an image tag, it generates a new file name (with four random digits) each time, even for an image it has already written. I would suggest using the md5 hash of the image content instead of generating random digits. This will remove the duplicates. However, it will do the work of hashing each time you run into the tag. If you want to avoid that, you'll need to build some sort of cache on the parser object that uses the image name as the key. This will also prevent duplicates.
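Here is a rough sketch of both ideas, in case it helps. It is not PyDocX API: save_image_once and ImageCache are hypothetical helper names, the 'images' default for IMAGE_LOCATION is a placeholder, and I am assuming you already have the image content as bytes (a Base64 string would need decoding first). How you call these from your get_image_tag override depends on code I haven't seen.

```python
import hashlib
import os

IMAGE_LOCATION = 'images'  # assumption: stand-in for your own output folder


def save_image_once(image_bytes, directory=IMAGE_LOCATION, extension='.png'):
    """Write image_bytes to a file named after its md5 hash.

    Identical content always maps to the same path, so a repeated
    image is written only once.
    """
    digest = hashlib.md5(image_bytes).hexdigest()
    path = os.path.join(directory, digest + extension)
    if not os.path.exists(path):
        os.makedirs(directory, exist_ok=True)
        with open(path, 'wb') as handle:
            handle.write(image_bytes)
    return path


class ImageCache:
    """Hypothetical cache that could live on the parser object.

    Keyed by the image name from the document (for example
    'media/image1.png'), so the hashing is skipped on repeat lookups.
    """

    def __init__(self, directory=IMAGE_LOCATION):
        self.directory = directory
        self._paths = {}

    def path_for(self, source_name, image_bytes):
        if source_name not in self._paths:
            # Reuse the extension of the source name, falling back to .png.
            extension = os.path.splitext(source_name)[1] or '.png'
            self._paths[source_name] = save_image_once(
                image_bytes, self.directory, extension)
        return self._paths[source_name]
```

The hash is used only to get a stable file name, so md5 is enough here. The cache variant trades a small dictionary for not rehashing an image each time its tag comes up.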
Thanks for your feedback.
No problem :)
Hello Pydocx community,
I am having a problem with the following function: it is generating and saving duplicate images to a local folder for the same source (Base64 string).
How can I prevent it from generating duplicate images?
Here is the function:
Thanks for your help.