Store images in the *.archimate file as Base64 bytes

Phillipus commented 4 years ago

If we use images in canvases, we have an ArchiveManager that takes care of saving the images as binary files and then zipping the model.archimate file and the image files into a zip archive file. This method was written 9 years ago because RAM was limited, and to keep the file portable.

An alternative method is to store the image data as Base64 encoded bytes as a set of Archi "features" in the *.archimate file.
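For reference, Base64 turns every 3 input bytes into 4 ASCII characters, so image data grows by about a third when stored in the XML file. A minimal sketch using Python's standard base64 module (the function names are illustrative, not Archi's):

```python
import base64


def encode_image(data: bytes) -> str:
    """Return raw image bytes as a Base64 string for use in an XML attribute."""
    return base64.b64encode(data).decode("ascii")


def encoded_length(n_bytes: int) -> int:
    """Length of the padded Base64 text for n_bytes: 4 characters per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte header every PNG file starts with
    text = encode_image(png_signature)
    print(text)  # iVBORw0KGgo=
    assert len(text) == encoded_length(len(png_signature))
    assert base64.b64decode(text) == png_signature
```

This is also why the stored values shown in this thread begin with "iVBORw0KGgo": that is the Base64 form of the PNG signature.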

The branch image-store has the code to do this. It works, and it is quite fast and not too memory-intensive.

The image data features are stored on the root model node so that they can be referenced multiple times, saving space and memory.
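A rough sketch of how one root-level feature can serve several canvas objects. The XML layout, the canvasImage element and the imagePath attribute are simplified, hypothetical stand-ins for the real model schema, and the "images/..." naming is the one the thread settles on later:

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict

# Simplified, hypothetical model layout: one image feature on the root element,
# referenced by two canvas image objects through an imagePath attribute.
MODEL_XML = """<model>
  <feature name="images/logo.png" value="iVBORw0KGgo="/>
  <diagram>
    <canvasImage id="a" imagePath="images/logo.png"/>
    <canvasImage id="b" imagePath="images/logo.png"/>
  </diagram>
</model>"""


def image_store(model: ET.Element) -> Dict[str, bytes]:
    """Decode every image feature held directly on the root model element."""
    return {f.get("name"): base64.b64decode(f.get("value", ""))
            for f in model.findall("feature")
            if f.get("name", "").startswith("images/")}


def images_for_objects(model: ET.Element) -> Dict[str, bytes]:
    """Map each object id to its image bytes by looking up its imagePath in the store."""
    store = image_store(model)
    return {obj.get("id"): store[obj.get("imagePath")]
            for obj in model.iter()
            if obj.get("imagePath") in store}


if __name__ == "__main__":
    model = ET.fromstring(MODEL_XML)
    resolved = images_for_objects(model)
    print(len(image_store(model)), sorted(resolved))  # 1 ['a', 'b']
```

Both objects resolve to the same decoded bytes, so the image is stored and loaded once no matter how many canvases use it.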

Do we want to do this? What are the advantages/disadvantages?

If we go this way we would need to do:

Convert previous version archive files to the new format when loading
Adapt coArchi to handling images in a different way

jbsarrodie commented 4 years ago

I really like this.

In fact, several times I wished I had a way in jArchi to update or add some images. My use case was to create a canvas containing an image, with PlantUML code as the image description. My script would then call PlantUML, get the image and set it in the canvas.

I think that having the image data in a "feature" would make it easier to get/set.

Phillipus commented 4 years ago

The only reasons I did it this way 9 years ago were:

(1) To save memory on 32-bit operating systems
(2) Speed
(3) To keep images and the model file together

(1) is not really valid now with 64-bit memory addressing and more RAM. (2) is no longer valid since we improved loading speeds. (3) is still valid, but the new format fulfils it too.

I like it too and it seems to work well. The back-end code is much simpler now as well (no need to store bytes and do tricks when saving).

I'll go further with this and explore these issues:

Convert previous version archive files to the new format when loading

Adapt coArchi to handling images in a different way

Phillipus commented 4 years ago

Here's a really cool thing as well. The image bytes are stored like this:

<feature name="imageBytes_e2d5641007451cb8bed2a5f74c70c115279cbd5e" 
                                              value="iVBORw0KGgoAAAANSUhEUgAAA..."/>

The string appended to "imageBytes_" in the name is a SHA-1 hash of the image bytes (the value). So we can avoid adding duplicate image bytes, because the name tells us whether they have already been added.
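A minimal sketch of that naming and duplicate check, assuming a plain dict stands in for the root model's features (add_image and image_feature_name are illustrative names, not Archi's code):

```python
import base64
import hashlib
from typing import Dict

PREFIX = "imageBytes_"


def image_feature_name(data: bytes) -> str:
    """Feature name: the prefix plus the hex SHA-1 digest of the raw image bytes."""
    return PREFIX + hashlib.sha1(data).hexdigest()


def add_image(features: Dict[str, str], data: bytes) -> str:
    """Store the image unless identical bytes are already present.

    Returns the feature name that diagram objects should reference.
    """
    name = image_feature_name(data)
    if name not in features:  # same bytes -> same digest -> same name: skip the duplicate
        features[name] = base64.b64encode(data).decode("ascii")
    return name


if __name__ == "__main__":
    features: Dict[str, str] = {}
    first = add_image(features, b"abc")
    second = add_image(features, b"abc")  # a second canvas using the same picture
    print(first == second, len(features))  # True 1
```

The duplicate test is a single hash of the new bytes plus a name lookup, rather than a byte-by-byte comparison against every stored image.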

Phillipus commented 4 years ago

Tested with several large images totalling 83 MB in file size. Loading and saving are quite fast. I think as long as people don't put their entire photo album on a canvas, it should work well.

Phillipus commented 4 years ago

More testing:

A 360 MB *.archimate file containing several large images takes about 2 seconds to load and about 4 seconds to save. Not bad for such an extreme case.

Phillipus commented 4 years ago

I've done it. Conversion from the zip format to the new format when loading a model is also implemented. It seems to work well.
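The conversion step can be sketched as follows, assuming the old archive holds the model XML under an entry named "model.xml" and the images under "images/..." (both assumptions; this is not the actual ArchiveManager code):

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: name of the model entry inside the old zip archive.
MODEL_ENTRY = "model.xml"


def convert_archive(source, model_entry: str = MODEL_ENTRY) -> ET.Element:
    """Read an old-style archive (model XML + images/* entries) and return the
    model root with each image added as a Base64 <feature> on the root element."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read(model_entry))
        existing = {f.get("name") for f in root.findall("feature")}
        for info in zf.infolist():
            name = info.filename
            if info.is_dir() or not name.startswith("images/") or name in existing:
                continue
            # Keep the archive path ("images/1234") as the feature name so that
            # existing image path references in the model still resolve.
            value = base64.b64encode(zf.read(name)).decode("ascii")
            ET.SubElement(root, "feature", name=name, value=value)
    return root


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(MODEL_ENTRY, '<model><img imagePath="images/1234"/></model>')
        zf.writestr("images/1234", b"\x89PNG\r\n\x1a\n")
    converted = convert_archive(buf)
    print([f.get("name") for f in converted.findall("feature")])  # ['images/1234']
```

Reusing the archive path as the feature name is one design choice that avoids rewriting any references inside the model during conversion.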

Now the next concern is converting coArchi's image handling. I've got a feeling that won't be so easy. @jbsarrodie Do you think there will be backward-compatibility issues?

jbsarrodie commented 4 years ago

> Now the next concern is converting coArchi's image handling. I've got a feeling that won't be so easy. @jbsarrodie Do you think there will be backward-compatibility issues?

Thinking out loud:

Current code should be updated to be failsafe (it would be good to have a version of coArchi that works with both versions of Archi: the one with historic image handling and the one with the new approach)
coArchi is already able to store features, so it should be able to store images...
but there might be some side effects on Git with such big strings...
so maybe we should extract those features into self-contained files under the images folder when exporting, and add them back to the model element as features when loading

Phillipus commented 4 years ago

> but there might be some side effects on Git with such big strings...

Doesn't seem to cause any problems. Git regards it as a binary file because of the long lines.

I've committed a first go at this in branch image-store in the coArchi git repo (https://github.com/archimatetool/archi-modelrepository-plugin/commits/image-store).

This is what I've done so far:

GraficoModelExporter does not save images to the images folder - an easy win.
GraficoModelImporter#loadImages() checks the images folder and, if images are present, converts them to the new format
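A sketch of what that import-side conversion might look like, using a hypothetical helper (load_images_from_folder is not the real GraficoModelImporter method) and assuming each file in the images folder becomes a root feature named "images/<file name>":

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_images_from_folder(model: ET.Element, repo_dir: Path) -> int:
    """Hypothetical helper: turn files in <repo_dir>/images into root-level
    <feature name="images/<file>" value="<Base64>"/> elements.

    Returns the number of images added; images already present as features are skipped.
    """
    images_dir = repo_dir / "images"
    if not images_dir.is_dir():
        return 0  # nothing to convert: the model already holds its images
    existing = {f.get("name") for f in model.findall("feature")}
    added = 0
    for path in sorted(images_dir.iterdir()):
        name = "images/" + path.name
        if not path.is_file() or name in existing:
            continue
        value = base64.b64encode(path.read_bytes()).decode("ascii")
        ET.SubElement(model, "feature", name=name, value=value)
        added += 1
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "images").mkdir()
        (repo / "images" / "1234.png").write_bytes(b"\x89PNG")
        model = ET.Element("model")
        print(load_images_from_folder(model, repo))  # 1
```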

The only issue is merging: one has to choose "Theirs" if the model file has the images as features.

It would be nice to store images in grafico format as "features", but maybe for backward compatibility we should try to do this:

> so maybe we should extract those features into self-contained files under the images folder when exporting, and add them back to the model element as features when loading

Phillipus commented 4 years ago

@jbsarrodie This is turning out to be extremely difficult. I can't find a way to do it that's backwards-compatible for coArchi.

There's a new branch in coArchi called image-store2. With this method, the images are saved in the old way in the "images" folder and the model features with image references are removed when exporting to grafico.

The problem is that with the new ArchiveManager we have to store image paths with a new key name like "imageBytes_123456789", whereas the old way used a path like "images/1234".

At this point I think my head is confused as to what's going on. ;-)

Phillipus commented 4 years ago

I think I've come up with a strategy that works with coArchi.

Use the "images/" prefix and a file suffix in image path names. For example:

<feature name="images/e2d5641007451cb8bed2a5f74c70c115279cbd5e.png" 
                                          value="iVBORw0KGgoAAAANSUhEUgAAA..."/>

When exporting to grafico format in coArchi, remove any features that start with "images/" from the root model. Save image data to image files in the "images" sub-folder.
Importing from grafico stays the same
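The export side of this strategy might look roughly like this. export_image_features is a hypothetical helper, not coArchi's GraficoModelExporter, and it assumes each feature value is plain Base64:

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

PREFIX = "images/"


def export_image_features(model: ET.Element, repo_dir: Path) -> List[str]:
    """Hypothetical helper for the grafico export step: move every root feature
    whose name starts with "images/" out of the model and into a file under
    <repo_dir>/images. Returns the exported feature names."""
    images_dir = repo_dir / "images"
    exported = []
    for feature in list(model.findall("feature")):  # copy: we remove while looping
        name = feature.get("name", "")
        if not name.startswith(PREFIX):
            continue  # other features stay in the model file
        filename = name[len(PREFIX):]
        if not filename or "/" in filename or filename in (".", ".."):
            continue  # refuse names that would not map to a plain file in images/
        images_dir.mkdir(parents=True, exist_ok=True)
        (images_dir / filename).write_bytes(base64.b64decode(feature.get("value", "")))
        model.remove(feature)
        exported.append(name)
    return exported


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = ET.fromstring('<model><feature name="images/a.png" value="eA=="/></model>')
        print(export_image_features(model, Path(tmp)))  # ['images/a.png']
```

Because the feature name already is the relative file path, export and import need no lookup table: the name maps straight to a file in the "images" sub-folder and back.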

Phillipus commented 4 years ago

> There's a new branch in coArchi called image-store2.

Main branch in coArchi is now image-store.

Phillipus commented 4 years ago

I've tested the coArchi changes and it all works in Archi 4.6 and Archi 4.7.

Phillipus commented 4 years ago

More testing and it seems to work nicely. The code base is simpler and clearer.

However, a model file saved in this new format will open in Archi 4.6 but the images will not show. I've changed the ModelVersion to "4.7" so that users are warned when opening such a model. If they open it anyway, the features containing the image data will be preserved; the Archi 4.6 user just won't see the images.
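A sketch of the kind of version gate this relies on, assuming the model version is a dotted number in a version attribute on the root element. This is a simplification; the comparison Archi itself performs is not shown in the thread:

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version such as "4.7" or "4.6.1" into integers (assumed format)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(model_version: str, supported_version: str) -> bool:
    """True if the model was saved in a newer format than this application supports.
    Missing trailing parts count as 0, so "4.6" equals "4.6.0"."""
    for a, b in zip_longest(parse_version(model_version),
                            parse_version(supported_version), fillvalue=0):
        if a != b:
            return a > b
    return False


def should_warn(model_root: ET.Element, supported_version: str) -> bool:
    """Check the root element's version attribute before opening the model."""
    return is_newer(model_root.get("version", "0"), supported_version)


if __name__ == "__main__":
    root = ET.fromstring('<model version="4.7"/>')
    print(should_warn(root, "4.6"))  # True
```

Comparing numeric parts rather than strings matters once versions reach two digits: as strings, "4.10" would sort before "4.9".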

Ridderby commented 4 years ago

> I really like this.
>
> In fact, several times I wished I had a way in jArchi to update or add some images. My use case was to create a canvas containing an image, with PlantUML code as the image description. My script would then call PlantUML, get the image and set it in the canvas.
>
> I think that having the image data in a "feature" would make it easier to get/set.

@jbsarrodie would you mind sharing those scripts? I also have a case where Archi and PlantUML need to be mixed for proper documentation.

Phillipus commented 4 years ago

Now that it works we should decide if it is actually a good thing.

There is a school of thought (and this included me when I first wrote the original code) that such large data blobs should not be stored in an XML file but referenced as an external file (as we do now). See https://stackoverflow.com/questions/5232445/storing-image-in-xml

But now I'm not sure whether it's a good or bad thing.

Phillipus commented 4 years ago

On the other hand, HTML pages have embedded base64 image data...

jbsarrodie commented 4 years ago

> On the other hand, HTML pages have embedded base64 image data...

But this is not good practice: some benchmarks show that loading resources from Base64 data URIs is 5 times slower.

> Now that it works we should decide if it is actually a good thing.

Good question. As we often say, "if it ain't broke, don't fix it", so do we really need this? The questions are:

Does this solve bugs or limit potential new bugs?
Does this improve performance?
Does this simplify other new features? E.g. does this simplify extending the jArchi API to add, update and remove images and use them in canvases, or is it similar?
Does this impact users? At least a bit, IMHO, as this means upgrading the model version number, which is very annoying with coArchi: people who simply open a model (or switch to an old branch) may change the version and will then be asked to commit changes when they think there are none.

Depending on your answers to the first questions, I would suggest keeping this for a later version that will contain other changes to the model structure. Maybe Archi 5.0, in which Properties would be renamed Attributes, implying some major changes to jArchi (and maybe coArchi too).

Phillipus commented 4 years ago

> Does this solve bugs or limit potential new bugs?

In a way, yes. Loading images is cleaner and less prone to error. Internally, the code is a great improvement.

> Does this improve performance?

I think it's about the same in terms of speed. But internally, saving and loading are simpler: just an XML file. As it is now, we have to create a temp file and a zip file stream, and write the model and the image bytes into it. It works OK in both cases.
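For comparison, a rough sketch of the two save paths. The archive entry name "model.xml" and the temp-file-then-replace step are assumptions for illustration, not ArchiveManager's actual code:

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

MODEL_ENTRY = "model.xml"  # assumption: model entry name inside the old archive


def save_as_archive(model_xml: bytes, images: Dict[str, bytes], path: str) -> None:
    """Old-style save: write the model and image files into a zip via a temp file,
    then move the finished archive into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(MODEL_ENTRY, model_xml)
            for name, data in images.items():
                zf.writestr(name, data)  # e.g. "images/1234"
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # don't leave a half-written temp file behind
        raise


def save_single_file(model: ET.Element, path: str) -> None:
    """New-style save: images are already Base64 features in the model,
    so saving is a single XML write."""
    ET.ElementTree(model).write(path, encoding="utf-8", xml_declaration=True)
```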

> Does this simplify other new features?

I've not thought about this in terms of jArchi, so I don't know. BTW, canvas support in jArchi is quite a big thing to implement, so it's not done yet.

> Does this impact users?

The model version number, yes. The old archive zip file format is converted automatically.

I guess it boils down to archive format vs. single xml file format.

jbsarrodie commented 4 years ago

> I guess it boils down to archive format vs. single xml file format.

Crazy idea: wouldn't it be possible to leverage EMF and see the model as a resource set containing a primary resource (the model) that references features from other resources in the same resource set (one resource per image)? Then we would get the best of both worlds: each image has its own file (easier to manage in Git), but we still use Archi's features to store them and attach them to the model.

Of course this means some more work on coArchi, but it would be a very generic way to handle binary attachments (each attachment is Base64-encoded but sits in its own resource file).

BTW, having a generic notion of "attachment" in Archi could make sense for non-image data too. I once imagined storing some jArchi scripts inside features, and then having a "local" jArchi script that would simply load those features' scripts. This would make it easy for people to share scripts with their models.

Phillipus commented 4 years ago

> Crazy idea...

Would need to investigate this. Good idea, though. ;-)

archimatetool / archi

Store images in the *.archimate file as Base64 bytes #597