About single vs multiple weights format in one file

bioimage-io / spec-bioimage-io

Specification for the bioimage.io model description file.

https://bioimage-io.github.io/spec-bioimage-io/

MIT License

18 stars 17 forks source link

About single vs multiple weights format in one file #33

Closed oeway closed 4 years ago

oeway commented 4 years ago

Let's continue the discussion about the single vs multiple mixed weights_format in a single file (while still allow multiple weights trained differently in the same file).

During the meeting today, we have discussed whether we should allow mixing multiple weights format in a single file. The main argument (ilastik team) for going for a single weights format is about difficulty it might involve for the implementation and for the user to switch between different format.

I would like to list a few arguments that against this here for further discussion:

for the website, we would like to display one model card but with different consumer icons. This would mean a single model file should be able to read by multiple consumers. We can potentially group multiple model file into one card by merging fields, but it's definitely not straight forward as a single file contain multiple weights formats. I would not go for the merging or grouping method for now as it introduce a lot of complexity for the rendering of website right now.
for the potential issue raised about accepting external contribution to the single model file. I think it makes a lot of sense to merge the same model weights in different format into a single file. And it's also practical to implement because we can simply verify with the same test_inputs and expect the same(with tolerance) test_outputs as other formats. This can be done by us, potentially automatically with model runner in CI. While in the other case when you have models trained differently or different dataset, you will have real issue on trusting the model because there is no practical way to validate the model.
it feels more natural to have one model with weights in different formats that produce the same outputs in a single file, much more natural than having weights trained differently or on different datasets in the same file. Despite all the technical details, the definition of a model is basically an algorithm take certain inputs and transform it in a specific way to get the outputs. The same model in different file formats won't change that definition, and the file formats it's just a set of options for the user to use the same model.
there is little or no need for a consumer software being able to read multiple weights formats for the exact same model weights (produce the same results). It is not clear to me why, for example, ilastik or user of ilastik wants to switch between weights formats stored in the same model file. I would simply support a preferred weights format for a software and filter the weights list to read it. If the preferred weights file doesn't exist, then fallback to other weights format. Furthermore, I think this implementation difficultly appeared right now might be solved in the near future. @carlosuc3m mentioned me there is a java library can load both pytorch and tensorflow formats, and I guess that can easily switch between, ImJoy can switch between different runtime. I mean the worst case is to ask the user to restart the software in order to load different runtime libraries.
it seems we should give more priority for the support of a model with multiple weights format, rather than supporting a model file that supports multiple snapshots or trained on different datasets. We have already decided that we want the model zoo file to be interoperable between consumers, while there will be fewer cases that one want to store different snapshot of a single model. Why would one want to share a model file with one weights trained for 1000 epochs and another weights trained for 1010 epochs? You simply select the one with better performance to rather than leave the choice to the user. As for the model trained on different datasets, it is a case that makes more sense to store into different model file compared to the same weights in different formats.

To be clear, I think we should allow storing weights trained differently or on different dataset in one single file, in addition, we should definitely allow weights in different formats in a single file with different weights_format key in the weights list.

I think this need to be decided soon as for the website, and this will have big impact on how the website render models.

I am of course open to hear further arguments that suggesting we should go for a single weight format. @akreshuk @FynnBe @m-novikov @carlosuc3m @fjug @frauzufall

fjug commented 4 years ago

Ok, I originally decided to shut up and live with whatever you guys like. (Had the feeling you all agree.) Now that I see that you are all not in agreement, I'm afraid I have to weigh in with my potentially unpopular opinion.

I am not a fan of multiple weights in one model. There should be one set of weights and they should be the ones that the code stored in that model produces with the parameters set in the json file. Easy! A model then is ONE model, not a class of trained models. I am of the firm opinion that this is the right choice when thinking about what a model is.

What would that then mean downstream? It would mean that for multiple training runs, different number of epochs, different datasets, etc. etc., we would create many models. Yes! And that is good! Who cares of a few kilobytes of redundant source code. It is still consistent and its clear what the weights are, because they are the ones that fall out of that model. Easy! Explaining what a model is will take minutes, not hours, and people can just get it and there is no confusion because the design is not confusing!

Now... I'm aware we also have to think website and UI. And yes, once one does, we would like to change things so that the logic for the UI becomes simpler. But if I have learned one thing in many years of SW development: don't compromise the right model decision so UI code becomes 'simpler'. It might sound right at first, but its a lie and it will backfire!

Hence, in order to group models, let's instead come up with a sensible grouping scheme of equivalent models, or model variations that we would like to see grouped. How hard can that be? I'm positive the outcome will look just the same for the users, but the stored models are MUCH cleaner, and that is for me of uttermost importance. Models are 'user' facing, the UI is something we have to create once and will not be seen by many people ever!

Ok, I reached the end of my 'closing arguments', now its on you to ignore me and come up with an alternative that likely also acceptable. I will be happy to go along with your joint decision... 😄

Greetings from the new risk area Dresden, Florian

m-novikov commented 4 years ago

I fully agree with @fjug argument.

Also keeping models separate will allow us to update our "same model" definition to group models consistently in the future without relying too much on individual contributors in making these kinds of decisions.

I think we are trying too hard to stretch model.yml between runner and website use cases. It's better in this case to have independent domain representation for the website and for the runner.

oeway commented 4 years ago

I felt we are going back and forth, but it's great that more people are involved in the discussion.

To clarify, my points listed above (including the UI part) is about supporting multiple weights under the assumption that we support already multiple weights in a single file. So I would consider the following points about whether we should even support multiple weights in a single file format is a fork of this discussion, but let's follow that anyway.

We have started by supporting only one weight in one file, and later move to multiple weights in a model since a discussion in March. There we started to think about multiple weights in a single model file because we may need different weights for prediction and training. For tensorflow for example, there is a so called frozen model format (pb) that are much suited for inference(smaller size, training ops removed) but not suited for training, while for training, there is another saved model format with all the information. It makes a lot of sense to store this two in the same model file since they represent exactly the same model, but can be used for different purpose.

At that time, we have very simple fields (source and hash) for each weights. Later we started to also think about more practical issues to differentiate model architecture and the actual weights. The authors of the model architecture are often different from the ones who trained the model and generate the weights, so we decided to expand the fields in each weights, to allow independent attributions to the model authors and the weights authors. And this is a real issue that we still need to address even if we go for the single weights format.

Don't get me wrong, my point about simplify UI is based on the fact we are already supporting multiple weights but not allowing weights in different formats stored in the same file.

I would be completely fine with supporting a single weights in one file, but my point is about that we should support multiple formats for the same weights to enable model interoperability, for example, the same keras model and weights can be converted to tensorflow saved model, frozen model pb file, onnx format, tensorflow.js format, and pytorch formats. In addition to that, one model can be converted in order to support multiple version of Tensorflow and Pytorch etc.. This will greatly improve the chance of achieving one model loaded by many consumers, which is the major goal of this model zoo project.

Supporting this won't have any impact on the consumer software implement, for a giving consumer software, it just need to filter based on the supported/preferred weights formats.

@fjug from what you described, you seems more concerned about the fact that models trained differently ( for multiple training runs, different number of epochs, different datasets, etc. etc.,) can be stored in the same file. Could you clarify what's your opinion on the same weights but in different formats?

Otherwise, figuring out a grouping mechanism for the exact same weights in different format won't make much sense, I mean they literally should stay in the same file.

Any idea for grouping the same model trained differently? The easiest would be assigning an id for a certain model architecture. Any other idea?

FynnBe commented 4 years ago

I see "multiple weights of the same format in one model.yaml" as a compromise between "single weight in model.yaml; let's have millions of them!" and "model.yaml allows them all, what exactly was the manifest.yaml for again?".

It adds relatively little complexity to the runner (one has to select one of the weights, but they are all of the same format, thus 'technically equal'.
it reduces the number of model.yaml files anyone has to deal with

For UI we have to figure out some more grouping mechanisms anyhow. Even if model.yaml files would contain different weight formats there could be several of the same model in different repositories, etc... How to associate them on the website is a separate problem. However grouping models by all keys except weights and weights_format should make it easy to identify models with the same architecture and the same data use case.

frauzufall commented 4 years ago

@oeway the "different weights for training and prediction" argument is really important, I agree - but from my perspective that's also touching the model architecture. The training model might have e.g. more inputs (training parameters) and outputs (e.g. loss), at least that's the case for the TensorFlow training vs. (frozen) prediction models I am dealing with. Therefore I think the training model would have to be described differently, no? Currently, the model specification does not really cover training?

I also don't think figuring out how to attribute both the authors of the model architecture and the authors of the training is solved by having multiple weights, but now i read in the README here that it says:

As some weights contain the model architecture. The source is optional (depending on weights_format)

If the weights should also include the training model, then I think the inputs and outputs also need to move into each weight..

For grouping, we could have something like a model_architecture_id (e.g. N2V) and a model_usecase_id, (e.g. the name of the lab doing the training plus the name of the dataset).

FynnBe commented 4 years ago

I agree that we will need more fields to enable training, additional inputs, loss, etc... It's good to keep this in mind, although it will have to wait a bit until we concluded on the format for inference purposes.

I think between model name and what you describe as model_architecture_id there is a lot of overlap. Also between weigths/name and model_usecase_id. In my opionion it would be easier to agree on a subset of existing keys to be used for grouping, rather than adding keys just for the purpose of grouping. If you'd like to group by lab we might add a lab or working group key, If model name is too specific, we might add a architecture_family key or similar.

For grouping (and filtering) in general, it would be awesome to support grouping by arbitrary specification keys. If the UI can do that, we can focus on content and backend usability in the model specification.

oeway commented 4 years ago

@frauzufall @FynnBe For the training support, it is not the main focus of the current version, and from one of our previous meeting we seems to agree on that it's up to the consumer software to store the training settings under config. For longer term, we will have to support training, since retraining/finetuning models is crucial for using models in practice, it will be very limited to only allow fixed model for inference.

If the weights should also include the training model, then I think the inputs and outputs also need to move into each weight..

Good point! It will be too complicated to handle if the inputs/outputs are different for each weights. Well, if intputs and outputs changes, I tends to think they are different models, thus should go into different file.

My best hope is that in one model file we should at least support single model, single weights, but allow porting them into different file format. Like this, we don't need to touch inputs/outputs, we can use the same test_inputs and test_outputs, same cover, same authors for the weights (since the training was only done once).

For grouping, we can implement whatever we like once we settle down on a specific grouping rule. I agree with @FynnBe that we might not need extra keys. But I still see some practical issues: 1) the keys used for grouping has to be a global setting for the whole site, otherwise if we store them in the model file itself, we may get conflicted settings. 2) if multiple keys involved, we need to define the exactly matching logic and that might be problematic.

An easier solution that combines the idea of @FynnBe and @frauzufall is to encode the group name in the model name like a file path, for example UNet2d/My Awesome Model - 1. This trick is used by S3 to generate virtual folder, and also in Kipoi. And we can even support hierarchical deep grouping, we can support the flattened view but also tree view like a common file browser.

We will need to make an decision soon, please comment on these two proposals above.

FynnBe commented 4 years ago

Keys to group by could be a setting on the website. Either fixed grouping for different entry points or (additionally) a flexible entry point that allows the selection of grouping keys.

Instead of a clever naming scheme to encoding additional information I would actually prefer an additional key. The architecture key in particular might not be required (at least not urgently). tags and free text search of name and description enables anyone at the moment to find models of particular architectures. I am not against adding an architecture key in general, I simply think we have more pressing issues to agree on. Happy to discuss in a separate issue and add it in 0.3.1...

esgomezm commented 4 years ago

The training model might have e.g. more inputs (training parameters) and outputs (e.g. loss), at least that's the case for the TensorFlow training vs. (frozen) prediction models I am dealing with. Therefore I think the training model would have to be described differently, no? Currently, the model specification does not really cover training?

In a future release, could we maybe distinguish between hyperparameters and inputs for example? I think that the input could be considered as the input for the model architecture and the hyperparameters the inputs for the model training process. These hyperparameters could be defined in the training section of the yaml.

I also don't think figuring out how to attribute both the authors of the model architecture and the authors of the training is solved by having multiple weights model_usecase_id, (e.g. the name of the lab doing the training plus the name of the dataset).

If we agree with @oeway about having just one set of trained weights, then the problem is solved by using Dataset cards and the field authors.

I am not against adding an architecture key in general, I simply think we have more pressing issues to agree on.

I kind of agree with this, especially because we will end up in a new discussion about what do we consider to be the same architecture (kernel sizes, number of poolings in the encoders, different loss functions, just encoder-decoders, what about residual layers?). Also, what is the aim of grouping? Thinking about the potential user, it may make more sense to group the models by task / dataset / (non-)supervised, but at the same time, there is already a very close functionality using the tags.

If the aim is to upload cards for different checkpoints in the upcoming release, then, @oeway proposal (writing the model name as a path) is probably a good option. However, I'd keep this chance only for those models that share the same GitHub source code or source architecture. Otherwise, I think we should really think about it in a new release.

FynnBe commented 4 years ago

I am confused. This discussion went back and forth in all sorts of different directions.

We see that grouping, author crediting, and maintaining flexibility for future (training) additions are difficult problems. We discuss more and more complex scenarios and possible solutions for the model specification. This issue's discussion on single vs multiple weight formats branched out into many other areas and went back to single vs multiple weights of any format.

My conclusions as of now:

Multiple weight formats is off the discussion table for now.
We need more keys for training, however we still delay the design for a common training format. In the mean time the config field allows any party to experiment and add additional training information there.
the mode.yaml specification is a structure intended for backend use. It is part of the common API of the model runners. Some keys can be directly useful for UI, but that is merely a convenient coincidence.

One question remains: Do we go back to single weight per model.yaml? cons:

the issue of grouping will become urgent faster, as in the current UI design the number of model carts will grow faster.

pros:

We saw that we need grouping beyond the weights of the same model. Having all entries separate makes a generic grouping actually easier. One generic method of grouping that works for various keys vs pre-grouped model.yaml files that need grouping and merging.
It seems the simplest for backend; 0.3.0 is all about going back to basics to agree on the minimum to make inference work reliably and work on cross software consumer compatibility.
as of now we do not have models uploaded with multiple weights anyway

oeway commented 4 years ago

Thanks for pointing out @FynnBe it's better to focus on the decision about the single vs multiple weights format in this thread.

In addition to what @FynnBe concluded, I would like to differentiate the the single weights in one file case into two: 1) single weights in one exact format 2) single weights saved into different formats, this includes different frameworks(tensorflow, pytorch, onnx), different formats or versions of the same framework(TF1.x vs TF2.x, saved model vs frozen, keras h5 vs TF pb, pytorch .pth vs torch script etc.).

I would like to stress the importance of supporting 2), it's an ugly solution but it likely the easiest feasible to achieve cross-compatibility and allow software runtime to upgrade. For example, deepimagej 1.0 save models in TF1.x format, later on deepimagej 2.0 upgrade its runtime to TF2.x, we can just add the new format into the same model file, such that users with different deepimagej versions can work with the same model file. Similarly, it will allow us gradually add support to new consumer with a single model file. Otherwise, we just adding more and more model files for the exact same model, it causes confusion to the user, multiply the number of model files much faster, and introduce extra maintenance burden.

We will have immediate use cases for this, for example, for deepimagej ImageJ (native java) and ImageJ.JS, we need two formats. For some of the HPA models, we have tensorflow and tensorflow.js version.

m-novikov commented 4 years ago

t's an ugly solution but it likely the easiest feasible to achieve cross-compatibility and allow software runtime to upgrade. For example, deepimagej 1.0 save models in TF1.x format, later on deepimagej 2.0 upgrade its runtime to TF2.x, we can just add the new format into the same model file, such that users with different deepimagej versions can work with the same model file. Similarly, it will allow us gradually add support to new consumer with a single model file. Otherwise, we just adding more and more model files for the exact same model, it causes confusion to the user, multiply the number of model files much faster, and introduce extra maintenance burden.

@esgomezm do you plan to distribute models containing multiple weights formats?

@oeway I think we should go in this direction when it becomes a real problem with real user pains, not before in desire to future-proof our spec - simply because it significantly complicates the current implementation. I think the most possible cases of user confusion could be resolved by a website UI/UX.

We currently don't have working examples where we are converting models from one framework to another and no automatic ONNX converters. All current tooling around this area assumes that model developers do it by hand because tech is still in the early stages and there are too many edge cases to do it reliably.

FynnBe commented 4 years ago

discussion history:

single weight, single format (how we started in 2019)
multiple weights, single format (extension in May 2020)
multiple weights, multiple formats (starting point of this discussion)
it's complicated, let's go back to 1.
single weight, multiple formats (my interpretation of @oeway's comment)

Right now (I am really open to change my mind about this yet again) I would suggest to stick with 4., ehh I mean 1. for the following reasons:

we do not have convenient tools (CI) to verify that converted models are in fact logically identical
the problem of one person contributing weights, and another wanting to contribute their conversion afterwards
we will again start moving model keys into the weight key, as more and more differences between weight format a and weight format b are discovered and sharing these keys is ambiguous. -one model.yaml file that can hold the same weight in different formats, does not guarantee these weights being split up into two model.yaml files anyway. Thus we'd have to deal with both cases.
grouping logically identical model weights of different formats is a pure UI problem. If we keep things separate, group by keys and in the future can run a CI to verify logically identical (similar) weights we can group automatically in UI.

It is clear, that @oeway cannot (should not shoulder the burden to) implement the bioimage.io website and it's increasing complexity on his own. All of us should make an effort to contribute to the website and not just push logic to UI that we don't want to deal with in the backend. Let's keep model specification close to backend API and keep improving the website UI together instead!

esgomezm commented 4 years ago

@esgomezm do you plan to distribute models containing multiple weights formats? Yes. I give an example:

As @oeway said, the first evident case is the use of deepimagej in ImageJ.JS. Also, we are preparing two models trained with ZeroCostDL4Mic notebooks. In the notebooks the weights of the best performing checkpoint is stored in Keras (.h5) and we convert the entire model to tensorflow's protobuffer format (saved_model.pb + variables). Ideally, we should provide both in the same card so the user can keep training the model in ZeroCostDL4Mic (as it is really well prepared for that) or use it in DeepImageJ for inference.

I think that option 5 supports cross compatibility. However, if we had a model compatible with Ilastik and DeepImagej for example, (the same trained model in pytorch converted to tensorflow), then we would need two sources for the architecture and not only for the weights. Am I wrong?

m-novikov commented 4 years ago

In the notebooks the weights of the best performing checkpoint is stored in Keras (.h5) and we convert the entire model to tensorflow's protobuffer format (saved_model.pb + variables).

I think checpoint use-case and retraining are also relevant for us when we intiially proposed multi-weight. On the other hand, training would require separate code already, why can't it require separate weight specs?

Also a couple of questions regarding your use case in deepimagej to clarify some stuff:

Is it possible to use .h5 weights for prediciton in deepimagej?
How one would enforce that the protobuf version and h5 version reflect the same model state?
What is the usual size of the pretrained model and h5 weights?
Do you download weights separartely from the model by deepimagej?

esgomezm commented 4 years ago

I think checpoint use-case and retraining are also relevant for us when we intiially proposed multi-weight. On the other hand, training would require separate code already, why can't it require separate weight specs?

Because in this case, the training is completely described in the notebook. The kind of functionality I see here (in the future there can be more) is not to define the training but to enable the use of the model in both places (in the notebook it is also possible to make inference and optimize some parameters for the postprocessing).

Is it possible to use .h5 weights for prediciton in deepimagej?

How one would enforce that the protobuf version and h5 version reflect the same model state?

yes, here I agree. You could either see the compiled notebook or trust the partners :) If it helps, in this case, we are the ones creating it and uploading it to the same zenodo version. Of course, we can question ourselves.

What is the usual size of the pretrained model and h5 weights?

Keras weights: weights_last.hdf5 ~ 20 Mb TF model architecture: save_movel.pb ~ 1Mb TF model weights: variables ~ 60 Mb

Do you download weights separartely from the model by deepimagej?

When creating the model in python yes. The PB format we use contains the architecture (save_model.pb) and the trained weights (variables folder). However, each time we load a model in deepimagej we need to load both always.

FynnBe commented 4 years ago

I think that option 5 supports cross compatibility. However, if we had a model compatible with Ilastik and DeepImagej for example, (the same trained model in pytorch converted to tensorflow), then we would need two sources for the architecture and not only for the weights. Am I wrong?

There is always only one model source code. By converting the model we can change the model weight format (that may include several files for architecture, weights, etc.) but the source should be source code, imho, converted/generated architecture description part of the weights format. Why can't we zip together the saved_model.pb and the variables folder for a tensorflow1 format?

tomburke-rse commented 4 years ago

I support @FynnBe proposal wholeheartily. Handling multiple weight formats increases complexity and the possibility to add new weights formats later on which we can not verifiy at the moment as identical is a clear problem. Also keeping each file "immutable" is a good way to guarantee stability of the whole system. I think the use-cases for multiple weight formats are there, but rare. As the pain increases to support this, we can always add it as it is an addition to the format, not a change, making it backwards compatible.

The handling of the website/UI is a whole other matter which we can resolve more or less easily with database support.

oeway commented 4 years ago

Maybe I am not clear, but the model file with multiple weights formats is just like a zip package, you drag into a software, the the software take the file it likes.

I don't see any extra implementation complication by putting multiple weights formats in one file. Lets say, we have a model file contains the following weights field:

weights:
    - name: keras-h5-weights
      format: keras-h5
      source: https://zenodo.org/...
    - name: TF-1-weights
      format: tensorflow-1-protobuf
      source: https://zenodo.org/...
    - name: TF-JS-weights
      format: tensorflow.js
      source: https://zenodo.org/...

There will be almost zero impact on the implementation for consumer software, it doesn't mean you need to be able to read all these formats. For a given software, you simply pick one by matching the format key. If the software cannot read any of the formats, then it simply tell the user that this model file is not supported.

Take a step back, the equivalent version to achieve the above is 3 model files, they all have the format keys, and the consumer software will need to do exactly the same matching process. But it does complicated the user, because now the user will have to choose a model file based on the weights format.

Handling multiple weight formats increases complexity and the possibility to add new weights formats later on which we can not verifiy at the moment as identical is a clear problem.

Verifying the weights format is an independent problem, by merging multiple model files with different formats into one won't change the issue about verification of models. Besides that, if there is a model runner that support this weights format (and the goal is to provide a model runner for each weights formats), it can be easily verified by using the test_inputs and test_outputs in the CI before the PR is merged.

Also keeping each file "immutable" is a good way to guarantee stability of the whole system.

This is not an issue since we use git and for stability, you can always refer to a specific model file version with its commit hash. And the changes to adding more weights are incremental and suppose to be backward compatible.

In some sense, supporting an backward compatible evolving model file is a way to provide stability for the user. E.g. for the same model we provide the same model id and url, it won't depend on whether he/she use deepimagej-1 or deepimagej-2, fiji or ilastik. It's also a way to incrementally encourage model producer to support more consumer software. Without the compilation of merging multiple model file, we simply display an icon for each supported software on the model card. It's both for a simplified UI design, but more importantly for the user.

I think the use-cases for multiple weight formats are there, but rare. Well, you know the versions of deep learning libraries are moving fast, so does the ops for the model and weights. For the consumer software, we need to keep upgrading our runtime library (e.g. from tensorflow 1.0 to 2.0), the weights file are often encoded to match the exact op implementation, so it's unavoidable that we will need to provide weights file for different versions.

I have done conversions for many of my models across frameworks (e.g. pytorch to keras, then to tensorflow.js), I would say it's not something that you can done in one command line, but it's definitely not something super hard, and the tools are becoming better. The tensorflow.js convertor for example, it's very easy to use and works for most of the ops.

oeway commented 4 years ago

Imagine this, I developed a model and I would like to make it userful to users, so I decided to support Fiji, DeepImageJ, Ilastik and ImJoy.

Here are two possible way of writing the README file for the two cases:

Case 1: only one weight format in one model file

If you want to use model-x, please choose the model file depending on what software you use:

DeepImageJ 1 download from this link
DeepImageJ 2 download from this link
Fiji download form this link, and make sure you choose tensorflow 2.3.0 from the tensorflow installer. If you have tensorflow 1.15 or lower, download this file instead.
For Ilastik with tiktorch, download this link
For ImJoy click this link

Case 2: Multiple weight formats in one model file

If you want to use model-x, please download this file here, we currently support DeepImageJ 1 and 2, Fiji with tensorflow 2.3.0, 1.15 or lower, Ilastik with ticktorch, ImJoy.

tomburke-rse commented 4 years ago

Okay, I can see your point there from an end-user perspective. Then I only have 2 more points/questions and am otherwise potentially persuaded :)

If my software supports multiple formats, I still have to parse the whole list and let the user decide, just with your case 1
How can we secure that people do not mangle the weights attribute, adding multiple weights for the same format? That's a case each software has to look out for, because I don't think YAML can do that. Which would then lead back to a) 1. asking the user and b) isn't intented but people do what they can, not what they should. I think @fjug mentioned that in a meeting too and I can painfully relate from experience.

That said, I'm leaning more to single model-single weight but could totally live with the single model - multiformat style and all the pros and cons it includes.

oeway commented 4 years ago

Okay, I can see your point there from an end-user perspective. Then I only have 2 more points/questions and am otherwise potentially persuaded :)

👍 ;)

If my software supports multiple formats, I still have to parse the whole list and let the user decide, just with your case 1

I would not ask the user in the first place, because it doesn't matter which one, and they suppose to produce the exact same model. What I would do is to automatically choose it for the user, e.g. based on the current loaded runtime library (otherwise might ask user to restart the software? download a supported tensorflow?), based on which one is newer, more robust, if some format fails, then fallback to another one. In other words, this process can be entirely hidden for the user and which is great.

The easiest is to just behave like you only support one format, report error if no matched format key. This will be the same if we use the single weights format model file.

How can we secure that people do not mangle the weights attribute, adding multiple weights for the same format? That's a case each software has to look out for, because I don't think YAML can do that. Which would then lead back to a) 1. asking the user and b) isn't intented but people do what they can, not what they should. I think @fjug mentioned that in a meeting too and I can painfully relate from experience.

We should aim for setting up CI for the runners to be able to run each model weights by feeding the test_inputs file and check whether it produce the same or close result as defined in the test_outputs file. By doing this, we are sure the the same model file will contain the same model (not trained differently). And this can be done in the CI in the central repo of bioimage.io.

There is also a model yaml compilation step (with a simple python script for now) which we can verify the model file and check if they have different weights for the same format.

Now whether we should allow different weights in the future, I think we can decide in a future version and the current format already provide the possibility to explore it easily. The nice thing is that, we can support that in a much more smoother way if we already allow multiple weights format now.

m-novikov commented 4 years ago

I would not ask the user in the first place, because it doesn't matter which one, and they suppose to produce the exact same model. What I would do is to automatically choose it for the user, e.g. based on the current loaded runtime library (otherwise might ask user to restart the software? download a supported tensorflow?), based on which one is newer, more robust, if some format fails, then fallback to another one. In other words, this process can be entirely hidden for the user and which is great.

For usability sake, it would be also great to ship weights together with the model because weight download often fails. If we have multiple weights format it can result in unweildy model files.

If you want to use model-x, please choose the model file depending on what software you use:

DeepImageJ 1 download from this link
DeepImageJ 2 download from this link
Fiji download form this link, and make sure you choose tensorflow 2.3.0 from the tensorflow installer. If you have tensorflow 1.15 or lower, download this file instead.
For Ilastik with tiktorch, download this link
For ImJoy click this link

Or it would be single link with one model file containing model (e.g. tensoflow pb) that can be run by multiple runners that support this format. Multiformat wieghts would work if:

People will have expertise to convert models manually, this is rarely the case because usually there is specialiastion in either tensorflow or pytorch.
There is automatic conversion tooling which is mature and stable enough that provides a hard guarantess in reprodusability which is currently not the case.

oeway commented 4 years ago

For usability sake, it would be also great to ship weights together with the model because weight download often fails. If we have multiple weights format it can result in unweildy model files.

Could you clarify why it would often fail? And why it make a different if we make a model package download by the user vs download weights by the software, if in both cases we host files on Zenodo or Github Release? I think it will just work if we ask them to download the yaml file, and the consumer software can then pull the corresponding weights file(s). This is how pytorch-hub, tensorflow-hub and many python library with pretrained models works, they just dynamically download the corresponding weights. Fiji update sites also does this for the plugins ( and before that imagej is using jar files @fjug may also comment on why they go for that).

I can see one potential issue if the user want to use it offline, but that's rarely the case nowadays. If they do want that, we can easily compose a zip package dynamically from the bioimage.io website when they choose to download for offline use.

Otherwise I would say, the yaml file should be recommended for exchanging, it's small, human readable (without unpackaging), works nicely with version control. It also reduce the maintenance for the model producer, since they don't need to make packages for every release.

People will have expertise to convert models manually, this is rarely the case because usually there is specialiastion in either tensorflow or pytorch.

There is automatic conversion tooling which is mature and stable enough that provides a hard guarantess in reprodusability which is currently not the case.

I think this is an independent issue from the decision about placing weights with different formats in one file or multiple file. We have have exactly the same conversion and reproducibility issue for both cases.

Despite that, model conversion doesn't necessary mean you have to go from e.g. pytorch to tensorflow. Keras to tensorflow for example is about calling a different save function(example), and it's strictly reproducible since the underlying framework is the same. The ZeroCostDL4Mic and DeepImageJ will then benefit form this. The same will apply for pytorch to torchscript model. And sometimes the same framework they rename some ops and we will need to convert it.

For cross-framework tools and reproducibility, that's goal of the entire community and there is already large frameworks such as MMdnn by microsoft that does the conversion for many frameworks. And the result is not that bad from my experience and others'. You can see an example here where someone converted from Keras to Pytorch and difference is rather small (in the order of 1e-5, and later it went to the order of 1e-10 after some further fix).

FynnBe commented 4 years ago

Could you clarify why it would often fail? And why it make a different if we make a model package download by the user vs download weights by the software, if in both cases we host files on Zenodo or Github Release?

If there is one file to download it fails or it doesn't, nothing in-between. and this is step 1 in using the model. If a download fails later on, we need to handle it in the backend, report to front end, handle it in front, etc... I am fine with either solution.

I am also open to adding multiple formats in one model.yaml, but would in this case prefer to give each weights_format a specific (optional) field, e.g. there is not

weights:
  - format: pytorch
     name: weights1
     ... 
  - format: keras
     name: weights1
     ...

but instead

weights_name: weights1
pytorch_weights: ...
keras_weights: ...

This fixes the number of weights explicitly to one, while explicitly allowing specific formats. (And allows to turn it into lists later on, if we really want to).

oeway commented 4 years ago

If there is one file to download it fails or it doesn't, nothing in-between. and this is step 1 in using the model. If a download fails later on, we need to handle it in the backend, report to front end, handle it in front, etc... I am fine with either solution.

Right, the availability of the model weights url can also be verified by the CI during yaml compilation steps (which runs every two hours currently). So model yaml downloaded from bioimage.io should at least work. I mean this should not be a particular issue that we should worry about, the entire internet is depending on the availability of links.

@FynnBe Interesting proposal about the weights keys. However, I think the two are in some sense equivalent and I would prefer the first case, because we can get a list of weights without extra efforts. While in the other case, we rely on a naming pattern *_weights. For a specification, I felt it's better to have a fixed set of key names (e.g. easier to express in a json schema). We will need additional keys for each weights anyway, e.g. hash code.

Another thing is that the character set we can use for keys and values are different, for example we won't be able to use names like tensorflow-2.x-pb as a key because hyphen and dot are not allowed.

As mentioned, keeping the list format will allow us explore the case of saving snapshots and weights trained differently in one file, thus it is also more future-proof.

m-novikov commented 4 years ago

Could you clarify why it would often fail? And why it make a different if we make a model package download by the user vs download weights by the software, if in both cases we host files on Zenodo or Github Release? I think it will just work if we ask them to download the yaml file, and the consumer software can then pull the corresponding weights file(s). This is how pytorch-hub, tensorflow-hub and many python library with pretrained models works, they just dynamically download the corresponding weights. Fiji update sites also does this for the plugins ( and before that imagej is using jar files @fjug may also comment on why they go for that).

I encoutered errors from Zenodo (status 500, 404) and GitHub (mostly timeouts, also 404) when downloading model weights.
Each time model is loaded in new software it would try to download weights from some arbitrary hosting with unknown reliability characteristics. Obviusly we can cache stuff within single runner, but still, it negatively affects reliability. Weights can be deleted, hosting can be down etc.

I can see one potential issue if the user want to use it offline, but that's rarely the case nowadays. If they do want that, we can easily compose a zip package dynamically from the bioimage.io website when they choose to download for offline use.

Another potential issue is having limited bandwith and waithing for 600mb model weights to load.

I think this is an independent issue from the decision about placing weights with different formats in one file or multiple file. We have have exactly the same conversion and reproducibility issue for both cases.

My point was that if there is no reliable tecnhology providing tooling, there is no reason to support it by complicating spec now.

Initially, idea about weights was storing checkpoins. Then it was versions of one model that are "the same". Now multiweights are about supporting multiple consumers.

To be honest the ambiguity of this part of spec stresses me out because I question if I could provide a good enough implementation that solves all possible edge cases and works reliably for an end user.

oeway commented 4 years ago

I encoutered errors from Zenodo (status 500, 404) and GitHub (mostly timeouts, also 404) when downloading model weights. Each time model is loaded in new software it would try to download weights from some arbitrary hosting with unknown reliability characteristics. Obviusly we can cache stuff within single runner, but still, it negatively affects reliability.

The only difference here is download weights by user or by the software, you will get the same error for both cases if Zenodo and Github has connection issue, I don't see how making a model package can solve the server issue. On the one hand, it is the responsibility of the model provider to make sure their files are served reliably. On the other hand, most storage services (S3, Zenodo, Github, or even dropbox or google drive ) are not that bad in terms of availability. Nowadays, most people will use object store (e.g. S3), and these are rather reliable.

Weights can be deleted, hosting can be down etc.

Model packages can also be deleted, no?

Another potential issue is having limited bandwith and waithing for 600mb model weights to load.

This will be the same if you download the from the website as a model package.

I think you are putting us in a very rare case that assuming one want to move a model from one software to another software and in the meantime Github or Zenodo has connection issue.

Beside that, we can do dynamic packaging for offline use from the website when the user click the download link to generate the model package. It's just we don't need the model producer to do it manually for every release.

My point was that if there is no reliable tecnhology providing tooling, there is no reason to support it by complicating spec now. To be honest the ambiguity of this part of spec stresses me out because I question if I could provide a good enough implementation that solves all possible edge cases and works reliably for an end user.

The answer is obviously no, it's impossible getting rid of all possible edge cases, and we should understand this is the reality for engineering, and we are not doing math. Many of the stuff we are exploring are breeding edge, one wouldn't and shouldn't expect this is a bullet-proof stable product, at least we are not there yet.

And I don't agree that the way we store multiple weights format is the root cause of the complications. Even if we only support one weights format in one file, most of the problem stays the same. In fact, the multi-weights format issue depends on how you implement the support for each individual weights format. It has nothing to do with the fact that we store these formats in one file or many files.

On the other hand, the multi weights format proposal is inclusive-- it will be compatible with single weights format in one file. If one of the community partner don't want to have handle multiple weights in one file, it simply write a single one for the model it generates, and when reading these files, it simply ignore other formats.

I felt I am repeating myself, but let me try again: when you read the file with multiple weights formats, you do:

weights_list = model_yaml['weights']
my_weights = filter(lambda w: w['format']=='ilastik-format', weights_list)[0]

Cann't you see, the complication you are talking about is really just one line of code for most consumer software, but it will simplify and enable many use cases I mentioned previously (e.g. one DeepImageJ model file that can be loaded in ImageJ, ZeroCostDL4Mic, and ImageJ.JS).

m-novikov commented 4 years ago

The only difference here is download weights by user or by the software, you will get the same error for both cases if Zenodo and Github has connection issue, I don't see how making a model package can solve the server issue.

This will be the same if you download the from the website as a model package.

Instead of downloading N times for each software user download model once, so the chances that something goes wrong are N times lower.

oeway commented 4 years ago

Instead of downloading N times for each software user download model once, so the chances that something goes wrong are N times lower.

I get this, but I mean, in Fiji for example you don't download a plugin from fiji update everytime you start it, you simply download once and save it for next time.

You only download again when you want to switch software as you mentioned, in that case N=2? 3? I guess very rarely, because different software are likely need a different model weights format (thus a different model package).

m-novikov commented 4 years ago

Cann't you see, the complication you are talking about is really just one line of code for most consumer software, but it will simplify and enable many use cases I mentioned previously (e.g. one DeepImageJ model file that can be loaded in ImageJ, ZeroCostDL4Mic, and ImageJ.JS). And I don't agree that the way we store multiple weights format is the root cause of the complications. Even if we only support one weights format in one file, most of the problem stays the same. In fact, the multi-weights format issue depends on how you implement the support for each individual weights format. It has nothing to do with the fact that we store these formats in one file or many files. I felt I am repeating myself, but let me try again: when you read the file with multiple weights formats, you do:

I head you and I understand what you mean, but I don't see how this simple solution will work in real life as:

We plan to also support tensorflow models though, so it would be like filter and select a preferred format native to this model
We also need somehow a way to distinguish between checkpoins and model variations.

To repeat myself for me weights now look like a big blob of everything - no clear definition of usage - no point in having it, nor in multiformat, nor in checkpoins, nor in model varations (different scales).

oeway commented 4 years ago

I head you and I understand what you mean

Good, I hope you don't mind the way I write here, I just trying explain what I think.

To repeat myself for me weights now look like a big blob of everything - no clear definition of usage - no point in having it, nor in multiformat, nor in checkpoins, nor in model varations (different scales).

Once we decided, we can just describe it in the documentation, no? I get that you want to design a spec that not even allow the multiple weights to appear in the model file. But my point that we dont' have to go that way, we can simply describe it in the documentation that the model producer are suppose to only save models in different formats, but not checkpoints and variations. If the user stores checkpoints any way, why not let it be, and we can potentially allow it in a next version. If no one does it, then means our current version is good. If they don't do it properly, just throw error.

If you want a technical measure to really prevent it, go for the CI, test the inputs and outputs see if they matches so we are sure it's the same model weights but in different formats.

Take a step back, for example, the model source code field for example, even if we say the developer suppose to only write source code that build a model, one can insert some code to scan the users' disk and send password to the internet, or mine bitcoin isn't it? Do we have an actual way to prevent this? My point is, our spec is not that strict anyway even if we want to.

FynnBe commented 4 years ago

agreed in today's bioimage.io meeting:

allow a specified set of weight formats in on model.yaml.
we will specify the weights in the following form:

weights:
  format1:
     source: ...
     sha256: ...
     ...
  format2:
     source: ...
    ...

todos:

[x] determine format names for 'weights with pytorch source', 'tensorflow pb + variables', 'something with keras'
- suggestions: tensorflow_bioimageio_fiji_florians_format

constantinpape commented 4 years ago

We decided that we only allow multiple weights that correspond to exactly the same model, but for different formats.