bioimage-io / bioimage.io

Website for the BioImage Model Zoo -- a model zoo for bioimage analysis.
https://bioimage.io
MIT License

The different tabs for Models, Applications, Notebooks are confusing #56

Closed: constantinpape closed this issue 10 months ago

constantinpape commented 4 years ago

For users the meaning of these tabs is not obvious. We think the "Applications" tab is not really necessary; users will not be interested in the applications themselves. We are less sure about the distinction between "Models" and "Notebooks". Currently the distinction makes sense, because the notebooks don't adhere to our model.yaml config, but for a user it's not clear why it is needed.
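For reference, a minimal sketch of the kind of fields a model.yaml carries, parsed with PyYAML (the field names below are only illustrative of the config format, not the authoritative bioimage.io spec):

```python
import yaml  # pip install pyyaml

# Illustrative sketch only: these fields approximate a bioimage.io
# model.yaml and are NOT the authoritative spec.
MODEL_YAML = """
format_version: 0.3.0
name: my-unet-nuclei
description: U-Net for nuclei segmentation
authors:
  - name: Jane Doe
tags: [segmentation, nuclei]
weights:
  pytorch_state_dict:
    source: ./weights.pt
inputs:
  - name: raw
    axes: bcyx
outputs:
  - name: probability
    axes: bcyx
"""

config = yaml.safe_load(MODEL_YAML)
print(config["name"], config["tags"])  # my-unet-nuclei ['segmentation', 'nuclei']
```

Notebooks have no such structured description of weights, inputs, and outputs, which is what currently separates the two tabs.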

oeway commented 4 years ago

The "Applications" tab is basically a tab grouping all the consumer/producer software for modes (e.g. software such as Fiji, Ilastik, ImJoy, plugins such as deepimagej, HPA-Classification plugin, etc. ). It can be seen as a gallery of "BioImage.IO compatible software" where users can browse and choose the one they want to use. An handy feature would be, when one click the Ilastik application card, there will be, for example a badge models: 54 suggesting there are 54 models that can be used by Ilastik, when clicked, a popup dialog lists all the supported model for Ilastik. In this case it basically serve as "Select models by application".

For the planned online inference services, we will add BioEngine Apps that can basically search for models based on the user's data; in this case, it is more natural to enter the search app via the "Applications" tab rather than via a specific model.

Not only for models but also for datasets: if a dataset is served remotely with Zarr, or available through IDR, there will be applications that can browse those remote data and visualize them, with Kaibu for example. Different BioEngine Apps can talk to each other, so one can, for example, add a button to Kaibu that calls Ilastik running on MyBinder to make a prediction. Similarly, we can build an IDR image explorer, an Image Annotator, etc. that connect to models and datasets. In these cases, it makes more sense to access these applications directly.

Regarding the distinction between "Models" and "Notebooks":

  1. While Models is more about pretrained models with weights, a notebook is often more about the code and documentation for training (it does not necessarily contain pretrained weights). E.g. for ZeroCostDL4Mic, as I understand it, they would like to provide only the notebooks rather than the pretrained weights (because of the potential risk of applying trained networks).
  2. For weights produced by a certain notebook, it is natural to store both the model and the notebook and link the two.
  3. A notebook is not necessarily associated with one model; it can cover many models (e.g. one can use a for loop in the notebook to search the models available on BioImage.IO; see the sketch below). It may also explain how to perform pre-/post-processing or download datasets for training models.
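As a sketch of that "for loop" idea, assuming a hypothetical collection endpoint and item layout (the real bioimage.io collection URL and schema may differ):

```python
import requests

# Hypothetical collection endpoint and item layout; the real
# bioimage.io collection URL and schema may differ.
COLLECTION_URL = "https://bioimage.io/collection.json"

items = requests.get(COLLECTION_URL, timeout=30).json().get("collection", [])

# Keep only model entries that match a tag of interest.
nuclei_models = [
    item for item in items
    if item.get("type") == "model" and "nuclei" in item.get("tags", [])
]
for model in nuclei_models:
    print(model.get("name"), "->", model.get("source"))
```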

The motivation for having notebooks on BioImage.IO is that many deep learning libraries/models provide notebooks, but there are few centralised places that collect them. Similar to the datasets associated with models, we can link the notebooks with models (e.g. for reproducibility). In the general context of bioimaging, we can also host educational notebooks that are not explicitly linked to a model.

Overall, the current design expands the scope of BioImage.IO a bit, making it not restricted to models but about the wider bioimaging ecosystem. I think we should keep it open and not worry about having too many notebooks or applications listed; users are going to use tags and search to find content anyway. We will also apply badges and sort items by popularity, quality, etc. Technically, there is no pressure in linking more content, since it's very cheap to link an additional item and we do not run our own server. Would be interesting to hear what you think.

constantinpape commented 4 years ago

Regarding the distinction between "Models" and "Notebooks":

I agree, this distinction makes sense. Maybe we can help to make it more obvious to users, see below.

Overall, the current design expands the scope of BioImage.IO a bit, making it not restricted to models but about the wider bioimaging ecosystem.

I think we need to be careful here. The whole "bioimaging eco-system" is way too large in scope. We should stay focused on deep-learning-related tools and on providing models for these tools. (To be a bit cheesy: it's important to do one thing well instead of doing ten things badly. ;)) I agree, however, that notebooks and ZeroCostDL4Mic are a very good fit for what we want to do, because that's how a lot of people train and interact with models.

I am not so sure about putting the "Applications" tab in such a prominent position: it's unclear how applications relate to the community partners (which in most cases support some form of application), and it opens the door to adding things that are out of scope and not connected to deep learning.

Of course we should still have applications that show models in a "web preview" mode, but from the perspective of bioimage.io users these should rather be implementation details; see the "Play button" suggestion in #57.

Technically, there is no pressure in linking more content, since it's very cheap to link an additional item and we do not run our own server. Would be interesting to hear what you think.

I think one of the takeaways from the feedback that @vzinche got is that, in its current form, the website is overwhelming for new users or initial contributors. The reason is that there are too many different terms and icons. For example, as a user coming from ilastik or Fiji, I would expect to find those in "Applications", and the icons displayed on the models should correspond to ilastik, Fiji, etc.

From my point of view there are two (orthogonal) strategies to make the website more intuitive:

vzinche commented 4 years ago

Disclaimer: My feedback is fully based on the users' perspective and is ignorant of any technical details.

Models vs Notebooks

I feel like the naming is extremely confusing, because in the end, from my user perspective, both are just models that I can use. So the main difference here is whether you run a model from some GUI software or from a notebook, and I think this should be reflected in the naming. Right now it feels like Models contains all the models that I can run, while Notebooks could contain anything from courses given by someone to arbitrary bioimage analysis pipelines. Additionally, if notebooks support both pretrained and non-pretrained models, there should also be a tag to distinguish these two categories, since for me as a user it is extremely important to know this in advance.

Applications Tab

As a user who came to the website to fetch some model, I either don't have any software preference at all, in which case I just browse through all the models, or I have a specific software in mind, in which case I would expect to be able to click on a logo (for example, it is really intuitive to click on the logos of the community partners) and get the same model tab, but filtered by software. That is why the tab 'Software' is slightly confusing to me: does it list software that I can download from this website? Run directly on this website? Software compatible with the models from this website?

I agree that there should be a list of compatible software somewhere, but I would strongly advocate for

  • moving the tab itself elsewhere, since it doesn't fit in one line with the other tabs, which are all basically 'downloadables'
  • renaming it to 'Compatible Software'. Otherwise it might give the impression that it is just a list of any software that can be used for bioimage analysis, as such lists are pretty common.

For the planned online inference services, we will add BioEngine Apps that can basically search for models based on the user's data; in this case, it is more natural to enter the search app via the "Applications" tab rather than via a specific model.

Sorry, I didn't really get this part. What do you mean by 'based on the user's data'? Why is it not possible in the Models tab?

Overall, the current design expands the scope of BioImage.IO a bit, making it not restricted to models but about the wider bioimaging ecosystem.

I guess I was not fully aware of the expected scope of the website. Are you planning to extend it to any bioimage analysis software/data? In this case you should probably consider partitioning it into some 'subpages'; otherwise, in my experience, such websites/databases get extremely hard to navigate.

constantinpape commented 4 years ago

I guess I was not fully aware of the expected scope of the website. Are you planning to extend it to any bioimage analysis software/data?

Just want to emphasize again: I don't think it's a good idea to extend the scope to any bioimage analysis software. There are several other projects that already provide lists of bioimage analysis software packages; the idea for bioimage.io is to have integration between software that can run/train/deploy deep learning for bioimage analysis and the models.

oeway commented 4 years ago

Disclaimer: My feedback is fully based on the users' perspective and is ignorant of any technical details.

Thanks! I think this is exactly the type of feedback we want to have.

Let me first clarify the different types of users we are targeting; we can divide them roughly into three groups:

  1. experts, developers and machine learning practitioners who produce models and tools
  2. bioimage analysts who build pipelines from existing tools/models
  3. users with little or no expertise who just want to process their data

There will be very few users in group 1; most will be in groups 2 and 3. Since deep learning is simplifying workflows and lowering the required expertise for image analysis, user numbers may also shift from group 2 to group 3 in the near future. I therefore think it is more reasonable to put more effort into serving user groups 2 and 3.

An important aspect to consider is that user groups 2 and 3 have little or no expertise in using the actual models; they focus more on their data and task. That is to say, providing only the models is not enough to get their work done. This is the main motivation for expanding the scope: not just providing models, but also connecting them with upstream/downstream applications, notebooks, training datasets, etc., and guiding the users in using these resources.

Just want to emphasize again: I don't think it's a good idea to extend the scope to any bioimage analysis software. There are several other projects that already provide lists of bioimage analysis software packages; the idea for bioimage.io is to have integration between software that can run/train/deploy deep learning for bioimage analysis and the models.

This is an important discussion to have. I agree that we should not do everything, but in the meantime, I think we should not resist including non-deep-learning ingredients in the platform.

As mentioned above, most of the users we are targeting may just want a whole pipeline that can analyse their data. However, providing only the models is simply not enough, and they would need to navigate to another website or forum to get the other pieces to complete the pipeline. To really do it well, we should also provide those utility parts and workflows on BioImage.IO and guide the users in building pipelines with models.

I wouldn't position BioImage.IO as a platform for hosting any bioimage analysis software, but I don't think we need to constrain ourselves/contributors to making the platform purely about deep learning models. We can have a wide scope, and that won't stop us from focusing on AI models. Meanwhile, most newly developed software for image analysis is likely deep-learning-powered. In other words, the point is to better serve our target user groups with a focus on AI. Covering the entire ecosystem is not the goal, but if we end up doing that along the way, we shouldn't resist it; let it evolve.

Right now it feels like Models contains all the models that I can run, while Notebooks could contain anything from courses given by someone to arbitrary bioimage analysis pipelines.

Sorry for the confusion, but it is indeed intended that notebooks can contain bioimage analysis pipelines; a notebook is not necessarily only about a model that the user can run.

To put it more from the user's side: notebooks can be educational and task-oriented, giving guidelines to users who want to use machine learning to solve their tasks. For example, a notebook could be named "HCS image analysis with ImJoy", and in it we could use an application for image annotation, use Tiktorch with a U-Net model for segmentation, and perform classification with another model.

Notebooks and models really serve different purposes: notebooks are complementary, showcasing different models, and the two can be inter-linked.

Additionally, if notebooks support both pretrained and non-pretrained models, there should also be a tag to distinguish these two categories, since for me as a user it is extremely important to know this in advance.

In general, I think that's a good point: we should develop tags that let users know whether they need to train a model before they can use it. However, I also see some difficulty in enforcing this type of rule here: the distinction cannot be detected automatically, and we delegate the administration of contributed models to the community partners. We can always set guidelines for tagging, but it is up to the person who contributes/reviews the new models.

As a user who came to the website to fetch some model, I either don't have any software preference at all, in which case I just browse through all the models, or I have a specific software in mind, in which case I would expect to be able to click on a logo (for example, it is really intuitive to click on the logos of the community partners) and get the same model tab, but filtered by software.

Right, I think the main confusion comes from the community partners that use software logos. By definition, community partners are the entities who provide models/applications/datasets/notebooks to the platform, so the logos you see actually represent the teams, groups and institutes who provide models. I am not sure how to best resolve this confusion, but the difference is that a partner can produce many applications, so the logo of a community partner does not strictly stand for a software package. The intended way to search for models by software is to go to the "Applications" tab and choose the corresponding application card.

That is why the tab 'Software' is slightly confusing to me: does it list software that I can download from this website? Run directly on this website? Software compatible with the models from this website?

The "applications" tab contains all the software types, including downloadable software such as Fiji and Ilastik, web app such as ImJoy and Kaibu, they should be software that consumes models, or utility software that provide features to deep learning based analysis pipeline (e.g. annotation tools, visualisation tools), enhance the website (e.g.: notebook preview, model config validator, model scanner).

I agree that there should be a list of compatible software somewhere, but I would strongly advocate for

  • moving the tab itself elsewhere, since it doesn't fit in one line with the other tabs, which are all basically 'downloadables'
  • renaming it to 'Compatible Software'. Otherwise it might give the impression that it is just a list of any software that can be used for bioimage analysis, as such lists are pretty common.

Well, the applications are not all downloadable: the tab includes regular desktop software, but there is a trend towards performing the analysis in the cloud with web apps, so we would like to include web applications as well, for example cellpose, nucleaizer and many ImJoy plugins (e.g. Kaibu and HPA-Classification). All of these will be able to run directly in BioImage.IO; they can be used for model previews, but can also perform large-scale analysis when the user connects the BioEngine to a GPU server.

Sorry, I didn't really get this part. What do you mean by 'based on the user's data'? Why is it not possible in the Models tab?

Sorry for not being clear. With the BioEngine, one can develop BioEngine Apps that scan the models for the user. Specifically, the user can upload/select images from their computer and click a button to run them through all the compatible models; the results are shown as a gallery of output images, ranked by selected metrics. This type of utility application is not suitable for the Models tab because it is not related to a single model card but to all of them. Also, it makes more sense for the user to access the app directly rather than through some icon on the model cards.
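A rough sketch of this scanning idea, with `run_model` and `score` as stand-in placeholders (the actual BioEngine inference API is not shown):

```python
import numpy as np

def run_model(model_entry, image):
    # Stand-in for actual BioEngine inference; returns a dummy prediction.
    rng = np.random.default_rng(0)
    return rng.random(image.shape)

def score(output):
    # Stand-in for a user-selected ranking metric.
    return float(output.mean())

def scan_models(models, image):
    """Run the image through every compatible model and rank by metric."""
    results = [(m["name"], run_model(m, image)) for m in models]
    return sorted(results, key=lambda r: score(r[1]), reverse=True)

image = np.zeros((256, 256))
for name, output in scan_models([{"name": "unet-a"}, {"name": "unet-b"}], image):
    print(f"{name}: score={score(output):.3f}")
```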

Are you planning to extend it to any bioimage analysis software/data? In this case you should probably consider partitioning it into some 'subpages'; otherwise, in my experience, such websites/databases get extremely hard to navigate.

We haven't actually discussed the scope much. As mentioned above, I think we should focus on AI models but not resist including downstream/upstream tools that are relevant and useful for the users we are targeting. By targeting AI models we already cover a wide range of new and future bioimage analysis software; actively rejecting the small portion of less relevant software won't help us keep the website clean. I would keep it open, at least at this stage, and see how it goes.

I think you are right that we should think more about the organisation of the items. The current approach uses tags + types (model/notebook etc.). Further suggestions are welcome, but I don't see a natural way to partition the items.

My general feeling is that the website is not designed for navigating items one by one, but rather for searching with tags, applications, notebooks, etc.

BTW @vzinche, based on your comments I just enhanced the "search models by software" feature by displaying a badge listing all the compatible models for a certain software (e.g. Ilastik); see a preview version here: you can, for example, click the "model 5" badge on the Ilastik card to see the models.
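The badge is essentially a filtered count. A minimal sketch, assuming compatibility is recorded in each collection item's tags (an assumption about the layout):

```python
# Minimal sketch of the "models: N" badge, assuming each collection item
# records compatible software in its tags (an assumption about the layout).
def compatible_models(items, software):
    return [
        item for item in items
        if item.get("type") == "model" and software in item.get("tags", [])
    ]

items = [
    {"type": "model", "name": "unet-nuclei", "tags": ["ilastik", "nuclei"]},
    {"type": "model", "name": "stardist-demo", "tags": ["fiji", "nuclei"]},
]
print(f"models: {len(compatible_models(items, 'ilastik'))}")  # models: 1
```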

constantinpape commented 4 years ago

This is an important discussion to have. I agree that we should not do everything, but in the meantime, I think we should not resist including non-deep-learning ingredients in the platform.

As mentioned above, most of the users we are targeting may just want a whole pipeline that can analyse their data. However, providing only the models is simply not enough, and they would need to navigate to another website or forum to get the other pieces to complete the pipeline. To really do it well, we should also provide those utility parts and workflows on BioImage.IO and guide the users in building pipelines with models.

I agree that this is a nice vision. BUT it is also a very big step from what we set out to do initially, which was to build a model zoo. And the model zoo itself is still at a very early stage.

Anyway, I think this is something better discussed in person. On a more general note, we should think about establishing a bit more of a governance procedure to discuss (or at least announce) such large changes before they are implemented. That would help reduce the confusion, at least among the community partners ;).

constantinpape commented 4 years ago

@oeway Ok, I am starting to really like the new website design :).

I think one part that was really confusing (at least for me) was that clicking the community partner symbols didn't work before (at least for Fiji and Ilastik); now that this is fixed, it all makes much more sense.

I also think the distinctions between "Model", "Application" and "Notebook" you outline above make sense, and it is logical that a community partner can provide some or all of these. I still think it is a bit unintuitive for new users, but this can be improved with some UI features. Some initial ideas:

  • Add a tooltip (displayed on hover) for "Model", "Application", etc. that explains in more detail what these are.
  • Remove the "All" tab; I don't think it's very helpful, and removing it would reduce the clutter of items (rather a minor point).
  • Be more strict with required tags (e.g. "trainable" vs. "static"); although this becomes the responsibility of the community partners, we could at least integrate it into the recommended CI.

Finally, I think going forward we should be a bit more careful about making conceptual changes unannounced and without discussion; if things turn out to be controversial, this creates quite some confusion.

oeway commented 4 years ago

@oeway Ok, I am starting to really like the new website design :).

Good!

  • Add a tooltip (displayed on hover) for "Model", "Application", etc. that explains in more detail what these are.

Yes, we can do that.

  • Remove the "All" tab; I don't think it's very helpful, and removing it would reduce the clutter of items (rather a minor point).

Well, a user who just wants to find something for, e.g., nuclei segmentation may not know which tab to choose, because the matched item may be listed under Models, Notebooks, or Applications. In such a case, selecting "All" allows a global search. If we remove it, the search will be restricted to the current type, and the user will need to switch back and forth.

  • Be more strict with required tags (e.g. "trainable" vs. "static"); although this becomes the responsibility of the community partners, we could at least integrate it into the recommended CI.

Right, we can definitely implement this type of recommendation in the CI.
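A hedged sketch of what such a CI check could look like, using the "trainable"/"static" tag names from this thread (how it is wired into the actual CI is left open):

```python
import sys
import yaml  # pip install pyyaml

# Warn when a contributed model.yaml carries neither a "trainable" nor a
# "static" tag, so users know in advance whether training is required.
REQUIRED_ONE_OF = {"trainable", "static"}

def check_tags(path):
    with open(path) as f:
        config = yaml.safe_load(f)
    tags = set(config.get("tags", []))
    if not tags & REQUIRED_ONE_OF:
        print(f"{path}: please add one of {sorted(REQUIRED_ONE_OF)}")
        return False
    return True

if __name__ == "__main__":
    # Usage: python check_tags.py model1.yaml model2.yaml ...
    results = [check_tags(p) for p in sys.argv[1:]]
    sys.exit(0 if all(results) else 1)
```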

I also think that the strategy "Focus on providing models and software for bio-image analysis with deep learning, but allow related but different contributions if interest arises" makes sense.

Finally, I think going forward we should be a bit more careful about making conceptual changes unannounced and without discussion; if things turn out to be controversial, this creates quite some confusion.

I agree that we should always discuss, as we are doing now. However, I would like to point out that the version we have right now is not a sudden change made overnight; it's the result of many different meetings and discussions, plus some additional inspiration. Maybe I am not doing a good job of keeping everyone on the same page, but I did make the changes via PRs and posted update summaries to gitter to ask for feedback. I would rather see these changes as an extension of our initial goal. At the Dresden hackathon, we already discussed connecting models to datasets, allowing online inference, previews, and searching models with user-provided data, supporting community partners (GitHub organizations), etc. The only thing we did not discuss is the notebooks integration.

Anyway, the good thing is that we now have the bi-weekly meeting to get synchronised and resolve conflicts. (P.S.: this makes me think of EAFP versus LBYL in Python).

constantinpape commented 4 years ago

First of all thanks for all the work you are doing on this project, @oeway. I think this is really moving in a good direction and my intention here is not to blame anyone, just to improve communication a bit in the future.

I agree that we should always discuss, as we are doing now. However, I would like to point out that the version we have right now is not a sudden change made overnight; it's the result of many different meetings and discussions, plus some additional inspiration.

Probably the whole change in scope happened a bit over time, but no one from the ilastik side was really aware of it (we discussed this today in our morning meeting and it took everyone a bit by surprise; however, @akreshuk was not present and I am not sure what her take on this was).

Maybe I am not doing a good job of keeping everyone on the same page, but I did make the changes via PRs and posted update summaries to gitter to ask for feedback.

Sorry for not checking this out earlier, I wasn't aware of most of this.

I can only speak for myself here, but I don't like gitter so much (it's one more communication channel among too many channels already). I do try to stay up to date with the PRs, but I think these ones just slipped under my radar because they looked rather technical and it was unclear to me what they would entail in terms of changing the scope of the website.

I think one issue here is that there are so many communication channels, and it is indeed very hard to sync information across all of them. I would suggest at least trying to lay out broader changes and ideas in a github issue first and pinging the relevant people to get their opinions. For the record, I don't think this is necessary for all small-scale things, but for larger changes like expanding the scope I think it is necessary.

I would rather see these changes as an extension of our initial goal. Our previous discussions already involved connecting models to datasets, allowing online inference, previews, searching models with user-provided data, and community partners (GitHub organizations).

I agree that this does make sense (though we need to make sure not to broaden the scope too much).

The only thing we did not discuss is the notebooks integration. Anyway, the good thing is that we now have the bi-weekly meeting to get synchronised and resolve conflicts. (P.S.: this makes me think of EAFP versus LBYL in Python).

I agree it's good to have the meetings to discuss this; unfortunately, I can't make it tomorrow due to (one-time) teaching requirements.

oeway commented 4 years ago

First of all thanks for all the work you are doing on this project, @oeway. I think this is really moving in a good direction and my intention here is not to blame anyone, just to improve communication a bit in the future.

Thanks!

I think one issue here is that there are so many communication channels, and it is indeed very hard to sync information across all of them. I would suggest at least trying to lay out broader changes and ideas in a github issue first and pinging the relevant people to get their opinions. For the record, I don't think this is necessary for all small-scale things, but for larger changes like expanding the scope I think it is necessary.

Right, I think we should improve the communication. And I agree with you that there are just too many communication channels, making it easy to miss things. Meetings take more effort but should be more effective at getting attention.

I like the github approach; it also allows newcomers to pick up the discussion history. For small details, I would rather take the EAFP approach to get things running, and others can always step in and say no.

I agree it's good to have the meetings to discuss this; unfortunately, I can't make it tomorrow due to (one-time) teaching requirements.

Ok, we will keep records for you!

constantinpape commented 4 years ago

I like the github approach; it also allows newcomers to pick up the discussion history. For small details, I would rather take the EAFP approach to get things running, and others can always step in and say no.

Yes, I agree: for the details, EAFP is totally fine.