bioimage-io / collection-bioimage-io

(deprecated in favor of bioimage-io/collection) RDF collection for BioImage.IO
5 stars 9 forks source link

Wrong resource name generated #320

Open oeway opened 2 years ago

oeway commented 2 years ago

Many of the deepimagej resources are generated with the collection name

For example: https://bioimage.io/#/?type=all&id=deepimagej%2F3dunetzerocostdl4mic

Screenshot 2022-03-03 at 14 40 03

FynnBe commented 2 years ago

The deepimagej collection RDF in invalid. I'll make a PR with some fixes...

FynnBe commented 2 years ago

this will fix it: https://github.com/deepimagej/models/pull/39

oeway commented 2 years ago

@FynnBe It seems the issue still exists:

https://bioimage.io/#/?type=application&id=hpa%2Fhpa-classification

We discussed this elsewhere as I remember, but I still don't understand why the collection name should have higher priority than the bioengine app name, can we fix this?

FynnBe commented 2 years ago

https://bioimage.io/#/?type=application&id=hpa%2Fhpa-classification

this will fix it: https://github.com/CellProfiling/HPA-model-zoo/pull/74

We discussed this elsewhere as I remember, but I still don't understand why the collection name should have higher priority than the bioengine app name, can we fix this?

The problem is that the resolution of imjoy apps is not (yet) part of the spec. And there are good reasons not to include it (mostly unecessary mixing of these two ways of writing metadata). This needs more discussion. We might want to consider not to resolve imjoy apps at all or find another way to connect them to the spec somehow.

What we need to ensure is that the collection RDF is interpreted and validated the same, no matter if we are in the colleciton repo (where we currently still resolve the imjoy apps) or just using the bioimageio validator (outside of the collection repo). The quick fix for now is to add the wrongly overwritten fields explicitly in the collection RDF.

The best way to resolve this long term is IMO to generate the collection RDF of a partner if that partner wants to generate it from another data format (here the imjoy apps). We can easily validate and pull the collection RDF from the partners gh-pages branch. This way validation and buildig the collection don't have to deal with the other data format (here imjoy app descriptions).

oeway commented 2 years ago

@FynnBe Could you look into this too? It's urgent since many of the linked bioengine apps are broken at the moment. For example this one: https://bioimage.io/#/?id=10.5281%2Fzenodo.5910854 (Right now it doesn't run because of this issue).

IMO, any thing will have an actual impact on the user side should get higher priority. I know it's always good to have a clean solution, but for the end user, it doesn't really matter how we fix the issue, we just need to make it work. We can always improve it with anther PR, the bottom line is to not break the website or the library that users are directly interact with.

What we need to ensure is that the collection RDF is interpreted and validated the same, no matter if we are in the colleciton repo (where we currently still resolve the imjoy apps) or just using the bioimageio validator (outside of the collection repo). The quick fix for now is to add the wrongly overwritten fields explicitly in the collection RDF.

Please do so! And this is also how I implemented it before in the collection repo.

The best way to resolve this long term is IMO to generate the collection RDF of a partner if that partner wants to generate it from another data format (here the imjoy apps).

Honestly, I think it doesn't matter for now, imjoy apps can only live in the bioimageio website anyway, it doesn't matter if a consumer or python user can resolve it or not.

FynnBe commented 2 years ago

Please do so! And this is also how I implemented it before in the collection repo.

I did already:

this will fix it: https://github.com/CellProfiling/HPA-model-zoo/pull/74

My PR has not been merged yet, though...

FynnBe commented 2 years ago

It's urgent since many of the linked bioengine apps are broken at the moment. For example this one: https://bioimage.io/#/?id=10.5281%2Fzenodo.5910854 (Right now it doesn't run because of this issue).

the problem here might be a different one... ? looking into it imjoy/HPA-Single-Cell ( https://github.com/imjoy-team/bioimage-io-models/blob/29efaa3a123d5af0003789ffbfd6b9fab7f0885e/manifest.bioimage.io.yaml#L50) is included in the bioimageio collection here: https://github.com/bioimage-io/collection-bioimage-io/blob/804029d11a4f164c45b82d88280980bbcbeedb4f/collection.json#L1411

but the website's app app button resolves to imjoy/hpa-single-cell I think you cast all ids to lower case, so that should not be the problem either... Not sure what goes wrong here, but from the collection side it seems to be all there...

oeway commented 2 years ago

The issue is that the type becomes collection: https://github.com/bioimage-io/collection-bioimage-io/blob/804029d11a4f164c45b82d88280980bbcbeedb4f/rdf.yaml#L662

I would say it's the same issue as the incorrect name.

FynnBe commented 2 years ago

this part is a problem in https://github.com/deepimagej/models as many models are missing the type: model field. I know @xion16lm is already working on updating those. There are some additional problems with the deepimagej collection. The workflow I added fails with various errors. This needs to pass: https://github.com/deepimagej/models/actions/workflows/validate_collection.yaml Then cover images should show up.

FynnBe commented 2 years ago

In fact for some correct deepimagej models the covers are visible (again?): https://bioimage.io/#/?partner=deepimagej&type=all&id=deepimagej%2Fmonuseg_digital_pathology_miccai2018

edit: no, this one has urls linked directly. it probably always worked...

FynnBe commented 2 years ago

@oeway In the mean time you can implement the sorting based on the "download_count" already. The "download_count" is now present (although it equals 1 for all resources for now. I'll update it asap) e.g. https://github.com/bioimage-io/collection-bioimage-io/blob/8a7c93c4c5629a95b74d24a99c16776f8c06a30e/collection.json#L28

oeway commented 2 years ago

this part is a problem in https://github.com/deepimagej/models as many models are missing the type: model field. I know @xion16lm is already working on updating those. There are some additional problems with the deepimagej collection. The workflow I added fails with various errors.

But if you check the yaml all the models has model type, what do you mean?

In any case, the collection CI should resolve the relative cover url by their source url, no?

The issue is we cannot wait for all the models being uploaded, it may take another one or two weeks. And the website are not suppose to break for that long time.

Can't we do a hot fix for the cover field in the collection generation?

Great addition for the download counts!

For the sorting, can you do a initial sorting already when generating it? It will take more time for me to implement on the client side. I feel like we will need it anyway on the backend in the future, when we have more items. To speed up the loading, we will likely divide the summary RDF file in to smaller files. And for that we will need to first sort on the backend too.

FynnBe commented 2 years ago

But if you check the yaml all the models has model type, what do you mean?

yes, sorry, I need to elaborate a bit: I did add it earlier to the "manifest" to fix it, but when fixing the issue with the covers I needed to be more rigorous. Now, if a collection includes an entry with rdf_source that file needs to be a valid rdf as well. Here I propose to add those missing type: model lines (WIP, one by one...): https://github.com/deepimagej/models/pull/43

In any case, the collection CI should resolve the relative cover url by their source url, no?

Yes, exactly, now the source_rdf is resolved (thus needs to be valid) ...

Can't we do a hot fix for the cover field in the collection generation?

yes, almost, I'm working on it now: https://github.com/deepimagej/models/pull/43 (I have pointed at the failing resources in the deepimagej colleciton for several weeks now!)

For the sorting, can you do a initial sorting already when generating it? It will take more time for me to implement on the client side. I feel like we will need it anyway on the backend in the future, when we have more items. To speed up the loading, we will likely divide the summary RDF file in to smaller files. And for that we will need to first sort on the backend too.

good, I'll do that (edit: done)