chanzuckerberg / napari-hub

Discover, install, and share napari plugins
MIT License
51 stars 18 forks

Filter plugins based on file extension, NOT file pattern #762

Open neuromusic opened 1 year ago

neuromusic commented 1 year ago

Note: This was discussed but not well-specified in the PRD for npe2 support.

Currently, the file extension filter displays the file pattern that plugins support instead of the intended behavior of displaying the extension that the plugin supports.

If I'm a napari user with an OME-TIFF file, it could be opened by a reader plugin that specifies either *.tiff or *.ome.tiff as a supported file pattern. But to find all plugins that will open my file, I currently need to (a) be aware of the file matching syntax and (b) manually scan the list of file patterns to select the patterns that match.

We can relieve this burden by doing (b) for the user.

Job Stories

Acceptance Criteria

  1. Maintain a list of common file extensions for imaging, starting with https://docs.openmicroscopy.org/bio-formats/5.8.2/supported-formats.html
  2. For each Reader plugin, evaluate the list of common file extensions against the npe2 file pattern for the plugin to identify extensions which are supported by the plugin.
  3. The filter will be populated with file extensions, and filtering by a given file extension (say ".ome.tiff") will display plugins that can open such files (say, both *.tiff and *.ome.tiff).
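A minimal sketch of criteria 2 and 3, assuming the patterns are plain fnmatch-style globs and that a placeholder file name is enough to test each extension. The extension list, plugin names, and function names below are made up for illustration and are not part of the hub code.

```python
from fnmatch import fnmatch

# Hypothetical sample; the real list would start from the Bio-Formats table.
COMMON_EXTENSIONS = [".tif", ".tiff", ".ome.tiff", ".czi", ".nd2"]


def supported_extensions(patterns, extensions=COMMON_EXTENSIONS):
    """Return the extensions for which a placeholder file name matches any pattern."""
    supported = []
    for ext in extensions:
        probe = "file" + ext  # e.g. "file.ome.tiff"
        if any(fnmatch(probe, pattern) for pattern in patterns):
            supported.append(ext)
    return supported


def plugins_for_extension(ext, plugin_patterns):
    """Return the names of plugins whose patterns match a file with this extension."""
    probe = "file" + ext
    return sorted(
        name
        for name, patterns in plugin_patterns.items()
        if any(fnmatch(probe, pattern) for pattern in patterns)
    )


# Hypothetical plugins: both the *.tiff and the *.ome.tiff reader are found.
example = {
    "reader-a": ["*.tiff"],
    "reader-b": ["*.ome.tiff"],
    "reader-c": ["*.czi"],
}
print(plugins_for_extension(".ome.tiff", example))
```

With this approach the user never sees the glob syntax; the hub evaluates the patterns once per plugin and stores the resulting extensions.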
DragaDoncila commented 1 year ago

I've looked into this and it is fairly easy to do. I have just two questions, I think, @neuromusic @richaagarwal:

  1. Do we want to manually maintain this list of supported extensions? For example, the bio-formats list could be programmatically retrieved from here, which is what they use to build documentation. That would get updated automatically. If we wanted to support other formats we could still manually maintain a much shorter list.
  2. Do we still want the individual plugin pages to show the full filename patterns i.e. only the filter options are limited? If so we'll also need changes on the front end to handle the additional metadata.
richaagarwal commented 1 year ago

@DragaDoncila For your first question, I think programmatically retrieving from a reliable source is a better option over manually maintaining our own list, but since we won't be controlling the data source, we should be sure to fail gracefully in the event we can't load the data (maybe the data changes to a format we're not expecting or the file is renamed/removed).
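One way the "fail gracefully" idea could look, as a rough sketch. It assumes the upstream data can be reduced to one extension per line, which may not be how the real file is laid out, and every name here (`parse_extensions`, `load_extensions`, `FALLBACK_EXTENSIONS`) is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical bundled list used when the upstream source is unavailable.
FALLBACK_EXTENSIONS = frozenset({".tif", ".tiff", ".zarr"})


def parse_extensions(raw_text):
    """Parse one extension per line; raise ValueError on an unexpected format."""
    extensions = set()
    for line in raw_text.splitlines():
        line = line.strip().lower()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith("."):
            raise ValueError(f"unexpected entry: {line!r}")
        extensions.add(line)
    if not extensions:
        raise ValueError("no extensions found")
    return extensions


def load_extensions(fetch, fallback=FALLBACK_EXTENSIONS):
    """Call fetch() for the upstream text; fall back to the bundled list on failure."""
    try:
        return parse_extensions(fetch())
    except (OSError, ValueError) as exc:
        # OSError covers network and missing-file errors (urllib's URLError
        # subclasses it); ValueError covers a changed data format.
        logger.warning("Falling back to bundled extension list: %s", exc)
        return set(fallback)
```

Passing the fetch step in as a callable keeps the network call out of the parsing logic, so both failure modes (file renamed/removed, format changed) can be exercised without touching the real source.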

I'll defer to @neuromusic for your second question

neuromusic commented 1 year ago

+1 to @richa's comment. I'll add two thoughts on (1):

for (2), I think that the details page should be consistent with the filters (this will also enable links back to the filter, per the original design)

DragaDoncila commented 1 year ago

we should check to see if there are file formats that napari plugin devs have specified that are NOT supported by bioformats.

@neuromusic I don't have a good sense right now of what isn't covered by the bioformats listing, but I did notice that zarr for example is not listed, which we would definitely want to cover I think. So we likely would maintain our own short list of additional extensions.

I will pull down that doc I found and parse it against a list of all patterns declared by devs to get a full list of what's not included, and that should give us a better idea of coverage. Once we have a better idea of this, and if we're sure we want to proceed with that doc, I can reach out to the OME folks. I did make a zulip thread initially and Talley mentioned that aicsimageio also has a listing of these formats.

for (2), I think that the details page should be consistent with the filters (this will also enable links back to the filter, per the original design)

If we simply update the field contents this shouldn't require any changes to the front end. However, we will need to decide what we want to do with file patterns that don't match any known extensions - do we just list them as is and not allow filtering? Then I think we might need some changes on the front end also.

DragaDoncila commented 1 year ago

Ok so I've done a bit of a breakdown of what kinds of extensions developers are declaring for their readers. In total, 354 distinct file patterns are declared across all plugins. Of those, 83% match a declared bioformat file extension, and there are 60 'other' patterns. Only 18 of these other patterns are declared by more than one plugin.
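For anyone wanting to reproduce this kind of breakdown, here is a rough sketch. It assumes "matches" means the pattern ends with a known extension (case-sensitive), which may differ from how the numbers above were computed; the function name and sample data are invented.

```python
from collections import Counter


def pattern_breakdown(plugin_patterns, known_extensions):
    """Summarise declared patterns against a set of known extensions."""
    # Count how many plugins declare each distinct pattern.
    declared_by = Counter(
        pattern
        for patterns in plugin_patterns.values()
        for pattern in set(patterns)
    )
    known = tuple(known_extensions)
    # Patterns that do not end with any known extension are "other".
    other = sorted(p for p in declared_by if not p.endswith(known))
    return {
        "distinct": len(declared_by),
        "matched": len(declared_by) - len(other),
        "other": other,
        "shared_other": [p for p in other if declared_by[p] > 1],
    }


# Hypothetical plugins and extensions, for illustration only.
sample = {
    "reader-a": ["*.tif", "*.zarr"],
    "reader-b": ["*.tif", "*.zarr", "*.star"],
    "reader-c": ["*.czi"],
}
summary = pattern_breakdown(sample, {".tif", ".czi"})
print(summary)
```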

I've done a breakdown of these patterns in this hackmd doc, grouping them wherever possible. Zarr-like patterns, as you can see, are one of the groups, and there are three plugins (napari-boxmanager, napari-pymeshlab and blik) that declare a large chunk of these other patterns. napari-rioxarray is another one, which focuses on GDAL raster data.

I'd say there's a sufficient number there to mean we want to maintain our own listing of formats (in addition to the bioformats list), so we need to come up with some guidelines for what goes into that list.

Some general questions:

  1. What do we do with patterns that are outside the bioformats + our custom formats list:
    • show them only on detail page
    • don't show them at all
    • make them searchable but not filterable
    • put them at the bottom of the filter list (maybe add headings to the filter list e.g. bioformats, geo, tabular, other)
  2. As you can see, one of the groups contains file patterns that are in bioformats but capitalized, so they would fail an fnmatch comparison (as used in the viewer to choose readers for a given file). Showing both filters seems redundant, but only showing one, e.g. the lowercase version, might lead to someone downloading a plugin that rejects their file. This could be avoided if we used a format filter rather than an extension filter (see below).
  3. Even assuming we only maintain bioformats extensions, that still leaves 294 items in this filter which is quite unwieldy. We could display the format itself (instead of extensions) i.e. what's in the left hand column in the original linked table from Justin, but as you can see many probably vastly different formats share file extensions, so this could also lead to issues. Maybe we should consider dynamically populating this filter with the most popular file extensions (somehow identifying what's "popular") based on analysis of the total listing?

@neuromusic @richaagarwal curious to hear your thoughts based on this information

neuromusic commented 1 year ago

I'd say there's a sufficient number there to mean we want to maintain our own listing of formats (in addition to the bioformats list), so we need to come up with some guidelines for what goes into that list.

Agreed. And ideally, this will be specified in a way that is relatively easy to reference/update/etc (such as a YAML file in the napari hub repo)

For (1), my vote would be "don't show them at all in the metadata" BUT we prioritize https://github.com/chanzuckerberg/napari-hub/issues/543 sooner than later and make the full manifest searchable.

(2) is interesting... don't most operating systems ignore case in determining if a given program supports opening a given extension? this might be an opportunity to improve napari to ensure that its behavior is consistent with other applications

For (3), I'm not worried about an unwieldy filter... we can loop in design for guidance here, but the simplest solution would be to swap the filter component for the one we use for authors, which includes a local search functionality

Janeece commented 1 year ago

For (3), I'm not worried about an unwieldy filter... we can loop in design for guidance here, but the simplest solution would be to swap the filter component for the one we use for authors, which includes a local search functionality

@neuromusic Chiming in on this point, yes— including the search within the dropdown like with authors would be my recommendation as well!

DragaDoncila commented 1 year ago

Thanks for weighing in folks!

my vote would be "don't show them at all in the metadata"

I think it's strange to have metadata available about a plugin and not show it. When it comes to populating the filter I understand limiting the options to common file extensions, but I don't really see the benefit of hiding these details on the plugin page. We'd need a specific message for this (I think "information not submitted" as we currently have would be misleading to the user, since the plugin developer did provide the information), and I think anything in the realm of "provides no known extensions" would immediately beg the question of "ok, what about unknown extensions"? @neuromusic I'm curious why you would recommend hiding the metadata?

don't most operating systems ignore case in determining if a given program supports opening a given extension?

I think so. I checked napari's behavior and we do make sure paths are case insensitive. I've opened an issue on the npe2 repo to make sure patterns are also case insensitive and will follow up with that one.
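To illustrate the case problem: Python's `fnmatch.fnmatchcase` never folds case, while `fnmatch.fnmatch` only does so on platforms where `os.path.normcase` lowercases paths. A sketch of a comparison that ignores case on every platform is below. This is not how napari or npe2 implement it; `matches_ignoring_case` is a made-up helper.

```python
from fnmatch import fnmatchcase


def matches_ignoring_case(filename, pattern):
    """Compare a file name and a glob pattern with both folded to lowercase.

    Note: lowercasing the pattern also lowercases character classes
    such as [A-Z], which is acceptable for a case-insensitive match.
    """
    return fnmatchcase(filename.lower(), pattern.lower())


# A strictly case-sensitive comparison rejects the capitalized extension...
print(fnmatchcase("scan.TIF", "*.tif"))
# ...while the folded comparison accepts it in either direction.
print(matches_ignoring_case("scan.TIF", "*.tif"))
print(matches_ignoring_case("scan.tif", "*.TIF"))
```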

the simplest solution would be to swap the filter component for the one we use for authors, which includes a local search functionality

Agreed, didn't even know we had that component! Very cool.

neuromusic commented 1 year ago

I'm curious why you would recommend hiding the metadata?

I'm assuming that we prioritize #543 so that the full metadata is available (say, on a tab) but only the "known extensions" are displayed in the sidebar.

neuromusic commented 1 year ago

following up on convo w/ @DragaDoncila, I think we could use some design support on (1) to identify the preferred behavior here (cc @Janeece)

neuromusic commented 1 year ago

looking again at your doc @DragaDoncila ...

Would it work if we just expand our list of supported file extensions to include any extensions we can identify that follow the *. pattern: \*\.([a-zA-Z\d.]+) or URL protocols: ([a-z]+:\/\/)

given your current hackmd, this would leave only suspected typos as excluded: *_pbm and <EDIT_ME>
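A quick sketch of how those two regexes could be applied. The regexes are copied from the comment above; the helper name and the choice to try the extension regex first are assumptions.

```python
import re

# Regexes as proposed above.
EXTENSION_RE = re.compile(r"\*\.([a-zA-Z\d.]+)")
PROTOCOL_RE = re.compile(r"([a-z]+:\/\/)")


def extract_filter_label(pattern):
    """Return an extension or URL protocol found in the pattern, else None."""
    ext = EXTENSION_RE.search(pattern)
    if ext:
        return "." + ext.group(1)
    proto = PROTOCOL_RE.search(pattern)
    if proto:
        return proto.group(1)
    return None  # e.g. suspected typos, which would stay out of the filter


for pattern in ["*.ome.tiff", "http://*", "*_pbm", "<EDIT_ME>"]:
    print(pattern, "->", extract_filter_label(pattern))
```

Both excluded patterns mentioned above (*_pbm and <EDIT_ME>) come back as `None` here, since neither contains `*.` followed by an extension nor a lowercase protocol prefix.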

DragaDoncila commented 1 year ago

I think that could work.

The only use case this might be misleading for is when the filename pattern matches a specific naming convention rather than a filetype, e.g. */my-special-dir/[0-9][0-9][0-9].tif. That still matches the pattern but it wouldn't open just any .tif (which of course we can't guarantee anyway since there's all sorts of TIFs out there). We don't see any use cases like that at the moment, but the system is built to support them, so we may want to consider them in advance. Or we can continue to review as the number of readers grows and see if it becomes a more common use case.

I do still feel like the plugin details page should show the full filename pattern? We could always provide some placeholder names into the pattern if we were worried it wouldn't read well for people unfamiliar with wildcards etc., but I think it's important to show exactly what the developer declared is supported.

Janeece commented 1 year ago

@neuromusic sorry for the delayed response. Just to confirm, design support for (1) is to identify the preferred behavior for:

  1. What do we do with patterns that are outside the bioformats + our custom formats list

I can take a look at that at the end of this week or early next week!

Janeece commented 1 year ago

@DragaDoncila @neuromusic circling back on this since there's been a delay on my end — is design support still needed on defining preferred behavior for extensions that are outside of bioformats + the custom list?

Also (forgive the dumb question), can one of you clarify what the implications are of extensions that fall outside of the bioformat+custom list/are "unsupported"? Is it that it won't work with napari and is not really relevant metadata for users evaluating a plugin? Thanks!

DragaDoncila commented 1 year ago

is design support still needed on defining preferred behavior for extensions that are outside of bioformats + the custom list?

I think it would be good to have the conversation and have you weigh in @Janeece. Perhaps we can set up a meeting for w.c. March 20th if you have some time?

Also (forgive the dumb question), can one of you clarify what the implications are of extensions that fall outside of the bioformat+custom list/are "unsupported"? Is it that it won't work with napari and is not really relevant metadata for users evaluating a plugin? Thanks!

These extensions would absolutely still work with napari. Plugin developers declare these filename patterns for reader plugins as a way to inform the napari viewer (and users) what types of files it can open. We want to expose these patterns in the search filters so that users can easily find plugins that open different filetypes. Currently the filter shows the full filename pattern declared by the developer. This may not be super useful for users who are unaware of the way filename patterns work, and who just want to e.g. find a plugin that opens TIFF files. This discussion therefore is about:

Janeece commented 1 year ago

Hi @DragaDoncila @neuromusic, I finally found some time to look at design recommendations for what to do with unknown extensions. Let me know what you think. I’m happy to take another pass if there is feedback, questions, or factors I didn’t address.

What I explored:

Design Recommendations:

In general I concluded that as a user it is confusing to have available information intentionally hidden. To that end, I recommend to:

Here’s a mock up of the recommendations: https://www.figma.com/proto/oJ3iajVLfGdRRv4eIpxX7F/Plugin-Filter-Behaviors?page-id=116%3A311&node-id=305-29710&viewport=41%2C-33%2C0.13&scaling=min-zoom

richaagarwal commented 1 year ago

@DragaDoncila Is this still something in progress or would it make sense to put back in the backlog?

cc @junxini for awareness

DragaDoncila commented 1 year ago

@richaagarwal sorry this one slipped past me 🙏 I think it makes sense to put it back in the backlog, as I don't foresee having a lot of time to take it on in the next few months.

I just had a look at @Janeece's designs and I think it's a great option! It would be nice still to also categorize the known extensions into bioformats, geo formats, cryo-ET, etc., but maybe we could do that with the same label as the Unknown extension? So basically, we still sort alphabetically but each file pattern has a label? That way we're still providing useful information about what kind of extension it is, but we're not filtering out any options?
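Roughly what I have in mind, as a sketch only. The category mapping is purely illustrative (which extension belongs to which group is exactly what we'd need to decide), and `labelled_filter_options` is a hypothetical helper, not existing hub code.

```python
# Illustrative mapping only; the real grouping would need to be agreed on.
CATEGORY_BY_EXTENSION = {
    ".tif": "bioformats",
    ".shp": "geo",
    ".star": "cryo-ET",
}


def labelled_filter_options(patterns, categories=CATEGORY_BY_EXTENSION):
    """Sort patterns alphabetically and attach a category label to each one."""
    options = []
    for pattern in sorted(set(patterns), key=str.lower):
        label = next(
            (cat for ext, cat in categories.items() if pattern.lower().endswith(ext)),
            "Unknown extension",  # same label as in the design mock-up
        )
        options.append((pattern, label))
    return options


for option in labelled_filter_options(["*.star", "*.tif", "*_pbm", "*.tif"]):
    print(option)
```

Nothing is filtered out: every declared pattern stays in the list, and the label just tells the user what kind of pattern it is.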