fathomnet / models

A collection of machine learning models trained on FathomNet
https://hf.co/FathomNet

Pre-PR Discussion #3

Closed Jordan-Pierce closed 1 month ago

Jordan-Pierce commented 1 year ago

Hi FathomNet team,

During the last meeting there was talk of adding to the FathomNet Model Zoo. A bit ago I had started adding to my fork, including:

My questions for contributing via PR are:

I'd love to contribute, but I wanted to iron out some details beforehand so I can reorganize to fit your desired structure. Cheers.

eor314 commented 1 year ago

Hi @Jordan-Pierce, your timing is impeccable! Thanks for opening this, lots to discuss.

eor314 commented 1 year ago

We are toying with the idea of just moving the whole model zoo over to HuggingFace since it integrates a lot of the functionality you listed out. I just started a FathomNet organization there last week. We also uploaded a new version of our midwater supercategory detector with a model card formatted as bullet points organized under the headings recommended in Mitchell et al., 2019. Would you mind checking it out and letting us know what you think?

One thing I like about the HuggingFace model cards is the slick inclusion of validation metrics. I haven't set up a HuggingSpace demo for that new model. I'd love to check out what you put together there.

eor314 commented 1 year ago

I started poking around in your fork a bit. I like the repackaging that you've done; it is certainly more informative than the table on its own. Do you think it is worth setting something like that up independent of a HuggingFace organization?

Jordan-Pierce commented 1 year ago

@eor314 oh nice, it looks good! I wasn't aware of being able to upload a TensorBoard event, that's snazzy.

I've made the three HuggingSpace demos public here. If you go to the files for any of the spaces you'll see that I tried to keep it consistent across each:

I do not have model cards, just spaces, but I really like the idea of joining the two like in this Space, which is set up to be able to use the API and a demo on the same page as the model card (see widgets), as opposed to having a separate Model Card and HuggingSpace. They also released this tool to help create a model card so that there is consistency (may or may not be useful). Is your model card currently set up so that someone can use the Transformers library and import the model for use?

I would think that more people are familiar with GitHub than with HuggingFace, so it might(?) be easier for some to get what they need from the GitHub repo as opposed to HuggingFace (though I'm sure the demo would be appreciated, as they can test it out on their data before getting too deep into the code). Perhaps the FathomNet Model Zoo (GitHub) could have a sub-folder for each model that contains essentially the same information the model card contains (code, requirements, readme, etc.) and links back to the HuggingFace model card?

If that's too much, then maybe keep your GitHub repo as-is and just add a column that links to the Model Card, whose files section contains everything needed to do inference, fine-tuning, etc. Then they can just download the files as a .zip and have at it. I think I prefer this option tbh.

eor314 commented 1 year ago

The Spaces you set up for the existing models look great to me. And that example of the model card + a gradio demo is nicely put together. Putting that together sounds pretty straightforward but I'll need to take a few minutes to properly go through the documentation.

I think if we go the HuggingFace route we will keep the Git repo as-is with links out to the Model Card. I think that avoids lots of duplicate effort between the HuggingFace repo and git. That said, HuggingFace models are just git repos. Maybe there is a clever way to fork from HF to the git model zoo?

eor314 commented 1 year ago

To your original question about PR-ing your model zoo fork: let's hold off on a decision until later this week. We'll discuss in our standing meeting on Thursday and come up with some guidance.

eor314 commented 1 year ago

@Jordan-Pierce for the time being let's merge your PR to a new branch. That way our team can play around with it a bit more and better inform how to proceed.

We are hoping to reach a yes/no decision on migrating to HuggingFace by the end of next month.

davanstrien commented 1 year ago

Hey! Sorry to jump into this issue out of the blue, but I just stumbled upon this project and think it's super cool!

I'm very keen to help support domain-specific users of the Hugging Face hub, so please let me know if you need any support with anything on there :)

Jordan-Pierce commented 1 year ago

Will do.

Hey @davanstrien, not necessarily the organization of FathomNet's GitHub model zoo, but would you mind speaking to the features of HuggingFace/Spaces for organizing models, demos, API, and just making things accessible to the public (based on what you're seeing in the main repo and my fork)?

davanstrien commented 1 year ago

Sure! Firstly, there are various different ways you can approach organizing models/datasets as part of a community. One option for the organization you've created is to use that as a central hub which people can use to share models and datasets related to FathomNet. You could allow people to join this org with a default contributor role. This would allow them to create new models, datasets, and spaces without editing or deleting existing ones. This can be an excellent way of quickly allowing a community to share without needing to verify people manually. See https://huggingface.co/docs/hub/organizations-security for more details on this.

For the models and datasets, there are various bits of metadata you can assign. Some of these relate to the model task (e.g. image-classification) and the license. You can also add custom tags. These could be useful for curating your material in a more bespoke way, e.g. by adding a tag for a particular project or sub-domain.

You can also add related datasets used to train a model. Adding this metadata to a dataset/model page will expose the links between models and datasets. This can be a nice way of finding models trained on a particular task/dataset.
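
As a rough illustration of the metadata described above, here is a minimal sketch that renders the YAML block found at the top of a Hub model card README. The keys `license`, `pipeline_tag`, `tags`, and `datasets` are standard Hub metadata fields; the helper name `model_card_front_matter` and every value passed to it are hypothetical placeholders, not FathomNet's actual settings.

```python
def model_card_front_matter(license_id, pipeline_tag, tags=(), datasets=()):
    """Render Hub model card metadata as a YAML front-matter block.

    Illustrative helper only: it does plain string formatting and does not
    validate the values against what the Hub accepts.
    """
    lines = ["---", f"license: {license_id}", f"pipeline_tag: {pipeline_tag}"]
    if tags:
        # Custom tags, e.g. one per project or sub-domain.
        lines.append("tags:")
        lines += [f"- {tag}" for tag in tags]
    if datasets:
        # Dataset repo ids; the Hub uses these to link models and datasets.
        lines.append("datasets:")
        lines += [f"- {dataset}" for dataset in datasets]
    lines.append("---")
    return "\n".join(lines)


# Hypothetical values, for illustration only.
front_matter = model_card_front_matter(
    license_id="cc-by-4.0",
    pipeline_tag="object-detection",
    tags=["midwater"],
    datasets=["example-org/example-dataset"],
)
```

Pasting the resulting block at the very top of a model's README.md is what exposes the task, license, tags, and dataset links on the model page.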

You can rely solely on the existing search/filtering features of the hub to present your models/datasets/spaces, but you could also generate tables similar to the one you've already created.

One way you could do this is to rely on the URLs from the hub, e.g. https://huggingface.co/models?pipeline_tag=object-detection&sort=downloads&search=microsoft will find models from Microsoft for object detection. Another way would be to use the hub API (either directly or via the Python client library). There is an example of doing something like this for generating an overview of models by task for an org here: https://danielvanstrien.xyz/metadata/huggingface/2023/03/07/readme-template.html. For example, you could automatically update the table here: https://github.com/Jordan-Pierce/FathomNet#object-detection- using the API in combination with the tags for models on the hub. This means you don't have to manually update the list of models as long as people are using the correct metadata for their models.
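
Both approaches can be sketched in a few lines of Python. This is only a rough sketch, not the code from the linked post: the helper names (`hub_search_url`, `fetch_org_models`, `render_table`) and the table layout are assumptions made for illustration. The API route assumes the `huggingface_hub` client is installed and that models carry a task tag such as `object-detection`.

```python
from urllib.parse import urlencode

HUB = "https://huggingface.co"


def hub_search_url(pipeline_tag, search, sort="downloads"):
    """Build a filtered model-listing URL in the style of the one quoted above."""
    query = urlencode({"pipeline_tag": pipeline_tag, "sort": sort, "search": search})
    return f"{HUB}/models?{query}"


def fetch_org_models(org, task):
    """Return (model_id, downloads) pairs for one organization and task tag.

    Needs `pip install huggingface_hub` and network access, so the import
    is kept local to this function.
    """
    from huggingface_hub import HfApi

    models = HfApi().list_models(author=org, filter=task)
    return [(m.id, m.downloads or 0) for m in models]


def render_table(rows):
    """Render (model_id, downloads) pairs as a Markdown table, most downloaded first."""
    lines = ["| Model | Downloads |", "| --- | --- |"]
    for model_id, downloads in sorted(rows, key=lambda r: r[1], reverse=True):
        lines.append(f"| [{model_id}]({HUB}/{model_id}) | {downloads} |")
    return "\n".join(lines)
```

Something like `render_table(fetch_org_models("FathomNet", "object-detection"))` could then be run on a schedule to regenerate the README table, provided contributors tag their models consistently.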

Another option would be to use a Space to create a web app that could give you more control over how people can search and interact with your collection. I plan to create an example of doing this at some point in the next few weeks, so I'll try and remember to share that here when I've done that.

Let me know if anything here isn't clear or if I can answer any other questions.

eor314 commented 1 year ago

Hi @davanstrien thanks for joining the conversation! We really appreciate the inside scoop. We are in the process of doing our homework and digging through the docs. Do you mind if we reach out to you directly with a few clarifying questions?

davanstrien commented 1 year ago

Sure! Feel free to either ping me here or send me a message on Twitter (https://twitter.com/vanstriendaniel)

davanstrien commented 1 year ago

Sorry to ping you here. This object detection leaderboard might be interesting for this project. In particular, they are looking for suggestions for tasks/datasets outside of the usual benchmark datasets to include in the leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard/discussions/1. It could be interesting to add one of your object detection datasets there. It could be a cool way to make it easier to evaluate the best models for more scientific object detection applications.