Garden-AI / garden

https://garden-ai.readthedocs.io
MIT License

add datasets to entrypoints #452

Closed OwenPriceSkelly closed 3 months ago

OwenPriceSkelly commented 3 months ago

closes #442

Overview

This PR gives entrypoints a proper datasets field (like the existing papers and repositories fields), which users can populate with the metadata of datasets they want to showcase as related to their entrypoints.

This keeps the existing DatasetConnection object mostly as-is, but I did add a validator to enforce that foundry datasets have both a url and doi.
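For illustration, the validator's rule could look roughly like the sketch below. It is a plain-dataclass stand-in, not the SDK's actual model: only `url` and `doi` come from this PR, while `title`, the `repository` field, and the way a dataset is recognized as a foundry one are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetConnection:
    """Stand-in for the SDK's DatasetConnection; field names are assumptions."""

    title: str
    repository: str = "foundry"  # hypothetical field marking where the dataset lives
    url: Optional[str] = None
    doi: Optional[str] = None

    def __post_init__(self) -> None:
        # Foundry datasets must carry both a url and a doi.
        if self.repository.lower() == "foundry":
            missing = [name for name in ("url", "doi") if not getattr(self, name)]
            if missing:
                raise ValueError(
                    "foundry datasets require both url and doi; "
                    f"missing: {', '.join(missing)}"
                )
```

With this shape, a foundry dataset missing either field fails at construction time rather than at publish time, while non-foundry datasets are left unchecked.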

Once a user has instantiated DatasetConnection object(s) in their notebook, there are a few ways they can be linked to an entrypoint:

All three sources of a DatasetConnection object will end up in the appropriate EntrypointMetadata.datasets attribute (this is handled in the decorator).
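As a rough sketch of that merging step, assuming just two of the sources (datasets already on the metadata object and datasets passed to the decorator): the `entrypoint` signature, the `entrypoint_metadata` attribute, and the use of plain dicts in place of DatasetConnection objects are all hypothetical here, not the SDK's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EntrypointMetadata:
    """Stand-in for the SDK's EntrypointMetadata; only `datasets` matters here."""

    title: str
    datasets: List[dict] = field(default_factory=list)


def entrypoint(
    metadata: EntrypointMetadata, datasets: Optional[List[dict]] = None
) -> Callable:
    """Hypothetical decorator that merges extra datasets into metadata.datasets."""

    def decorate(func: Callable) -> Callable:
        for ds in datasets or []:
            if ds not in metadata.datasets:  # avoid listing the same dataset twice
                metadata.datasets.append(ds)
        func.entrypoint_metadata = metadata  # hypothetical attribute name
        return func

    return decorate
```

Whichever way a dataset is supplied, it ends up in the one `datasets` list on the metadata object, so anything that later reads the metadata has a single place to look.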

This deliberately does not provide a CLI command to link dataset metadata to an existing entrypoint, because I wanted to avoid the following footgun: publish entrypoint to garden -> add dataset to entrypoint -> forget to re-publish garden from the CLI -> publish another entrypoint to same garden from a notebook -> hey, where'd my dataset metadata go?

Discussion

I'm aiming for the minimum Ryan-usable product with this PR; imo linking dataset metadata (or any other kind of related work) is still in "bandaid" territory at this point. I have some half-baked ideas around a more general "MetadataConnection" object that we could reuse for papers, datasets, repositories, etc. to simplify the way we represent the concept of "related work", but this isn't the PR for that.

I plan to finish baking that idea when there's a way to interact with / edit metadata from a webapp UI instead of only through the SDK.

Testing

I tested this manually by publishing a few entrypoints to the dev search index and verifying that the dataset connection's metadata was included in the entrypoint metadata as expected for datasets linked from either the decorator or constructor.

I didn't write unit tests for this, bc (a) I felt lazy but also (b) this isn't something I feel like we need set in stone for the future, nor is it really core behavior, so there's not much benefit in tying it down with unit tests.

Documentation

No docs updates, by similar reasoning to skipping the unit tests, but I could easily be guilted into writing some. Otherwise I'm just planning to email Ryan with an example so he can keep cookin.


📚 Documentation preview 📚: https://garden-ai--452.org.readthedocs.build/en/452/