Overview
This PR gives entrypoints a proper datasets field (like the existing papers and repositories fields), which users can populate with the metadata of datasets they want to showcase as related to their entrypoints.
This keeps the existing DatasetConnection object mostly as-is, but I did add a validator to enforce that foundry datasets have both a url and doi.
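For reviewers who want a feel for the new check without opening the diff, here is a minimal sketch of that kind of validator. It is not the SDK's code: the class below is a hypothetical stdlib dataclass stand-in, and the `repository` field name and the `"foundry"` marker value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetConnection:
    """Hypothetical stand-in, not the SDK's actual class definition."""

    title: str
    repository: str = "other"  # assumed field naming where the dataset is hosted
    url: Optional[str] = None
    doi: Optional[str] = None

    def __post_init__(self) -> None:
        # Foundry datasets must carry both a url and a doi.
        if self.repository.lower() == "foundry":
            missing = [name for name in ("url", "doi") if not getattr(self, name)]
            if missing:
                raise ValueError(
                    "foundry datasets require both url and doi; "
                    f"missing: {', '.join(missing)}"
                )
```

With this sketch, a foundry dataset missing either field fails at construction time, while non-foundry datasets are left unchecked.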
Once a user has instantiated DatasetConnection object(s) in their notebook, there are three ways to link them to an entrypoint:
directly, via the datasets=[...] kwarg when instantiating their EntrypointMetadata object
via a datasets=[...] kwarg to the @garden_entrypoint decorator
(old way) via the connector.metadata.datasets attribute of one of their model connectors, if they can find it. This is only in here for compatibility.
All three sources of a DatasetConnection object will end up in the appropriate EntrypointMetadata.datasets attribute (this is handled in the decorator).
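The merge described above can be sketched roughly as follows. Everything here is a simplified, hypothetical stand-in rather than the SDK's implementation: the classes are plain dataclasses, the `metadata` and `model_connectors` kwargs and the `_entrypoint_metadata` attribute are assumed names, and the merge order is an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetConnection:
    """Hypothetical stand-in for the SDK class."""
    title: str
    doi: Optional[str] = None


@dataclass
class EntrypointMetadata:
    title: str
    datasets: List[DatasetConnection] = field(default_factory=list)


@dataclass
class ConnectorMetadata:
    datasets: List[DatasetConnection] = field(default_factory=list)


@dataclass
class ModelConnector:
    metadata: ConnectorMetadata = field(default_factory=ConnectorMetadata)


def garden_entrypoint(metadata, datasets=None, model_connectors=None):
    """Sketch of a decorator that gathers datasets from all three sources."""
    def decorate(func):
        merged = list(metadata.datasets)            # 1. EntrypointMetadata(datasets=[...])
        merged += list(datasets or [])              # 2. decorator datasets=[...] kwarg
        for connector in model_connectors or []:    # 3. (old way) connector metadata
            merged += connector.metadata.datasets
        metadata.datasets = merged
        func._entrypoint_metadata = metadata        # assumed attribute name
        return func
    return decorate


train = DatasetConnection(title="training set")
extra = DatasetConnection(title="benchmark set")
legacy = DatasetConnection(title="connector set")

connector = ModelConnector()
connector.metadata.datasets.append(legacy)
meta = EntrypointMetadata(title="predict", datasets=[train])


@garden_entrypoint(metadata=meta, datasets=[extra], model_connectors=[connector])
def predict(x):
    return x
```

After decoration in this sketch, `meta.datasets` holds all three connections, regardless of which route each one came in by.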
This deliberately does not provide a CLI command to link dataset metadata to an existing entrypoint, because I wanted to avoid the following footgun: publish entrypoint to garden -> add dataset to entrypoint -> forget to re-publish garden from the CLI -> publish another entrypoint to same garden from a notebook -> hey, where'd my dataset metadata go?
Discussion
I'm aiming for the minimum ryan-usable product with this PR; imo linking dataset metadata (or any other kind of related work) is still in "bandaid" territory at this point. I have some half-baked ideas around a more general "MetadataConnection" object that we could reuse for papers, datasets, repositories, etc. to simplify the way we represent the concept of "related work", but this isn't the PR for that.
I plan to finish baking that idea once there's a way to interact with / edit metadata from a webapp UI instead of only through the sdk.
Testing
I tested this manually by publishing a few entrypoints to the dev search index and verifying that the dataset connection's metadata was included in the entrypoint metadata as expected for datasets linked from either the decorator or constructor.
I didn't write unit tests for this, bc (a) I felt lazy but also (b) this isn't something I feel like we need set in stone for the future, nor is it really core behavior, so there's not much benefit in tying it down with unit tests.
Documentation
No docs updates, for similar reasons as skipping unit tests, but I could be easily guilted into doing so. Otherwise I'm just planning to email ryan with an example so he can keep cookin.
closes #442
📚 Documentation preview 📚: https://garden-ai--452.org.readthedocs.build/en/452/