TRI-ML / dgp

ML Dataset Governance Policy for Autonomous Vehicle Datasets
https://tri-ml.github.io/dgp/
MIT License

schema: add ParallelDomainFrameMetadata for Parallel Domain datasets. #87

Closed: nehalmamgain closed this 2 years ago

nehalmamgain commented 2 years ago


nehalmamgain commented 2 years ago

Hey Tyler!

We chose GenericMetadata because the metadata is likely to change over time, both as our interests shift (for example, the tags our data scientists want for AL/auto-labeling) and because the sensors are heterogeneous. We'd much rather store metadata as key-value pairs than as numbered fields, which could become a pain to keep backward-compatible over time.
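To make the key-value argument concrete, here is a minimal sketch in plain Python. It is not the dgp API or the actual GenericMetadata message: the helper names `encode_metadata` and `read_tag` and the tag names are hypothetical, and JSON stands in for whatever serialization the schema ends up using.

```python
import json
from typing import Any, Dict


def encode_metadata(tags: Dict[str, Any]) -> str:
    """Serialize free-form frame metadata as JSON key-value pairs."""
    return json.dumps(tags, sort_keys=True)


def read_tag(blob: str, key: str, default: Any = None) -> Any:
    """Look up one tag by name; a missing key falls back to `default`."""
    return json.loads(blob).get(key, default)


# Hypothetical tags, for illustration only.
early_frame = encode_metadata({"weather": "rain"})
later_frame = encode_metadata({"weather": "rain", "lane_marking_wear": 0.4})

# A reader written against the early tags still works on later frames...
print(read_tag(later_frame, "weather"))
# ...and a newer reader falls back to a default on frames that lack the new tag.
print(read_tag(early_frame, "lane_marking_wear", default="unknown"))
```

The trade-off is that nothing validates tag names or value types here, so a typo in a key only shows up when a reader gets the default back.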

nehalmamgain commented 2 years ago

Hi Kuan! Thanks for your comments!

I imagine the messages currently in metadata.proto in this repository are used by some other Parallel Domain dataset (I don't know which; maybe James or Quincy could tell). None of that metadata is used in another dataset of interest to us (for lane lines). I'll share exactly what the metadata looks like with you and Tyler over Slack.

@tk-woven

Do we anticipate changing the schema so often that this feature does not suit our needs?

Unfortunately, yes. Our initial request already has a lot of tags, which PD cannot render right now, and we have an even more detailed one (~200) for the real dataset. As PD's assets mature, the metadata can definitely grow into more fine-grained options.

nehalmamgain commented 2 years ago

Sorry, moving the discussion to Slack to go over specifics, since this is a public repo 🙂

nehalmamgain commented 2 years ago

@kuanleetri Yes, I will take the schema we agreed upon on Slack to PD. Thank you! Waiting on Quincy's comments in case any further changes are needed.