AllenNeuralDynamics / aind-data-schema-models

Data models used in aind-data-schema

Separate large vocabularies (use custom validation) from small vocabularies and subsets (use codegen) #80

Open dbirman opened 6 days ago

dbirman commented 6 days ago

Copying summary info from discussion w/ David + Saskia:

  1. Check large vocabs into aind-data-schema-models as CSV (or maybe Parquet, since they are big)
  2. Make subsets by creating separate subset files (e.g. lens_manufacturers.csv)
  3. Don't generate all the Pydantic classes; instead build plain sets, which should load faster
  4. Make a library of custom validators in aind-data-schema-models
  5. Use those custom validators in aind-data-schema and decorate the fields that use them with json_schema_extra
  6. In metadata-entry, pull vocabs from DocDB intelligently by looking at json_schema_extra (see the sketch after this list)
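
A minimal sketch of steps 3–5, assuming Pydantic v2. All names here (the `vocabs/` directory, the `lens_manufacturers.csv` column layout, `load_vocab`, `in_vocab`, the `"vocabulary"` key in `json_schema_extra`) are illustrative assumptions, not the actual aind-data-schema-models API:

```python
import csv
from functools import lru_cache
from pathlib import Path
from typing import Annotated

from pydantic import AfterValidator, BaseModel, Field

VOCAB_DIR = Path(__file__).parent / "vocabs"  # hypothetical location of the checked-in CSVs


@lru_cache(maxsize=None)
def load_vocab(name: str) -> frozenset[str]:
    """Load a vocabulary CSV into a set once (step 3), instead of
    generating one Pydantic class per term."""
    with open(VOCAB_DIR / f"{name}.csv", newline="") as f:
        # Assumes each vocab CSV has a "name" column
        return frozenset(row["name"] for row in csv.DictReader(f))


def in_vocab(name: str) -> AfterValidator:
    """Custom validator factory (step 4): checks set membership."""
    def _validate(value: str) -> str:
        if value not in load_vocab(name):
            raise ValueError(f"{value!r} is not in the {name!r} vocabulary")
        return value
    return AfterValidator(_validate)


class Lens(BaseModel):
    # Step 5: attach the validator and tag the field with
    # json_schema_extra so downstream tools (e.g. metadata-entry)
    # can discover which vocabulary to pull (step 6).
    manufacturer: Annotated[
        str,
        in_vocab("lens_manufacturers"),
        Field(json_schema_extra={"vocabulary": "lens_manufacturers"}),
    ]
```

The `"vocabulary"` tag then appears in `Lens.model_json_schema()`, which is how a consumer like metadata-entry could decide which vocab to query from DocDB without importing the validator library itself.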
dbirman commented 3 days ago

Additional discussion w/ Jon and Bruno: