ImagingDataCommons / idc-index

Python package to simplify access to the data available from NCI Imaging Data Commons
https://idc-index.readthedocs.io/
MIT License

Documentation of the underlying index tables #126

Open fedorov opened 2 weeks ago

fedorov commented 2 weeks ago

We need to set up a process that automatically reflects the schemas of, and the relationships among, the growing number of those smaller tables in our documentation. Ideally, users would also have a visual browser, generated automatically from the schema documents, where they could explore those relationships.

Related thread with ideas and relevant technologies: https://discord.com/channels/909674491309850675/921073327009853451/1283795006477565983

fedorov commented 2 weeks ago

Some NCI components use this for describing the model: https://github.com/CBIIT/c3dc-model

vkt1414 commented 1 day ago

@fedorov, in the past I discussed having a relationship diagram with Deepa.

https://github.com/drawdb-io/drawdb was the tool I found. It's pretty good in my opinion. If you find it good as well, I can help with this issue.

fedorov commented 10 hours ago

The other tool mentioned in the thread above, Mermaid, seemed like a nice solution.

drawdb looks sleek, but I think the question is what comes next once you have modeled it there. I don't want to create yet another manual task for anyone.

On the other hand, we can automatically generate Mermaid code directly from the Parquet files (column name + data type) and then embed that code into the docs. We could also augment idc-index-data with a mechanism to either inject column descriptions directly into the Parquet files' metadata fields or require a JSON schema to accompany each query. Or, if we want to play nice with CRDC, we could use Bento MDF. We could then generate the Mermaid diagram code as part of the release, to be picked up downstream in the IDC documentation and/or the idc-index documentation.