brain-bican / models

BICAN data models
https://brain-bican.github.io/models/
3 stars 3 forks source link

adding ccn2 schema and restructure the repo #1

Closed djarecka closed 1 year ago

djarecka commented 1 year ago

the original schema from: https://github.com/brain-bican/CCN2/blob/main/src/schema/schema.json

A draft of linkml file with some issues/questions (search "todo").

ccn.json created with: gen-json-schema ccn.yaml > ccn.json

EDIT

satra commented 1 year ago

@djarecka - can this be updated to using slots (see: https://github.com/atlaskb/models/pull/1)

also can you add a github workflow that runs linkml-lint and linkml-validate? and perhaps put a structure to this repo? see if the linkml structure is useful (https://github.com/biolink/biolink-model)?

cc: @sooyounga

djarecka commented 1 year ago

@satra - after moving to slots and removing provenance, cell_type_name, etc. from the class the structure of the json has changed more, I believe the idea is the same, but you can check.

I understood from the biolink guide that if i add defining_slots I don't need to have the section with slot_usage: cell set accession: required: true, etc., but when I checked the json the fields were not in the required list and from the linkml description it is not clear to me that it can be used instead of required: true. Any thought? Perhaps I should just remove the defining_slots sections if I have to use slot_usage anyway?

I will work on tests now.

djarecka commented 1 year ago

I believe linkml-validate is used to validate data, not the schema alone. Do you want me to add a sample of the data?

And as I mentioned in the slack, seems like to official guidance from linkml is that we use a different naming convention: https://linkml.io/linkml/schemas/linter.html#standard-naming I could disable it, but just to keep in mind that bioschema does not follow the official convention

satra commented 1 year ago

defining_slots are intended to be elements, that if provided would identify the class but don't restrict the data models to require values for those slots. hence, required may still be required.

regarding validate, this would be more on the test side perhaps.

djarecka commented 1 year ago

In the guide you sent it was Unless it is designated as one of the defining_slots (see below) or slot_usage (see below) specifies that a given slot is required: true (see below), then it is not mandatory ... That's why I thought that require=True might be avoided.

I have to say that it's not clear to me why I would like to use defining_slots, that's why I asked if you have any thoughts whether it should stay. If not, I will remove it

djarecka commented 1 year ago

regarding validate, this would be more on the test side perhaps.

looks like biolink is using the python API, so will write the tests using it

djarecka commented 1 year ago

@satra - I've added some simple tests, we should have more, but perhaps we can start from here. I'm planning to create a GA action for this, but I want to ask what else would you want as a GA? During the call you mentioned about publishing the json file automatically, but I'm not sure where do you want to publish. I took from biolink structure and created dirs for json and python versions, but I don't think they are creating these automatically...

satra commented 1 year ago

if indeed they don't have full automations, what do you think, should we generate the schema and models in a separate repo or this one? GA would allow us to write to a different repo on merge into master.

so this would only contain yaml files, and perhaps GA to generate docs.

djarecka commented 1 year ago

perhaps I was wrong and they do generate it in GA, now when I'm checking their workflows it looks like they are adding some artifacts in the repo: https://github.com/biolink/biolink-model/blob/master/.github/workflows/regenerate-artifacts.yml#L45

I thought that there had different day of modification last time I checked, but now it looks like they were all changed 6h ago... perhaps the previous changes were just not affecting the python version? or I simply checked something else...

I can try to add something similar tomorrow, will just focus on json and python for now

djarecka commented 1 year ago

@satra - do you have any linting rules you want to add? https://linkml.io/linkml/schemas/linter.html

for now I just disabled the naming warnings, since we don't want to use it. https://github.com/atlaskb/models/blob/33d76a10e41855af5667b5a653ee4c8a6243091f/lint_config.yaml I can add a GA for this, but I understand that I should use --ignore-warnings, so the warnings will still return code 0.

I was also thinking a bit more about the organization, and now I think that perhaps pydantic schemas, the ones that will be mostly used in other libraries, should be pushed to a separate repo. I understand that the repo with the pydantin schemas would be published in PyPI?