databio / bbconf

Configuration package for bedbase project
https://pypi.org/project/bbconf/
BSD 2-Clause "Simplified" License

do schemas need to be updated #20

Closed: nsheff closed this issue 6 months ago

nsheff commented 10 months ago

There are several schemas here:

https://github.com/databio/bbconf/tree/dev/bbconf/schemas

I believe these are pipestat schemas. But when pipestat was updated, the schema definition changed.

I believe bbconf has been updated to use the new pipestat.

Are these schemas up-to-date?

khoroshevskyi commented 10 months ago

These pipestat schemas were updated by @donaldcampbelljr, and I believe they are working and up-to-date.

donaldcampbelljr commented 10 months ago

I believe I only updated the pipeline names but did not touch the actual schema fields as they were working during testing with the new pipestat. Pipestat should raise a schema error if the required keys are not present for complex types.

nsheff commented 10 months ago

I thought the pipestat schema format changed like this:

https://github.com/pepkit/pipestat/issues/20

nsheff commented 10 months ago

I see now. It was not actually updated to be json-schema compatible.

I think we need to change the pipestat schema format to follow json-schema...

My reasoning is that we may be able to use jsonschema functionality for pydantic model creation, which would simplify things with bedhost...
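To illustrate the idea of deriving a model directly from a JSON schema, here is a minimal sketch. It deliberately swaps pydantic for the standard library's `dataclasses.make_dataclass` so it runs without extra dependencies; the `model_from_schema` helper, the `JSON_TO_PY` type mapping, and the example values are all hypothetical and are not part of bbconf, bedhost, or pipestat.

```python
from dataclasses import fields, make_dataclass
from typing import Any

# Hypothetical mapping from JSON Schema primitive types to Python types.
JSON_TO_PY = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def model_from_schema(name: str, schema: dict) -> type:
    """Build a dataclass whose fields mirror the schema's 'properties' (hypothetical helper)."""
    field_specs = []
    for prop, spec in schema.get("properties", {}).items():
        # Unknown or missing types fall back to Any.
        py_type = JSON_TO_PY.get(spec.get("type"), Any)
        field_specs.append((prop, py_type))
    return make_dataclass(name, field_specs)


# Example schema loosely modeled on the bedfile fields discussed in this thread.
bedfile_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "BED file name"},
        "genome": {"type": "object", "description": "genome assembly of the BED files"},
        "bedfile": {"type": "object", "object_type": "file", "description": "BED file"},
    },
}

BedFile = model_from_schema("BedFile", bedfile_schema)
record = BedFile(
    name="sample1",
    genome={"alias": "hg38"},  # illustrative value only
    bedfile={"title": "bed", "path": "sample1.bed"},  # illustrative value only
)
print([f.name for f in fields(BedFile)])
```

With pydantic, the same loop could feed `pydantic.create_model` instead, which would add runtime validation; the point is only that a schema in standard JSON Schema form is easy to turn into models mechanically.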

See:

nsheff commented 8 months ago

@donaldcampbelljr pipestat schemas are now json schemas, right?

donaldcampbelljr commented 8 months ago

Pipestat can accept either the old way or a JSON schema, yes.

However, after our discussion last week, I realized we are still using types such as file and image within the JSON pipestat schema (and later converting them to objects).

Therefore the schema is still not actually 100% JSON schema: https://json-schema.org/understanding-json-schema/reference/type

nsheff commented 8 months ago

I thought we decided on doing:

type: object
object_type: file

That makes it valid json-schema, while still allowing our type-specific treatment.
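A small sketch of why this works: JSON Schema's `type` keyword only accepts its seven primitive types, while validators generally ignore keywords they do not recognize, so an extra `object_type` key is harmless. The `invalid_types` helper below is hypothetical and only checks the `type` keyword; it is not a full JSON Schema validator.

```python
# The seven primitive types allowed by JSON Schema's "type" keyword.
JSON_SCHEMA_TYPES = {"null", "boolean", "object", "array", "number", "string", "integer"}


def invalid_types(properties: dict) -> list:
    """Return names of properties whose 'type' is not a JSON Schema primitive (hypothetical helper)."""
    bad = []
    for name, spec in properties.items():
        declared = spec.get("type")
        if declared is None:
            # JSON Schema does not require a "type" keyword.
            continue
        # "type" may be a single string or a list of strings.
        declared = declared if isinstance(declared, list) else [declared]
        if any(t not in JSON_SCHEMA_TYPES for t in declared):
            bad.append(name)
    return bad


old_style = {"bedfile": {"type": "file", "object_type": "file"}}
new_style = {"bedfile": {"type": "object", "object_type": "file"}}

print(invalid_types(old_style))  # ['bedfile']
print(invalid_types(new_style))  # []
```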

donaldcampbelljr commented 8 months ago

Close, but right now it's:

type: file
object_type: file

Pipestat still uses the type field, and if it's an image or a file, it recursively replaces it with title and path fields and changes the type to object.
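The recursive replacement described above can be sketched roughly as follows. This is not pipestat's code (the real logic lives in `parsed_schema.py`, linked later in this thread); the `expand_complex_types` helper, the `REPLACEMENT` dict, and the added `required` list are assumptions, and the actual replacements in pipestat's `const.py` may define different or extra fields for images.

```python
import copy

# Assumption: both "file" and "image" expand to an object with string
# "title" and "path" properties. pipestat's real definitions may differ.
COMPLEX_TYPES = {"file", "image"}
REPLACEMENT = {
    "title": {"type": "string", "description": "Title of the object"},
    "path": {"type": "string", "description": "Path to the object"},
}


def expand_complex_types(schema):
    """Recursively rewrite 'type: file/image' as 'type: object' with title/path (hypothetical)."""
    if isinstance(schema, dict):
        # Rebuild the dict so the input schema is left untouched.
        out = {k: expand_complex_types(v) for k, v in schema.items()}
        declared = out.get("type")
        # Guard against a property that happens to be *named* "type".
        if isinstance(declared, str) and declared in COMPLEX_TYPES:
            out.setdefault("object_type", declared)
            out["type"] = "object"
            out["properties"] = copy.deepcopy(REPLACEMENT)
            out["required"] = ["path", "title"]  # assumption
        return out
    if isinstance(schema, list):
        return [expand_complex_types(item) for item in schema]
    return schema


samples = {
    "name": {"type": "string", "description": "BED file name"},
    "bedfile": {"type": "file", "label": "bed", "description": "BED file"},
}
expanded = expand_complex_types(samples)
print(expanded["bedfile"]["type"], sorted(expanded["bedfile"]["properties"]))
```

Rewriting a copy rather than mutating in place keeps the user-supplied schema available for error messages.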

donaldcampbelljr commented 8 months ago

Current bbconf example from dev:

pipeline_name: bedfile
samples:
  name:
    type: string
    description: BED file name
  genome:
    type: object
    description: genome assembly of the BED files
  bedfile:
    type: file
    pipestat_type: file
    label: bed
    description: BED file
  bigbedfile:
    type: file
    pipestat_type: file
    label: bigBed
    description: bigBed file

nsheff commented 8 months ago

That's the old format, not a JSON schema at all.

donaldcampbelljr commented 8 months ago

Yes, pipestat can accept either format currently. However, I'm just showing how the pipestat_type was implemented along with the original type.

nsheff commented 8 months ago

We need to update those bbconf schemas to use the new format, and deprecate the old.
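One possible shape for that migration, as a minimal sketch: each old-format field with a `file` or `image` type becomes `type: object` plus `object_type`, and the redundant `pipestat_type` key is dropped. The helper names are hypothetical, and the top-level layout produced by `migrate_schema` (a `properties` block holding `pipeline_name` and `samples`) is an assumption; check pipestat's documentation for the exact layout it expects.

```python
def migrate_field(spec: dict) -> dict:
    """Convert one old-format field to the 'type: object' + 'object_type' form (hypothetical)."""
    new = {k: v for k, v in spec.items() if k != "pipestat_type"}
    old_type = spec.get("pipestat_type", spec.get("type"))
    if old_type in ("file", "image"):
        new["type"] = "object"
        new["object_type"] = old_type
    return new


def migrate_schema(old: dict) -> dict:
    """Wrap migrated sample fields in a JSON-schema-style layout (layout is an assumption)."""
    return {
        "type": "object",
        "properties": {
            "pipeline_name": old["pipeline_name"],
            "samples": {
                "type": "object",
                "properties": {
                    name: migrate_field(spec) for name, spec in old["samples"].items()
                },
            },
        },
    }


# Old-format input, taken from the bbconf dev example earlier in this thread.
old_schema = {
    "pipeline_name": "bedfile",
    "samples": {
        "name": {"type": "string", "description": "BED file name"},
        "bedfile": {
            "type": "file",
            "pipestat_type": "file",
            "label": "bed",
            "description": "BED file",
        },
    },
}
new_schema = migrate_schema(old_schema)
print(new_schema["properties"]["samples"]["properties"]["bedfile"])
```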

Issue raised: https://github.com/databio/bbconf/issues/32

nsheff commented 8 months ago

Relevant issues:

Also, eido schemas: https://eido.databio.org/en/latest/writing-a-schema/

Code doing the replacing:

https://github.com/pepkit/pipestat/blob/558851075817e7eb73d1f7bebf1280ded98a3362/pipestat/parsed_schema.py#L373-L399

Replacements defined here:

https://github.com/pepkit/pipestat/blob/2790bed3ac3e7487f087a857468ce2084011dc16/pipestat/const.py#L78-L96

khoroshevskyi commented 6 months ago

Fixed in the 0.4.0 release.