add endpoint serving table schemas

stolarczyk commented 3 years ago

Can you give some context? What label field?

The result from /_private_api/filters/{table_name} shows a label field that I can use to replace things like "regions_no" and "mean_region_width".

[
  {
    "id": "name",
    "label": "BED file name",
    "type": "string",
    "validation": null,
    "operators": [
      "equal",
      "not_equal",
      "in",
      "not_in",
      "is_null",
      "is_not_null"
    ]
  },
  {
    "id": "md5sum",
    "label": "BED file checksum",
    "type": "string",
    "validation": null,
    "operators": [
      "equal",
      "not_equal",
      "in",
      "not_in",
      "is_null",
      "is_not_null"
    ]
  },
  {
    "id": "regions_no",
    "label": "Number of regions",
    "type": "integer",
    "validation": {
      "min": 0,
      "step": 1
    },
    "operators": [
      "equal",
      "not_equal",
      "greater",
      "greater_or_equal",
      "between",
      "less",
      "less_or_equal",
      "is_null",
      "is_not_null"
    ]
  },
  {
    "id": "gc_content",
    "label": "GC content",
    "type": "double",
    "validation": {
      "min": 0,
      "step": 0.01
    },
    "operators": [
      "equal",
      "not_equal",
      "greater",
      "greater_or_equal",
      "between",
      "less",
      "less_or_equal",
      "is_null",
      "is_not_null"
    ]
  },
...

Originally posted by @xuebingjie1990 in https://github.com/databio/bedhost-ui/issues/6#issuecomment-781529250

xuebingjie1990 commented 3 years ago

I'm trying out the endpoints you added and got an Internal Server Error. I think you are still working on it, so I don't know what the format of the result will be yet. But since it's a separate endpoint, if I want to use the label as the header in the table displaying the data I get from /api/bed/{md5sum}/file/{id}, I assume I need to map the column names to the table schema to get the label. Is that right?
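
For illustration, the mapping described here could look roughly like the sketch below. It assumes the schema response is a list of objects with `id` and `label` keys, as in the filters excerpt earlier in this thread; the helper names `build_label_map` and `relabel_columns` are hypothetical, and columns without a schema entry simply keep their raw name.

```python
from typing import Dict, Iterable, List


def build_label_map(filters: Iterable[dict]) -> Dict[str, str]:
    """Map each column id to its human-readable label."""
    return {f["id"]: f["label"] for f in filters if "id" in f and "label" in f}


def relabel_columns(columns: List[str], label_map: Dict[str, str]) -> List[str]:
    """Replace column names with labels, falling back to the raw name."""
    return [label_map.get(col, col) for col in columns]


# Trimmed sample of the filters response quoted earlier in the thread.
filters = [
    {"id": "name", "label": "BED file name", "type": "string"},
    {"id": "regions_no", "label": "Number of regions", "type": "integer"},
    {"id": "gc_content", "label": "GC content", "type": "double"},
]

label_map = build_label_map(filters)
# "mean_region_width" is not in the trimmed sample, so it is left unchanged.
headers = relabel_columns(["name", "regions_no", "mean_region_width"], label_map)
print(headers)
```

Falling back to the raw column name keeps the table usable even when the schema and the data are out of sync.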

stolarczyk commented 3 years ago

Yes, I was still working on this...

You can test it now:

bedfile: http://dev1.bedbase.org/api/bed/all/schema
bedset: http://dev1.bedbase.org/api/bedset/all/schema

xuebingjie1990 commented 3 years ago

bedset: http://dev1.bedbase.org/api/bedset/all/schema

Because of the way the bedset table is structured, I can only get the schema for the statistics like this:

bedset_means:
  type: object
  description: Mean statistics of the BED files in this BED set
bedset_standard_deviation:
  type: object
  description: Standard deviations of statistics of the BED files in this BED set

I'm not sure if there is a way to get the stats descriptions the way they appear in the bedfile table schema:

"regions_no": {
    "type": "integer",
    "description": "Number of regions"
  },
  "gc_content": {
    "type": "number",
    "description": "GC content"
  },
  "mean_absolute_tss_dist": {
    "type": "number",
    "description": "Mean absolute distance from transcription start sites"
  },
  "mean_region_width": {
    "type": "number",
    "description": "Mean region witdth"
  },
  "exon_frequency": {
    "type": "number",
    "description": "Exon frequency"
  },
......

Can I add the stats fields to the bedset schema?

stolarczyk commented 3 years ago

I'm not sure I understand. Do you mean adding the specific attribute names to the bedset_means and bedset_standard_deviation objects?

Why don't you just get the descriptions of these attributes from the bed endpoint?

The keys in the bedset_means column in the bedsets table (see below) match the keys in the schema served here: http://dev1.bedbase.org/api/bed/all/schema. It seems like all the data you need is there:

{"gc_content": 0.46169090909090915, "exon_frequency": 7348.272727272727, "exon_percentage": 0.03870909090909091, "intron_frequency": 73705.72727272728, "fiveutr_frequency": 5744.0, "intron_percentage": 0.4214272727272727, "mean_region_width": 423.29481818181824, "fiveutr_percentage": 0.041545454545454545, "threeutr_frequency": 5601.0, "threeutr_percentage": 0.028654545454545453, "intergenic_frequency": 70326.27272727272, "intergenic_percentage": 0.38509090909090915, "mean_absolute_tss_dist": 52599471.51939091, "promotercore_frequency": 5399.727272727273, "promoterprox_frequency": 6220.545454545455, "promotercore_percentage": 0.04519090909090909, "promoterprox_percentage": 0.039363636363636365}

stolarczyk commented 3 years ago

If this works for you, it would be better than hardcoding them in the bedsets schema, since it stays flexible. The stats included in bedset_means and bedset_standard_deviation are selected automatically in bedbuncher based on the bedfiles' result types.
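
A minimal sketch of that lookup, assuming the bed schema is a mapping from attribute name to an object with `type` and `description` keys (as in the excerpt quoted earlier) and that the bedset_means value is a flat key-to-number object. The function name `describe_bedset_stats` and the trimmed sample data are hypothetical.

```python
from typing import Dict


def describe_bedset_stats(
    bedset_means: Dict[str, float], bed_schema: Dict[str, dict]
) -> Dict[str, str]:
    """Look up a description for every stat key present in the bedset.

    Keys missing from the bed schema fall back to the raw key, so new
    stats selected upstream do not break the lookup.
    """
    return {
        key: bed_schema.get(key, {}).get("description", key)
        for key in bedset_means
    }


# Trimmed samples of the bed schema and a bedset_means value from this thread.
bed_schema = {
    "gc_content": {"type": "number", "description": "GC content"},
    "exon_frequency": {"type": "number", "description": "Exon frequency"},
}
bedset_means = {
    "gc_content": 0.46169090909090915,
    "exon_frequency": 7348.272727272727,
    "intron_frequency": 73705.72727272728,  # not in the trimmed schema sample
}

descriptions = describe_bedset_stats(bedset_means, bed_schema)
print(descriptions)
```

Because the lookup is driven by whatever keys appear in the bedset value, nothing has to be hardcoded on the bedset side.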

xuebingjie1990 commented 3 years ago

That could work. I'm just not sure whether the descriptions in http://dev1.bedbase.org/api/bed/all/schema would be misleading in the context of a bedset, since the values there would be averages over the bedfiles in the bedset.
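
One possible way around this, sketched under the assumption that the UI knows which bedset column a value came from: keep the bedfile description and append a note about the aggregation. The `SUFFIXES` table and the `contextualize` helper are hypothetical and not part of bedhost.

```python
# Hypothetical suffixes keyed by the bedset column holding the aggregated stats.
SUFFIXES = {
    "bedset_means": "mean across the BED files in this BED set",
    "bedset_standard_deviation": (
        "standard deviation across the BED files in this BED set"
    ),
}


def contextualize(description: str, column: str) -> str:
    """Append the aggregation context to a bedfile-level description."""
    return f"{description} ({SUFFIXES[column]})"


label = contextualize("GC content", "bedset_means")
print(label)
```

This keeps the bed schema as the single source for descriptions while making clear that the displayed number is an aggregate.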

databio / bedhost

add endpoint serving table schemas #40