StatistikStadtZuerich / stat.stadt-zuerich.ch

API Backend for the linked open statistical data of the Municipality of Zurich

API Specs review #12

Closed lucguillemot closed 6 years ago

lucguillemot commented 6 years ago

cf. https://sszvis-components.netlify.com/#/api-specs

Shape API

The metadata of a dataset should be sufficient to decide on all the possible ways the dataset can be visualized. This includes controls like dropdown menus for switching data subsets on the fly, whether switching from a map view to a chart view is possible, whether a time series is present, whether several small line charts are preferable to a single line chart with many (unreadable) lines, whether a bar chart should be offered, whether bars should be grouped or stacked, etc.

See the prototype of the front-end app to get a hint on the user experience that flows from the data structure: https://projects.invisionapp.com/d/main#/console/13100338/273973736/preview

Here is a list of missing metadata:

Slices

jstcki commented 6 years ago

@ktk,

Following up on the idea I posted on Slack, and to align it more with the terminology of the RDF Data Cube Vocabulary.

I think the solution might be to have multiple slices per dataset. Slices, as defined in the Data Cube Vocabulary, are groups of observations, used "to fix all but one (or a small subset) of the dimensions and be able to refer to all observations with those dimension values as a single entity". As I understand it, the Data Cube Slice aligns pretty well with what we've named "Facet" (or even a combination of dimensions).

Slice Shapes

Note: this is primarily about the Shape of Slices. Accordingly, all the metadata described below must be available on the Shape endpoints of a dataset. The Shape would contain all Slices of a dataset, so the frontend can decide which chart configuration and controls to render (and which Slice Data to query).

Option A: "Open" Slices

One option could be to keep the Slice definition open. That means a slice would group all observations which share the same dimensions but can have different dimension values. In addition, RAUM would be a kind-of-fixed dimension, fixed not on an individual value but on groups of values (e.g. Stadt, Kreise, Quartiere etc). For example:

As seen here, the "Open" Slices would still need to be described with Facets or possible (= existing!) combinations of dimension values.

Option B: "Fixed" Slices

The other possible option could be Slices where all dimension values are fixed (except time). These slices would be equivalent to our "Facets". For example:

This would be the simpler and less structured approach, but still contain all necessary information for the frontend to construct charts (I think!).

Bonus: In addition to Slices with fixed RAUM and open TIME dimensions, we maybe could also have fixed TIME and open RAUM dimensions alongside them (maybe on different URIs?), useful to create maps … just a thought.

Currently this is my preferred option because it's simpler and more information-rich than what we originally proposed (because the value/time domains are available at the Facet level).
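A minimal sketch of how Option B's "fixed" Slices could be derived from raw observations. The flat observation records, the `value` measure key and the codes `R1`, `R2`, `F`, `M` are illustrative assumptions, not the actual API payload:

```python
from collections import defaultdict

# Hypothetical flat observations; dimension codes and values are made up.
observations = [
    {"RAUM": "R1", "ZEIT": "2016", "SEX": "F", "value": 10},
    {"RAUM": "R1", "ZEIT": "2017", "SEX": "F", "value": 12},
    {"RAUM": "R1", "ZEIT": "2016", "SEX": "M", "value": 9},
    {"RAUM": "R2", "ZEIT": "2016", "SEX": "F", "value": 4},
]


def fixed_slices(observations, open_dimension="ZEIT", measure="value"):
    """Group observations by every dimension value except the open one."""
    slices = defaultdict(list)
    for obs in observations:
        # The slice key is the sorted set of all fixed (dimension, value) pairs.
        key = tuple(sorted(
            (dim, val) for dim, val in obs.items()
            if dim not in (open_dimension, measure)
        ))
        slices[key].append(obs)
    return dict(slices)


# Fixed RAUM and SEX, open ZEIT: one slice per time series.
time_slices = fixed_slices(observations)

# The "bonus" variant: fixed ZEIT, open RAUM, e.g. for maps.
map_slices = fixed_slices(observations, open_dimension="RAUM")
```

Each key of `time_slices` then plays the role of one "Facet", and its list of observations is the time series for that combination of dimension values.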

Common Slice properties

Both variants of slices need to specify a few properties per Slice for them to be useful and usable:

Slice Data/Observations

Based on the information about the Shape of the Slices of a Dataset, the frontend should be able to fetch the data of one or more Slices. Ideally, the endpoint which returns Observations can be parameterized with one or more Slices, which would be equivalent to an OR query. For example, http://stat.stadt-zuerich.ch/api/dataset/BEV001/data?slice=S1&slice=S2 would return the observations in both Slices S1 and S2. (This is important so we don't need to do multiple fetches per graphic.)
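A minimal sketch of building such a multi-slice request URL. It assumes the proposed endpoint accepts a repeated `slice` query parameter, which is inferred from the example above and is not a confirmed API contract; `slice_data_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

BASE = "http://stat.stadt-zuerich.ch/api/dataset"


def slice_data_url(dataset_id, slice_ids):
    """Build one data URL that asks for several slices at once (OR semantics)."""
    # doseq=True repeats the parameter: slice=S1&slice=S2
    query = urlencode({"slice": list(slice_ids)}, doseq=True)
    return f"{BASE}/{dataset_id}/data?{query}"


url = slice_data_url("BEV001", ["S1", "S2"])
```

With this, a graphic that needs several slices only requires a single fetch.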

Final Notes

jstcki commented 6 years ago

@l00mi,

I'm playing around with the new API and trying to make sense of the structure. A few questions:

  1. At http://stat.stadt-zuerich.ch/api/dataset/BEW-RAUM-ZEIT-HEL-HEO-SEX?format=json there's only the default slice – is that correct? Also, because there's only 1 object, the slice value is not an array anymore (same problem we had in the tags API).
  2. "Regular" slices don't have a @type – is that correct? I think it would be good if they had one, so we can correctly identify them. E.g. http://stat.stadt-zuerich.ch/api/dataset/ANT-RAUM-ZEIT-EHD-GGH?format=json
  3. The default slice seems to have "@type": ["http://purl.org/linked-data/cube#Slice", "http://ld.stadt-zuerich.ch/schema/DefaultSlice"] – is that correct?
  4. The extra property with the codes and their labels (as discussed yesterday) will come later?
  5. Is the naming of properties final (e.g. shapesGraph 🤔)?
  6. There are a few properties that don't make sense to me – can I just ignore them?:
    • sliceStructure/@id
    • path (on shape properties)
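The single-object-versus-array issue from question 1 can be handled defensively on the client. This is a minimal sketch; the trimmed `single` and `several` payloads are hypothetical and only mimic the shape of the `slice` property:

```python
def as_list(value):
    """Return a JSON-LD property value as a list, however it was serialized."""
    if value is None:
        return []          # property absent
    if isinstance(value, list):
        return value       # already an array
    return [value]         # single object compacted by the serializer


# Hypothetical, heavily trimmed responses; real payloads carry more properties.
single = {"slice": {"@id": "S1"}}
several = {"slice": [{"@id": "S1"}, {"@id": "S2"}]}

single_ids = [s["@id"] for s in as_list(single.get("slice"))]
several_ids = [s["@id"] for s in as_list(several.get("slice"))]
```

Normalizing at the point of access keeps the rest of the frontend code free of type checks, even if the API later guarantees arrays.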
jstcki commented 6 years ago

Oh, and can you point me to a dataset which has more than one RAUM? The ones where I know that this exists don't seem to work … (e.g. http://stat.stadt-zuerich.ch/api/dataset/BEW-RAUM-ZEIT-HEL-SEX)

l00mi commented 6 years ago
  1. At http://stat.stadt-zuerich.ch/api/dataset/BEW-RAUM-ZEIT-HEL-HEO-SEX?format=json there's only the default slice – is that correct? Also, because there's only 1 object, the slice value is not an array anymore (same problem we had in the tags API).

Yes, there is only the default slice, and yes, it is the same problem with the array ... which is now fixed.

"Regular" slices don't have a @type – is that correct? I think it would be good if they had one, so we can correctly identify them. E.g. http://stat.stadt-zuerich.ch/api/dataset/ANT-RAUM-ZEIT-EHD-GGH?format=json

They should not have a http://ld.stadt-zuerich.ch/schema/DefaultSlice but should have a http://purl.org/linked-data/cube#Slice -> https://github.com/statistikstadtzuerich/ssz-data/issues/50

The default slice seems to have "@type": ["http://purl.org/linked-data/cube#Slice", "http://ld.stadt-zuerich.ch/schema/DefaultSlice"] – is that correct?

Yes, it should have DefaultSlice added. If you like, we can also abbreviate these to, let's say, "Slice" and "DefaultSlice"?

The extra property with the codes and their labels (as discussed yesterday) will come later?

Yes, looking into it. I did not find a straightforward solution yesterday. I can add a list of properties with codes, but the framing does not let me specify where to put the labels.

Is the naming of properties final (e.g. shapesGraph 🤔)?

It comes from the standard. If you wish, we can change it. Any suggestions?

There are a few properties that don't make sense to me – can I just ignore them?:

- `sliceStructure/@id`
- `path` (on shape properties)

Exactly, sorry, I wanted to mention this yesterday. path is for Hydra-specific stuff. As for sliceStructure, I try to either omit it or provide it in full. It basically contains the "components", aka dimensions, of the slice, which you already get implicitly from the Slice (see here for an example of such a structure: http://ld.stadt-zuerich.ch/statistics/dataset/EHE-RAUM-ZEIT-ZVF-ZVM/EHEZVF0001ZVM0001/sliceKey).

Oh, and can you point me to a dataset which has more than one RAUM? The ones where I know that this exists don't seem to work … (e.g. http://stat.stadt-zuerich.ch/api/dataset/BEW-RAUM-ZEIT-HEL-SEX)

Good hint! This might be related to why it doesn't work. I will have a look.

jstcki commented 6 years ago

Great, thanks!

The default slice seems to have "@type": ["http://purl.org/linked-data/cube#Slice", "http://ld.stadt-zuerich.ch/schema/DefaultSlice"] – is that correct?

Yes, it should have DefaultSlice added. If you like, we can also abbreviate these to, let's say, "Slice" and "DefaultSlice"?

I don't really mind the IRIs, as long as the @type is stable (then I can use it in my code to actually differentiate types 😀). But, maybe it's a good idea to be consistent with the others (like Dimension and Topic on tags, NodeShape, DataSet etc.). Does this mean that I can expect "@type": "DefaultSlice" (preferred) or will it still be "@type": ["Slice", "DefaultSlice"] (and "@type": "Slice" for non-default slices)?

Is the naming of properties final (e.g. shapesGraph 🤔)?

It comes from the standard, if you wish we can change it, propositions?

Naming after the standard is fine by me, I was just curious why it's not shape (analogous to slice). But never mind!

As for sliceStructure, I try to either omit it or provide it in full. It basically contains the "components", aka dimensions, of the slice, which you already get implicitly from the Slice (see here for an example of such a structure: http://ld.stadt-zuerich.ch/statistics/dataset/EHE-RAUM-ZEIT-ZVF-ZVM/EHEZVF0001ZVM0001/sliceKey).

Ah, that's useful but I can also get that from the NodeShape, no?

l00mi commented 6 years ago

Great, thanks!

The default slice seems to have "@type": ["http://purl.org/linked-data/cube#Slice", "http://ld.stadt-zuerich.ch/schema/DefaultSlice"] – is that correct?

Yes, it should have DefaultSlice added. If you like, we can also abbreviate these to, let's say, "Slice" and "DefaultSlice"?

I don't really mind the IRIs, as long as the @type is stable (then I can use it in my code to actually differentiate types 😀). But, maybe it's a good idea to be consistent with the others (like Dimension and Topic on tags, NodeShape, DataSet etc.). Does this mean that I can expect "@type": "DefaultSlice" (preferred) or will it still be "@type": ["Slice", "DefaultSlice"] (and "@type": "Slice" for non-default slices)?

It will still be the different types, sorry, because the DefaultSlice is still a Slice ... But I will frame it to the names as proposed.
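Since `@type` may be a single string or an array, and a default slice carries both types, a client-side check could look like the following sketch. It accepts both the full IRIs quoted above and the proposed short names; the helper names are hypothetical:

```python
SLICE_TYPES = {"Slice", "http://purl.org/linked-data/cube#Slice"}
DEFAULT_SLICE_TYPES = {
    "DefaultSlice",
    "http://ld.stadt-zuerich.ch/schema/DefaultSlice",
}


def types_of(node):
    """Return the @type of a JSON-LD node as a set, whether string or array."""
    raw = node.get("@type", [])
    return set(raw if isinstance(raw, list) else [raw])


def is_slice(node):
    return bool(types_of(node) & SLICE_TYPES)


def is_default_slice(node):
    return bool(types_of(node) & DEFAULT_SLICE_TYPES)


default_slice = {"@type": ["Slice", "DefaultSlice"]}
regular_slice = {"@type": "Slice"}
```

Testing for membership rather than equality means a default slice is still recognized as a Slice, which matches the typing described above.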

Is the naming of properties final (e.g. shapesGraph 🤔)?

It comes from the standard, if you wish we can change it, propositions?

Naming after the standard is fine by me, I was just curious why it's not shape (analogous to slice). But never mind!

Basically because it is attached to the "slice" with this property, e.g. http://ld.stadt-zuerich.ch/statistics/dataset/EHE-RAUM-ZEIT-ZVF-ZVM/slice

As for sliceStructure, I try to either omit it or provide it in full. It basically contains the "components", aka dimensions, of the slice, which you already get implicitly from the Slice (see here for an example of such a structure: http://ld.stadt-zuerich.ch/statistics/dataset/EHE-RAUM-ZEIT-ZVF-ZVM/EHEZVF0001ZVM0001/sliceKey).

Ah, that's useful but I can also get that from the NodeShape, no?

Yes, in this case that's true. So it's probably better to omit it. That makes things cleaner.

l00mi commented 6 years ago

Okay, the last push also fixes the problem with the APIs that did not work. Basically, it did not provide APIs for anything without a locked dimension. That is why "RAUM" never appeared.

More on the codes tomorrow.

jstcki commented 6 years ago

@l00mi, we've been working with the API the last few days and mostly everything works! I hope I can show some charts tomorrow 😄

Some notes:

l00mi commented 6 years ago

@ktk can you please have a look at the 1st (multiple labels) and 2nd (metadata) issues?

ktk commented 6 years ago

@herrstucki thanks Jeremy, will have a look at it

jstcki commented 6 years ago

@ktk another thing:

In the BEW-* datasets there are observations with measure value 0. We now filter them out – because they most certainly represent missing data – but I'm pretty sure that's not what we should be doing as a general rule 😀 E.g. http://stat.stadt-zuerich.ch/detail?dataset=BEW-RAUM-ZEIT-SEX

l00mi commented 6 years ago

@herrstucki I saw that you "fixed" that in your code. Please just show the 0 for now. That way we can also communicate this back to SSZ more easily.
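One way to keep the zeros visible while still collecting them for feedback is sketched below. The observation records and the `value` measure key are hypothetical placeholders for the real payload:

```python
# Hypothetical observations; "value" stands in for the measure property.
observations = [
    {"RAUM": "R1", "ZEIT": "2016", "value": 120},
    {"RAUM": "R2", "ZEIT": "2016", "value": 0},
]


def zero_observations(observations, measure="value"):
    """Return the observations whose measure is exactly 0, dropping nothing."""
    return [obs for obs in observations if obs[measure] == 0]


chart_data = observations                      # zeros stay in the chart
suspicious = zero_observations(observations)   # candidates to report back
```

The chart input is left untouched, so whether a 0 means "missing" remains a data-curation question rather than a frontend rule.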

jstcki commented 6 years ago

One more question:

ktk commented 6 years ago

@herrstucki that might indeed explain it, let's discuss that separately to see how we handle it

sszscm commented 6 years ago

The problem also exists in, for example, SCK-RAUM-ZEIT-BTA-SKG-SST and similar datasets. It occurs whenever a dimension has only one value and thereby effectively defines the dataset. In the case described by herrstucki, the correct title of the dataset would be "Schweizer Wohnbevölkerung" (Swiss resident population), which already implies the single value of HEL, so the frontend doesn't really need to take it into account. The same goes for the example mentioned: Betriebsklasse is always Volksschule, so it isn't really needed in the table or chart. I suggest we discuss this problem separately and try to solve it as part of curation. Some of the datasets are not necessarily structured sensibly (with regard to redundancies and unnecessary or "logical" dimensions). It may also be solvable with a simple rule (if the first dimension has only one value, then ...). For that, however, we would first have to check all the data accordingly.
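The "simple rule" suggested here could start from a check like the following sketch, which finds every dimension that takes only one value across a dataset. The observation records, the `value` measure key and the code `HEL0001` are hypothetical:

```python
# Hypothetical observations in which HEL only ever takes one value.
observations = [
    {"RAUM": "R1", "ZEIT": "2016", "HEL": "HEL0001", "value": 10},
    {"RAUM": "R1", "ZEIT": "2017", "HEL": "HEL0001", "value": 12},
    {"RAUM": "R2", "ZEIT": "2016", "HEL": "HEL0001", "value": 4},
]


def single_valued_dimensions(observations, measure="value"):
    """Map each dimension that has exactly one distinct value to that value."""
    values = {}
    for obs in observations:
        for dim, val in obs.items():
            if dim != measure:
                values.setdefault(dim, set()).add(val)
    return {
        dim: next(iter(vals))
        for dim, vals in values.items()
        if len(vals) == 1
    }


redundant = single_valued_dimensions(observations)
```

Running this over all datasets would also serve as the check mentioned above, listing which dimensions a frontend could hide from tables and charts.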

@MauroBaster HEO is currently called Heimatort (place of origin); if the data comes from T_1.1.40, however, it would be the Heimatkanton (canton of origin).

MauroBaster commented 6 years ago

Heimatort is the label of the group. It contains municipalities, districts and cantons.



ktk commented 6 years ago

We're closing this issue for now; the slices may still be adjusted later.