DigitalCommons / mykomap-monolith

A web application for mapping initiatives in the Solidarity Economy
0 stars 0 forks source link

[CWM] Add vocabs API route to the ts-rest contract #19

Closed wu-lee closed 1 month ago

wu-lee commented 1 month ago

Description

The mykomap client will need to get information about pin attribute categories. Specifically:

This information was provided by the vocabs.json format in older versions of Mykomap. It might be convenient to retain that format/structure here.

Examples of this can be visible on the data server. Some specific examples:

A cut-down example from the Mykomap documentation follows. This defines one vocabulary defining standard size definitions. The SKOS vocab URI for this is https://example.com/sizes/1.1/. An abbreviation prefix is defined for use in this context, sz. Within the rest of the JSON document, a sz: prefix represents that URI, and the URI for a term can be reconstructed by replacing the prefix and the colon with that URI.

Languages are identified by a two-character (upper case) code as defined by ISO 639 Set 1. There is only one language section here, for EN, with term labels in English. But there could be as many as required.

Both the prefixes and the language codes should be unique within this file. The same set of languages should be present for all vocabs. (When they are not, this might be an error, or it might be resolved by rules to fall-back to some other language, but that's outside the scope of this proposal.)

{
  "prefixes": {
    "https://example.com/sizes/1.1/": "sz"
  },
  "vocabs": {
    "sz:": {
      "EN": {
        "title": "Sizes",
        "terms": {
          "sz:large": "Large",
          "sz:medium": "Medium",
          "sz:small": "Small"
        }
      }
    }
  }
}

Use as language files

The older version of Mykomap was also starting to use these vocabs for language translations, which are functionally very similar.

Not all translations can (or should) be hardcoded into the application, either the old or the new one:

See the ui: vocab in the second example above for an example of a vocab used to define the labels for UI text not built into the Mykomap application.

Linked data interoperability

These vocab files were designed to be interoperable with linked-data SKOS vocabularies (hence the name "vocab" rather than "taxonomy" or "category")

They don't include everything a SKOS vocab can, just a subset. But a SKOS vocab can be used to create one of these JSON vocab files, and SKOS vocab can be constructed from a vocab file when some supplemental data is added.

There are scripts for converting these formats in the map-sse repository:

API Proposal

The endpoint could look like this (in sketch form):

A successful response should returned JSON data in the body, and this should essentially look like the vocab JSON format examples above.

(A more formal proposal expressed as a ts-rest contract should be added as a preliminary implementation step.)

Caveats:

Back end

Note: at the time of writing the back-end is not fully implemented, instead it just regurgitates canned data files.

If that is still the case when this issue is implemented, use canned-file regurgitation as before. A full implementation will probably then need to be added by the relevant issue for implementing the back-end.

Acceptance Criteria

rogup commented 1 month ago

@wu-lee In the back-end design, the vocabs.json lives in the folder for a dataset, since it needs to match up with the fields in that dataset (and from my understanding is generated by the data factory for a specific dataset CSV)

So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like https://example.com/api/:datasetId/vocab ?


Also, another thing I was thinking was that we still don't have a way of specifying which field to use in the 'directory' panel. Maybe we could specify this in the vocabs.json e.g. something like a primary_category in the meta section? Or maybe it makes more sense for it to live in the map's UI config, along with config such as which fields are filterable.

wu-lee commented 1 month ago

So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like

Yes, I was imagining the front end would know the vocabs and languages to ask for from elsewhere, but you're right, perhaps it would only know the dataset ID.

Note re. "one of" - we will potentially need multiple vocabs (plural) for any given case (e.g. countries, activities, etc.).

Although presumably the client would know the languages it needs. Beyond limitations set by the languages available, the choice would I think depend on the client rather than the dataset - perhaps inferred from the browser's locale or some other parameter. And in most cases just one language - we could require a map reload to get a language switch.

So maybe we can change the multiple vocab IDs to a single dataset ID and put this endpoint under the dataset route. e.g.

https://example.com/api/dataset/:datasetId/vocab/:langId

I wonder if there might also need to be a way to get the vocabs by ID, but probably we could add that in a separate endpoint.

wu-lee commented 1 month ago

Also, another thing I was thinking was that we still don't have a way of specifying which field to use in the 'directory' panel.

True. Perhaps instead of purely a vocabs API endpoint, we need a "config" endpoint, which includes the vocabs in one attribute, and any other options. So

https://example.com/api/dataset/:datasetId/config/:langId

This broadens the scope of this issue somewhat. We could start with just a the vocabs though and add more things as we go?

rogup commented 1 month ago

So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like

Yes, I was imagining the front end would know the vocabs and languages to ask for from elsewhere, but you're right, perhaps it would only know the dataset ID.

Note re. "one of" - we will potentially need multiple vocabs (plural) for any given case (e.g. countries, activities, etc.).

Although presumably the client would know the languages it needs. Beyond limitations set by the languages available, the choice would I think depend on the client rather than the dataset - perhaps inferred from the browser's locale or some other parameter. And in most cases just one language - we could require a map reload to get a language switch.

So maybe we can change the multiple vocab IDs to a single dataset ID and put this endpoint under the dataset route. e.g.

https://example.com/api/dataset/:datasetId/vocab/:langId

I wonder if there might also need to be a way to get the vocabs by ID, but probably we could add that in a separate endpoint.

If we're wanting to keep the logic simple (especially on the back-end so that we can just serve a static file), do we need the langId parameter at all? I can't see the harm in sending the whole file with all the languages, unless the file is very big, and then the front-end logic would be simpler too, rather than needing a reload if the language changes.

rogup commented 1 month ago

This broadens the scope of this issue somewhat. We could start with just a the vocabs though and add more things as we go?

Agreed, we can just hardcode some of it for now, until we work out the method of storing and retrieving config.

Long-term, I like the idea of having a https://example.com/api/dataset/:datasetId/config API. I think the config file could live in source control in something like a map-variants library, so maybe we would want to keep the vocabs separate.

We could potentially have just 1 back-end server and 1 build of the front-end, which is used by all maps. Which is customised by the URL parameters in order to dynamically load a specific set of data and UI config. This would be a lot simpler than having multiple builds of the front-end with built in UI layouts. But maybe it's limited if we need more customisation such as styling and images, which may be difficult to dynamically load in. This is all out of scope though

wu-lee commented 1 month ago

So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like

Yes, I was imagining the front end would know the vocabs and languages to ask for from elsewhere, but you're right, perhaps it would only know the dataset ID. Note re. "one of" - we will potentially need multiple vocabs (plural) for any given case (e.g. countries, activities, etc.). Although presumably the client would know the languages it needs. Beyond limitations set by the languages available, the choice would I think depend on the client rather than the dataset - perhaps inferred from the browser's locale or some other parameter. And in most cases just one language - we could require a map reload to get a language switch. So maybe we can change the multiple vocab IDs to a single dataset ID and put this endpoint under the dataset route. e.g.

https://example.com/api/dataset/:datasetId/vocab/:langId

I wonder if there might also need to be a way to get the vocabs by ID, but probably we could add that in a separate endpoint.

If we're wanting to keep the logic simple (especially on the back-end so that we can just serve a static file), do we need the langId parameter at all? I can't see the harm in sending the whole file with all the languages, unless the file is very big, and then the front-end logic would be simpler too, rather than needing a reload if the language changes.

I guess we could cover all the bases by saying:

wu-lee commented 1 month ago

This broadens the scope of this issue somewhat. We could start with just a the vocabs though and add more things as we go?

Agreed, we can just hardcode some of it for now, until we work out the method of storing and retrieving config.

Potential misinterpretation here as I was actually meaning we could switch this to a config endpoint, but that it would omit any config options besides the vocabs. So, probably the same schema gets sent back as above (unless we prefer to nest that under an attribute, but then vocabs.vocabs.<whatever> seems to a potentially awkward consequence). And more attributes can get added later.

Long-term, I like the idea of having a https://example.com/api/dataset/:datasetId/config API. I think the config file could live in source control in something like a map-variants library, so maybe we would want to keep the vocabs separate.

In early implementations, we don't need to keep vocabs separate, cos there's only one case really, and even if not we don't really care much about duplication. In later implementations, any recombination could be done dynamically.

We could potentially have just 1 back-end server and 1 build of the front-end, which is used by all maps. Which is customised by the URL parameters in order to dynamically load a specific set of data and UI config. This would be a lot simpler than having multiple builds of the front-end with built in UI layouts.

Yes. I think the config would be implied by the datasetId, initially and/or by default. Later we could add a way to override the default config for a dataset?

But maybe it's limited if we need more customisation such as styling and images, which may be difficult to dynamically load in. This is all out of scope though

Yes

rogup commented 1 month ago

@wu-lee So are you suggesting that

Also a tangential question: do vocabs ever contain actual data? e.g. if the language is set to Hindi, where do the translations of the names and descriptions live, or do we not bother translating these? If they do contain translated data, config probably isn't the correct name for it

wu-lee commented 1 month ago

@wu-lee So are you suggesting that

  • the vocabs.json lives in the dataset folder, alongside the other data files for a dataset Maybe, this is one option - but I was thinking now it could just be stored in a verbatim config file.

  • we have a config API endpoint, and no vocabs endpoint Yes. Sent verbatim. Only parameter being datasetId for now. Later language, if only one is wanted.

Later, some combination of vocabs into the config could be done to avoid duplication when shared between datasets. But this is not entirely necessary now, if we wanted we could do that in the script which generates the stored config.json

* For this API route, the back-end pulls together the vocabs.json and any other dataset config in source control as one JSON object and sends this. But for now, the only bit of config it sends will be the vocabs.

This combination not necessary in the current version of the proposal. After all, this would will be repeating the same computation every request, so we may as well just do it once.

Also a tangential question: do vocabs ever contain actual data? e.g. if the language is set to Hindi, where do the translations of the names and descriptions live, or do we not bother translating these? If they do contain translated data, config probably isn't the correct name for it

The vocabs can contain whatever you want, in one sense. But I wouldn't consider them data, as they are indexes for interpreting the IDs, not instances of the IDs related to some organisation.

The translations are interpretations of IDs and live in the vocab. In this proposal these are stored in the config. Maybe config isn't the right name, but I'm not sure what is better?

rogup commented 1 month ago

Cool, thanks for the clarifications, this all makes sense to me now!

rogup commented 1 month ago

@wu-lee When you make these changes, I'm thinking we should also change the getDataset route to getLocations since it only returns the locations

wu-lee commented 1 month ago

@rogup I've got a contract with some tests for the validations in the branch now.

When you get a moment could you have a look and see if it looks sensible? This then satisfies the first and second checkboxes.

wu-lee commented 1 month ago

After our earlier discussion I've added

rogup commented 1 month ago

@wu-lee I've reviewed the contract and it looks fine, just a clarifying question as a comment in the above PR

wu-lee commented 1 month ago

@rogup - I've revised the branch, and so it probably needs another quick look.

Mentioned this to Rohit elsewhere, but for context here, it wasn't finished, so wasn't intended to be merged yet. I've finished it now, adding tests for validating ConfigData and integrating some fixups and other minor changes.

I would like to re-merge it in the right place. Unfortunately there has been a commit to main since then. Fortunately I do think I can undo this, but

Since @rohit is away today, prior to actually doing that I've made a temporary new branch new-main which is my proposed change to main, and rebased the branch onto that. @ms0ur1s - just to be aware of this: I don't think your about to commit anything else to main but let me know if you want to.

wu-lee commented 1 month ago

@rogup - having done the above, just checking the tests, I've found they fail because of the changes to the file structure in apps/back-end/test/data/. Just attempting to add some fixes to that.

This is not in scope for this branch, but because I'm not sure where to put it, one question I have is the name "initiatives" - which is used to name one of the subdirectories. Do we want to stick with that, which is a throwback to early SEA days? I think it meant "an initiative in the Solidarity Economy" - in our case it might be safer just to call them "pins" or perhaps "organisations" (although I wonder if they'll always be organisations on our future maps).

wu-lee commented 1 month ago

For now I've just partially reverted the commit deleting the directory test-A used by the tests - fixing the tests for the data in dataset-A requires implementing stuff which is in the scope of another branch. We can re-delete or otherwise deal with this later - mainly for now I need to check everything is otherwise shipshape.

wu-lee commented 1 month ago

Front end tests fail too, looks like they trigger something which uses the browser API, so fails in Node. This seems to have been introduced upstream, there's the same problem on the current main branch.

src/App.test.tsx [ src/App.test.tsx ]
TypeError: window.URL.createObjectURL is not a function
 ❯ define ../../node_modules/maplibre-gl/dist/maplibre-gl.js:34:44
     32| 
     33|     if (typeof window !== 'undefined') {
     34|         maplibregl.setWorkerUrl(window.URL.createObjectURL(new Blob([workerBundleString], { …
       |                                            ^
     35|     }
     36| 

However the app as a whole seems to run ok and I can see the test data on the map running front end with npm run start and navigating to http://localhost:3000/?datasetId=test-A

Which means I think this branch could be merged in, modulo workarounds?

rogup commented 1 month ago

@rogup - having done the above, just checking the tests, I've found they fail because of the changes to the file structure in apps/back-end/test/data/. Just attempting to add some fixes to that.

This is not in scope for this branch, but because I'm not sure where to put it, one question I have is the name "initiatives" - which is used to name one of the subdirectories. Do we want to stick with that, which is a throwback to early SEA days? I think it meant "an initiative in the Solidarity Economy" - in our case it might be safer just to call them "pins" or perhaps "organisations" (although I wonder if they'll always be organisations on our future maps).

I quite like that it's a unique word so clear what it means to us. I vote to just keep it the same for now since it works, and we can always change it later

rogup commented 1 month ago

@rogup - I've revised the branch, and so it probably needs another quick look.

Mentioned this to Rohit elsewhere, but for context here, it wasn't finished, so wasn't intended to be merged yet. I've finished it now, adding tests for validating ConfigData and integrating some fixups and other minor changes.

I would like to re-merge it in the right place. Unfortunately there has been a commit to main since then. Fortunately I do think I can undo this, but

  • it needs a force-push to main
  • which needs a (temporary) removal of the branch protection on main

Since @Rohit is away today, prior to actually doing that I've made a temporary new branch new-main which is my proposed change to main, and rebased the branch onto that. @ms0ur1s - just to be aware of this: I don't think your about to commit anything else to main but let me know if you want to.

I'm not sure what I should be reviewing sorry, I find rebases confusing. I can see there is a merge commit on new-main containing random changes that me and Marcel made in different commits, but they don't relate to this issue. Are you wanting to force push new-main to main? As you know, I'm more of a fan of just merging and squash merging, since there's less risk of things getting lost, and I don't care that much if commits get squashed together or not in the perfect order - I usually look at the PR for details about a past change rather than looking at each commit message

wu-lee commented 1 month ago

Ok, let me know if I should go ahead, explain, or something else.

Yes, I am proposing to force push new-main as main. I'm pretty confident this does what we want, but it'll be easier to show you rather than try and explain in words.

wu-lee commented 1 month ago

I quite like that it's a unique word so clear what it means to us. I vote to just keep it the same for now since it works, and we can always change it later

I'm not insisting, as you're right, it is just a name.

However, renaming it now is (I think) trivial, renaming it later won't be. So the same "now or probably never" principle applies to renaming getDataset to getLocations.

If you were looking for a good alternative name I'd call them "points" like GeoJSON does, or if we want to bake in specificity to our current application's scope, "organisations".

wu-lee commented 1 month ago

On reflection, not sure if my last comment is adding much. Instead, maybe we should ask for some more input from the rest of the team and see what they think.

wu-lee commented 1 month ago

Ok, let me know if I should go ahead, explain, or something else.

Yes, I am proposing to force push new-main as main. I'm pretty confident this does what we want, but it'll be easier to show you rather than try and explain in words.

I think you're away for the next few days, so to allow some progress one way or the other I'm going to try and do the force push, leaving the current position of main recorded with the branch main-2024-10-23, so it can be recovered if necessary. This will also undo the accidental merge of #23.

To update your local main branch, this should do the trick

# Update your working directory and checkout your main branch
git fetch
git checkout main

# Make sure you've no local unpushed commits on your main vs the old position:
git merge --ff-only origin/main-2024-10-23 # if this fails best to speak to me

# move the current branch to origin/main, and updates the working directory
git reset --hard origin/main 

@ms0ur1s - 272-results-panel is branched off the old main, but it can be rebased onto the new one with:

git rebase main-2024-10-23 272-results-panel  --onto main

# check tests all still run etc.

# push your changes upstsream
git push --force-with-lease origin 272-results-panel

No other branches currently included in the GitHub repository are affected.

wu-lee commented 1 month ago

Ok, having coordinated with @rogup #32 is now merged. The tests which count pass, but there's one which doesn't because of a change upstream of PR #32


 FAIL  src/App.test.tsx [ src/App.test.tsx ]
TypeError: window.URL.createObjectURL is not a function
 ❯ define ../../node_modules/maplibre-gl/dist/maplibre-gl.js:34:44
     32| 
     33|     if (typeof window !== 'undefined') {
     34|         maplibregl.setWorkerUrl(window.URL.createObjectURL(new Blob([workerBundleString], { t…
       |                                            ^
     35|     }
     36| 
 ❯ ../../node_modules/maplibre-gl/dist/maplibre-gl.js:46:1
 ❯ ../../node_modules/maplibre-gl/dist/maplibre-gl.js:6:81
 ❯ ../../node_modules/maplibre-gl/dist/maplibre-gl.js:9:3
 ❯ src/components/map/mapLibre.ts:1:1

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

Fixing this is outside the scope of this issue, as it is due to unrelated code. Implementing the back-end fully is also out of scope, I think this is the job of #20

ms0ur1s commented 1 month ago

@ms0ur1s - 272-results-panel is branched off the old main, but it can be rebased onto the new one with:

git rebase main-2024-10-23 272-results-panel  --onto main

# check tests all still run etc.

# push your changes upstsream
git push --force-with-lease origin 272-results-panel

No other branches currently included in the GitHub repository are affected.

Nice one @wu-lee, thanks for that 😁