Closed: wu-lee closed this 1 month ago
@wu-lee In the back-end design, the `vocabs.json` lives in the folder for a dataset, since it needs to match up with the fields in that dataset (and, from my understanding, is generated by the data factory for a specific dataset CSV).

So I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like `https://example.com/api/:datasetId/vocab`?

Also, another thing I was thinking: we still don't have a way of specifying which field to use in the 'directory' panel. Maybe we could specify this in the `vocabs.json`, e.g. something like a `primary_category` entry in the `meta` section? Or maybe it makes more sense for it to live in the map's UI config, along with config such as which fields are filterable.
> So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like `https://example.com/api/:datasetId/vocab`?

Yes, I was imagining the front end would know the vocabs and languages to ask for from elsewhere, but you're right, perhaps it would only know the dataset ID.

Note re. "one of" - we will potentially need multiple vocabs (plural) for any given case (e.g. countries, activities, etc.).

Although presumably the client would know the languages it needs. Beyond limitations set by the languages available, the choice would I think depend on the client rather than the dataset - perhaps inferred from the browser's locale or some other parameter. And in most cases just one language - we could require a map reload to get a language switch.

So maybe we can change the multiple vocab IDs to a single dataset ID and put this endpoint under the dataset route, e.g. `https://example.com/api/dataset/:datasetId/vocab/:langId`

I wonder if there might also need to be a way to get the vocabs by ID, but probably we could add that in a separate endpoint.
> Also, another thing I was thinking was that we still don't have a way of specifying which field to use in the 'directory' panel.

True. Perhaps instead of purely a vocabs API endpoint, we need a "config" endpoint, which includes the vocabs in one attribute, and any other options. So: `https://example.com/api/dataset/:datasetId/config/:langId`

This broadens the scope of this issue somewhat. We could start with just the vocabs though, and add more things as we go?
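In sketch form, that config endpoint as a ts-rest contract (assuming `@ts-rest/core` and `zod`; the schemas are placeholders, and `primary_category` is just the idea floated above, not an agreed name):

```ts
import { initContract } from "@ts-rest/core";
import { z } from "zod";

const c = initContract();

// Hypothetical sketch: a config endpoint keyed by dataset and language.
export const configContract = c.router({
  getConfig: {
    method: "GET",
    path: "/api/dataset/:datasetId/config/:langId",
    pathParams: z.object({
      datasetId: z.string(),
      langId: z.string().length(2), // e.g. "EN"
    }),
    responses: {
      200: z.object({
        // vocabs in one attribute, other options alongside
        vocabs: z.record(z.unknown()),
        primary_category: z.string().optional(), // placeholder directory-panel field
      }),
    },
  },
});
```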
> > So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like `https://example.com/api/:datasetId/vocab`?
>
> Yes, I was imagining the front end would know the vocabs and languages to ask for from elsewhere, but you're right, perhaps it would only know the dataset ID.
>
> Note re. "one of" - we will potentially need multiple vocabs (plural) for any given case (e.g. countries, activities, etc.).
>
> Although presumably the client would know the languages it needs. Beyond limitations set by the languages available, the choice would I think depend on the client rather than the dataset - perhaps inferred from the browser's locale or some other parameter. And in most cases just one language - we could require a map reload to get a language switch.
>
> So maybe we can change the multiple vocab IDs to a single dataset ID and put this endpoint under the dataset route, e.g. `https://example.com/api/dataset/:datasetId/vocab/:langId`
>
> I wonder if there might also need to be a way to get the vocabs by ID, but probably we could add that in a separate endpoint.
If we're wanting to keep the logic simple (especially on the back-end, so that we can just serve a static file), do we need the `langId` parameter at all? I can't see the harm in sending the whole file with all the languages, unless the file is very big, and then the front-end logic would be simpler too, rather than needing a reload if the language changes.
> This broadens the scope of this issue somewhat. We could start with just the vocabs though, and add more things as we go?

Agreed, we can just hardcode some of it for now, until we work out the method of storing and retrieving config.

Long-term, I like the idea of having a `https://example.com/api/dataset/:datasetId/config` API. I think the config file could live in source control in something like a `map-variants` library, so maybe we would want to keep the vocabs separate.

We could potentially have just 1 back-end server and 1 build of the front-end, which is used by all maps, and which is customised by the URL parameters in order to dynamically load a specific set of data and UI config. This would be a lot simpler than having multiple builds of the front-end with built-in UI layouts. But maybe it's limited if we need more customisation such as styling and images, which may be difficult to dynamically load in. This is all out of scope though.
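(As a sketch of the sort of thing meant here, assuming a browser front end; the endpoint shape is just the one proposed above, and the function is hypothetical:)

```ts
// Hypothetical sketch: one shared front-end build picks its dataset from the URL.
async function loadConfigFromUrl(): Promise<unknown> {
  const params = new URLSearchParams(window.location.search);
  const datasetId = params.get("datasetId"); // e.g. ?datasetId=test-A, as used below
  if (!datasetId) throw new Error("no datasetId in URL");

  // Fetch the dataset-specific data/UI config from the proposed endpoint.
  const res = await fetch(`/api/dataset/${datasetId}/config`);
  return res.json();
}
```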
> > So, I'm wondering why we would need to get one of multiple vocabs individually by their IDs, rather than just using a URL like `https://example.com/api/:datasetId/vocab`?
>
> Yes, I was imagining the front end would know the vocabs and languages to ask for from elsewhere, but you're right, perhaps it would only know the dataset ID.
>
> Note re. "one of" - we will potentially need multiple vocabs (plural) for any given case (e.g. countries, activities, etc.).
>
> Although presumably the client would know the languages it needs. Beyond limitations set by the languages available, the choice would I think depend on the client rather than the dataset - perhaps inferred from the browser's locale or some other parameter. And in most cases just one language - we could require a map reload to get a language switch.
>
> So maybe we can change the multiple vocab IDs to a single dataset ID and put this endpoint under the dataset route, e.g. `https://example.com/api/dataset/:datasetId/vocab/:langId`
>
> I wonder if there might also need to be a way to get the vocabs by ID, but probably we could add that in a separate endpoint.

> If we're wanting to keep the logic simple (especially on the back-end, so that we can just serve a static file), do we need the `langId` parameter at all? I can't see the harm in sending the whole file with all the languages, unless the file is very big, and then the front-end logic would be simpler too, rather than needing a reload if the language changes.
I guess we could cover all the bases by saying:
> > This broadens the scope of this issue somewhat. We could start with just the vocabs though, and add more things as we go?
>
> Agreed, we can just hardcode some of it for now, until we work out the method of storing and retrieving config.

Potential misinterpretation here, as I was actually meaning we could switch this to a `config` endpoint, but that it would omit any config options besides the vocabs. So, probably the same schema gets sent back as above (unless we prefer to nest that under an attribute, but then `vocabs.vocabs.<whatever>` seems to be a potentially awkward consequence). And more attributes can get added later.
> Long-term, I like the idea of having a `https://example.com/api/dataset/:datasetId/config` API. I think the config file could live in source control in something like a `map-variants` library, so maybe we would want to keep the vocabs separate.

In early implementations, we don't need to keep vocabs separate, cos there's only one case really, and even if not we don't really care much about duplication. In later implementations, any recombination could be done dynamically.
> We could potentially have just 1 back-end server and 1 build of the front-end, which is used by all maps, and which is customised by the URL parameters in order to dynamically load a specific set of data and UI config. This would be a lot simpler than having multiple builds of the front-end with built-in UI layouts.

Yes. I think the config would be implied by the `datasetId`, initially and/or by default. Later we could add a way to override the default config for a dataset?

> But maybe it's limited if we need more customisation such as styling and images, which may be difficult to dynamically load in. This is all out of scope though.

Yes.
@wu-lee So are you suggesting that:

- the `vocabs.json` lives in the dataset folder, alongside the other data files for a dataset
- we have a `config` API endpoint, and no `vocabs` endpoint

Also a tangential question: do vocabs ever contain actual data? e.g. if the language is set to Hindi, where do the translations of the names and descriptions live, or do we not bother translating these? If they do contain translated data, `config` probably isn't the correct name for it.
> @wu-lee So are you suggesting that:
>
> - the `vocabs.json` lives in the dataset folder, alongside the other data files for a dataset

Maybe, this is one option - but I was thinking now it could just be stored in a verbatim config file.

> - we have a `config` API endpoint, and no `vocabs` endpoint

Yes. Sent verbatim. Only parameter being `datasetId` for now. Later language, if only one is wanted.

Later, some combination of vocabs into the config could be done to avoid duplication when shared between datasets. But this is not entirely necessary now; if we wanted, we could do that in the script which generates the stored `config.json`.
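(A minimal sketch of what "sent verbatim" could mean on the back-end, assuming an Express-style server and a per-dataset `config.json` on disk; the paths and names here are hypothetical:)

```ts
import express from "express";
import { promises as fs } from "node:fs";
import path from "node:path";

const app = express();
const DATA_DIR = "./data"; // hypothetical location of per-dataset folders

// Serve the stored config.json verbatim, keyed only by datasetId.
// (No validation of datasetId here; a real version should sanitise it.)
app.get("/api/dataset/:datasetId/config", async (req, res) => {
  const file = path.join(DATA_DIR, req.params.datasetId, "config.json");
  try {
    res.type("application/json").send(await fs.readFile(file, "utf-8"));
  } catch {
    res.status(404).json({ error: "unknown dataset" });
  }
});
```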
> * For this API route, the back-end pulls together the vocabs.json and any other dataset config in source control as one JSON object and sends this. But for now, the only bit of config it sends will be the vocabs.

This combination isn't necessary in the current version of the proposal. After all, this would be repeating the same computation every request, so we may as well just do it once.
> Also a tangential question: do vocabs ever contain actual data? e.g. if the language is set to Hindi, where do the translations of the names and descriptions live, or do we not bother translating these? If they do contain translated data, `config` probably isn't the correct name for it.

The vocabs can contain whatever you want, in one sense. But I wouldn't consider them data, as they are indexes for interpreting the IDs, not instances of the IDs related to some organisation.

The translations are interpretations of IDs and live in the vocab. In this proposal these are stored in the config. Maybe `config` isn't the right name, but I'm not sure what is better?
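(To illustrate "interpretations of IDs": a sketch of a lookup against vocab data in roughly the shape described in the issue body below; the types and field names are assumptions:)

```ts
// Hypothetical types mirroring the vocabs.json shape sketched in the issue body.
type VocabLang = { title: string; terms: Record<string, string> };
type Vocabs = Record<string, Record<string, VocabLang>>; // vocab prefix -> language -> section

// Look up a human-readable label for a term ID like "sz:large".
function termLabel(vocabs: Vocabs, termId: string, lang: string): string {
  const prefix = termId.slice(0, termId.indexOf(":") + 1); // e.g. "sz:"
  const label = vocabs[prefix]?.[lang]?.terms[termId];
  return label ?? termId; // fall back to the raw ID if no translation exists
}
```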
Cool, thanks for the clarifications, this all makes sense to me now!
@wu-lee When you make these changes, I'm thinking we should also change the `getDataset` route to `getLocations`, since it only returns the locations.
@rogup I've got a contract with some tests for the validations in the branch now. When you get a moment, could you have a look and see if it looks sensible? This then satisfies the first and second checkboxes.

After our earlier discussion I've added
@wu-lee I've reviewed the contract and it looks fine, just a clarifying question as a comment in the above PR
@rogup - I've revised the branch, and so it probably needs another quick look.

Mentioned this to Rohit elsewhere, but for context here: it wasn't finished, so wasn't intended to be merged yet. I've finished it now, adding tests for validating ConfigData and integrating some fixups and other minor changes.

I would like to re-merge it in the right place. Unfortunately there has been a commit to main since then. Fortunately I do think I can undo this, but:

- it needs a force-push to main
- which needs a (temporary) removal of the branch protection on main

Since @rohit is away today, prior to actually doing that I've made a temporary new branch `new-main`, which is my proposed change to `main`, and rebased the branch onto that. @ms0ur1s - just to be aware of this: I don't think you're about to commit anything else to main, but let me know if you want to.
@rogup - having done the above, just checking the tests, I've found they fail because of the changes to the file structure in `apps/back-end/test/data/`. Just attempting to add some fixes to that.

This is not in scope for this branch, but because I'm not sure where to put it, one question I have is about the name "initiatives", which is used to name one of the subdirectories. Do we want to stick with that, which is a throwback to early SEA days? I think it meant "an initiative in the Solidarity Economy" - in our case it might be safer just to call them "pins" or perhaps "organisations" (although I wonder if they'll always be organisations on our future maps).

For now I've just partially reverted the commit deleting the directory `test-A` used by the tests - fixing the tests for the data in `dataset-A` requires implementing stuff which is in the scope of another branch. We can re-delete or otherwise deal with this later - mainly for now I need to check everything is otherwise shipshape.
Front-end tests fail too; looks like they trigger something which uses the browser API, so fails in Node. This seems to have been introduced upstream; there's the same problem on the current main branch.

```
src/App.test.tsx [ src/App.test.tsx ]
TypeError: window.URL.createObjectURL is not a function
 ❯ define ../../node_modules/maplibre-gl/dist/maplibre-gl.js:34:44
     32|
     33| if (typeof window !== 'undefined') {
     34|     maplibregl.setWorkerUrl(window.URL.createObjectURL(new Blob([workerBundleString], { …
       |                                        ^
     35| }
     36|
```

However, the app as a whole seems to run OK, and I can see the test data on the map running the front end with `npm run start` and navigating to `http://localhost:3000/?datasetId=test-A`.
Which means I think this branch could be merged in, modulo workarounds?
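One possible workaround (a sketch, assuming the tests run under Vitest in a jsdom-style environment, which doesn't implement `createObjectURL`): stub the missing browser API in a test setup file, before `maplibre-gl` is imported. The file name and wiring are hypothetical:

```ts
// test/setup.ts (hypothetical) - wired in via Vitest's `setupFiles` option.
// jsdom doesn't implement URL.createObjectURL, which maplibre-gl calls at import time.
if (typeof window !== "undefined" && !window.URL.createObjectURL) {
  window.URL.createObjectURL = () => "blob:stub"; // minimal stub, enough for tests
  window.URL.revokeObjectURL = () => {};
}
```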
> @rogup - having done the above, just checking the tests, I've found they fail because of the changes to the file structure in `apps/back-end/test/data/`. Just attempting to add some fixes to that.
>
> This is not in scope for this branch, but because I'm not sure where to put it, one question I have is about the name "initiatives", which is used to name one of the subdirectories. Do we want to stick with that, which is a throwback to early SEA days? I think it meant "an initiative in the Solidarity Economy" - in our case it might be safer just to call them "pins" or perhaps "organisations" (although I wonder if they'll always be organisations on our future maps).

I quite like that it's a unique word, so it's clear what it means to us. I vote to just keep it the same for now since it works, and we can always change it later.
> @rogup - I've revised the branch, and so it probably needs another quick look.
>
> Mentioned this to Rohit elsewhere, but for context here: it wasn't finished, so wasn't intended to be merged yet. I've finished it now, adding tests for validating ConfigData and integrating some fixups and other minor changes.
>
> I would like to re-merge it in the right place. Unfortunately there has been a commit to main since then. Fortunately I do think I can undo this, but:
>
> - it needs a force-push to main
> - which needs a (temporary) removal of the branch protection on main
>
> Since @Rohit is away today, prior to actually doing that I've made a temporary new branch `new-main`, which is my proposed change to `main`, and rebased the branch onto that. @ms0ur1s - just to be aware of this: I don't think you're about to commit anything else to main, but let me know if you want to.

I'm not sure what I should be reviewing, sorry; I find rebases confusing. I can see there is a merge commit on `new-main` containing random changes that me and Marcel made in different commits, but they don't relate to this issue. Are you wanting to force push `new-main` to `main`?

As you know, I'm more of a fan of just merging and squash merging, since there's less risk of things getting lost, and I don't care that much if commits get squashed together or not in the perfect order - I usually look at the PR for details about a past change rather than looking at each commit message.
Ok, let me know if I should go ahead, explain, or something else.

Yes, I am proposing to force push `new-main` as `main`. I'm pretty confident this does what we want, but it'll be easier to show you rather than try and explain in words.
> I quite like that it's a unique word, so it's clear what it means to us. I vote to just keep it the same for now since it works, and we can always change it later.

I'm not insisting, as you're right, it is just a name.

However, renaming it now is (I think) trivial; renaming it later won't be. So the same "now or probably never" principle applies to renaming `getDataset` to `getLocations`.

If you were looking for a good alternative name, I'd call them "points" like GeoJSON does, or if we want to bake in specificity to our current application's scope, "organisations".

On reflection, not sure if my last comment is adding much. Instead, maybe we should ask for some more input from the rest of the team and see what they think.
> Ok, let me know if I should go ahead, explain, or something else.
>
> Yes, I am proposing to force push `new-main` as `main`. I'm pretty confident this does what we want, but it'll be easier to show you rather than try and explain in words.

I think you're away for the next few days, so to allow some progress one way or the other I'm going to try and do the force push, leaving the current position of `main` recorded with the branch `main-2024-10-23`, so it can be recovered if necessary. This will also undo the accidental merge of #23.

To update your local main branch, this should do the trick:

```sh
# Update your working directory and checkout your main branch
git fetch
git checkout main

# Make sure you've no local unpushed commits on your main vs the old position:
git merge --ff-only origin/main-2024-10-23  # if this fails best to speak to me

# Move the current branch to origin/main, and update the working directory
git reset --hard origin/main
```
@ms0ur1s - `272-results-panel` is branched off the old main, but it can be rebased onto the new one with:

```sh
git rebase main-2024-10-23 272-results-panel --onto main
# check tests all still run etc.
# push your changes upstream
git push --force-with-lease origin 272-results-panel
```

No other branches currently included in the GitHub repository are affected.
Ok, having coordinated with @rogup, #32 is now merged. The tests which count pass, but there's one which doesn't, because of a change upstream of PR #32:

```
FAIL  src/App.test.tsx [ src/App.test.tsx ]
TypeError: window.URL.createObjectURL is not a function
 ❯ define ../../node_modules/maplibre-gl/dist/maplibre-gl.js:34:44
     32|
     33| if (typeof window !== 'undefined') {
     34|     maplibregl.setWorkerUrl(window.URL.createObjectURL(new Blob([workerBundleString], { t…
       |                                        ^
     35| }
     36|
 ❯ ../../node_modules/maplibre-gl/dist/maplibre-gl.js:46:1
 ❯ ../../node_modules/maplibre-gl/dist/maplibre-gl.js:6:81
 ❯ ../../node_modules/maplibre-gl/dist/maplibre-gl.js:9:3
 ❯ src/components/map/mapLibre.ts:1:1
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
```

Fixing this is outside the scope of this issue, as it is due to unrelated code. Implementing the back-end fully is also out of scope; I think this is the job of #20.
> @ms0ur1s - `272-results-panel` is branched off the old main, but it can be rebased onto the new one with:
>
> ```sh
> git rebase main-2024-10-23 272-results-panel --onto main
> # check tests all still run etc.
> # push your changes upstream
> git push --force-with-lease origin 272-results-panel
> ```
>
> No other branches currently included in the GitHub repository are affected.

Nice one @wu-lee, thanks for that 😁
Description
The mykomap client will need to get information about pin attribute categories. Specifically:
This information was provided by the `vocabs.json` format in older versions of Mykomap. It might be convenient to retain that format/structure here.

Examples of this are visible on the data server. Some specific examples:
A cut-down example from the Mykomap documentation follows. This defines one vocabulary defining standard size definitions. The SKOS vocab URI for this is `https://example.com/sizes/1.1/`. An abbreviation prefix is defined for use in this context, `sz`. Within the rest of the JSON document, a `sz:` prefix represents that URI, and the URI for a term can be reconstructed by replacing the prefix and the colon with that URI.

Languages are identified by a two-character (upper case) code as defined by ISO 639 Set 1. There is only one language section here, for `EN`, with term labels in English. But there could be as many as required.

Both the prefixes and the language codes should be unique within this file. The same set of languages should be present for all vocabs. (When they are not, this might be an error, or it might be resolved by rules to fall back to some other language, but that's outside the scope of this proposal.)
Use as language files
The older version of Mykomap was also starting to use these vocabs for language translations, which are functionally very similar.
Not all translations can (or should) be hardcoded into the application, either the old or the new one:
See the `ui:` vocab in the second example above for an example of a vocab used to define the labels for UI text not built into the Mykomap application.

Linked data interoperability
These vocab files were designed to be interoperable with linked-data SKOS vocabularies (hence the name "vocab", rather than "taxonomy" or "category").

They don't include everything a SKOS vocab can, just a subset. But a SKOS vocab can be used to create one of these JSON vocab files, and a SKOS vocab can be constructed from a vocab file when some supplemental data is added.

There are scripts for converting these formats in the `map-sse` repository:

API Proposal
The endpoint could look like this (in sketch form):

- `https://example.com/api/vocab/:vocabId` - get a single vocab by ID, with all supported languages. Invalid vocab IDs should be an error.
- `https://example.com/api/vocab/:vocabId1;vocabId2` - get multiple vocabs, likewise. Invalid vocab IDs should be an error.
- `https://example.com/api/vocab/:vocabId1/:languageId1;:languageId2` - get a vocab by ID, but just certain languages. Missing vocabs are an error as before, but missing languages might not be an error? In which case the application should be smart and substitute a "default" language. This is what the old Mykomap app does with missing languages.

A successful response should return JSON data in the body, and this should essentially look like the vocab JSON format examples above.

(A more formal proposal expressed as a ts-rest contract should be added as a preliminary implementation step; a sketch of what that might look like follows.)
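A minimal, non-authoritative sketch of such a contract, assuming `@ts-rest/core` with zod schemas; only the single-vocab route is shown, and the response schema is a loose placeholder for the vocab format above:

```ts
import { initContract } from "@ts-rest/core";
import { z } from "zod";

const c = initContract();

// Loose placeholder for the vocab JSON format described above.
const vocabResponse = z.object({
  prefixes: z.record(z.string()),
  vocabs: z.record(
    z.record(
      z.object({
        title: z.string(),
        terms: z.record(z.string()),
      }),
    ),
  ),
});

export const vocabContract = c.router({
  // Get a single vocab by ID, with all supported languages.
  getVocab: {
    method: "GET",
    path: "/api/vocab/:vocabId",
    pathParams: z.object({ vocabId: z.string() }),
    responses: {
      200: vocabResponse,
      404: z.object({ error: z.string() }), // invalid vocab IDs are an error
    },
  },
});
```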
Caveats:

- Arguably, `sz` should be the key instead of `sz:`.
- The `sz:` prefix is repeated in all the terms, but could be inferred from the fact that they are within the `sz:` vocab section. If implemented, both this and the previous point would make the vocab schema diverge from the old format. However, it should be possible to make the old (or new) parser tolerant of the differences. In which case, this might be an opportunity for incremental improvements.
- The `meta` attribute, where it is otherwise ignored, is specifically meant for diagnosing problems with the old Mykomap vocab proxy service, which talks to a SPARQL endpoint and generates a vocab from that. It's probably no longer really useful.
- The `prefixes` attribute might not strictly be needed in our case here. But saying that, it is necessary for conversion of the QNames back to URIs, needed for linked-data compatibility, so perhaps it is worth preserving.

Back end
Note: at the time of writing, the back-end is not fully implemented; instead it just regurgitates canned data files.

If that is still the case when this issue is implemented, use canned-file regurgitation as before. A full implementation will probably then need to be added by the relevant issue for implementing the back-end.
Acceptance Criteria