Depart-de-Sentier / brightcon-2024-material

Talks and presentation materials from the Brightcon 2024 conference and hackathon

Semantic unit conversion app using Sentier.dev platform #1

Open cmutel opened 2 days ago

cmutel commented 2 days ago

Overview

We have a database with units, code for a specific type of unit conversion, and the need for a general API for converting units in the future.

User stories

Let's build a webapp that can do the following:

And finally, tie all this together so you can start typing a unit, pick the right one from a dropdown, and then get tables of conversion factors for each system we include.

Unit systems

The unit systems we already have:

Unit systems I would like to have:

Tasks

Stretch goals:

Skosmos search

This is possibly hard. We have a search index via Skosmos (which should also have an API), but it only searches on prefLabel (see the search results for "btu" versus "british"), and maybe on altLabel. We are currently using notation ("Notations are symbols which are not normally recognizable as words or sequences of words in any natural language and are thus usable independently of natural-language contexts"), but we could change these to altLabel, or add altLabel in addition to notation (these are strings, even if they have custom data types, so they should be fine as instances of RDF plain literals).

cmutel commented 1 day ago

The preliminary plan is to develop a new UI and API using React and FastAPI, and to have our own search index using something like Elasticsearch. We chose not to build on Skosmos for two reasons: we can move more quickly by building a more targeted user experience with specific and complicated SPARQL queries, and we want people to think about building apps on top of our data products (this is a good example).

We have three API endpoints in mind:

cmutel commented 1 day ago

Hackathon team:

cmutel commented 16 hours ago

Quick update from my side: We have an initial unit endpoint available (PR), and this pulls all data for all units of the same quantity kind as the input unit.

The output is a JSON object whose keys are unit IRIs and whose values are lists of (attribute, value) pairs. The values need to be lists because the same attribute can be present more than once. Here is an example:

{
    "https://vocab.sentier.dev/qudt/unit/M-SEC": [
        [
            "type",
            "Concept"
        ],
        [
            "prefLabel",
            "Metre second"
        ],
        [
            "prefLabel",
            "Meter second"
        ],
        [
            "notation",
            "ms"
        ],
        [
            "notation",
            "m.s"
        ],
        [
            "inScheme",
            "https://vocab.sentier.dev/qudt/"
        ],
        [
            "broader",
            "https://vocab.sentier.dev/qudt/quantity-kind/LengthTime"
        ],
        [
            "narrower",
            "https://vocab.sentier.dev/qudt/unit/M-YR"
        ],
        [
            "definition",
            "Meter over one second"
        ],
        [
            "broaderTransitive",
            "https://vocab.sentier.dev/qudt/quantity-kind/LengthTime"
        ],
        [
            "narrowerTransitive",
            "https://vocab.sentier.dev/qudt/unit/M-YR"
        ],
        [
            "hasDimensionVector",
            "http://qudt.org/vocab/dimensionvector/A0E0L1I0M0H0T1D0"
        ],
        [
            "applicableSystem",
            "http://qudt.org/vocab/sou/SI"
        ],
        [
            "applicableSystem",
            "http://qudt.org/vocab/sou/CGS"
        ],
        [
            "conversionMultiplier",
            "1.0"
        ],
        [
            "conversionMultiplierSN",
            "1.0e0"
        ],
        [
            "hasQuantityKind",
            "https://vocab.sentier.dev/qudt/quantity-kind/LengthTime"
        ]
    ]
}
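
As a rough sketch of how a client might consume this shape, the pairs can be grouped into a mapping from attribute to a list of values, so repeated attributes such as prefLabel and notation are kept. The function name `group_attributes` is made up for illustration, and the response text is a trimmed copy of the example above, not a live call to the endpoint.

```python
import json
from collections import defaultdict


def group_attributes(pairs):
    """Collapse a list of [attribute, value] pairs into attribute -> list of values.

    Hypothetical helper, not part of the endpoint code.
    """
    grouped = defaultdict(list)
    for attribute, value in pairs:
        grouped[attribute].append(value)
    return dict(grouped)


# Trimmed copy of the example response shown above.
response_text = """
{
    "https://vocab.sentier.dev/qudt/unit/M-SEC": [
        ["prefLabel", "Metre second"],
        ["prefLabel", "Meter second"],
        ["notation", "ms"],
        ["notation", "m.s"],
        ["conversionMultiplier", "1.0"]
    ]
}
"""

units = {
    iri: group_attributes(pairs)
    for iri, pairs in json.loads(response_text).items()
}
m_sec = units["https://vocab.sentier.dev/qudt/unit/M-SEC"]

print(m_sec["prefLabel"])  # both labels survive
print(float(m_sec["conversionMultiplier"][0]))  # values arrive as strings
```

Note that the numeric values are strings in the response, so the client has to convert them before doing arithmetic.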

We could imagine having two tables, one with real units (anything not IMPERIAL, PLANCK, USCS), and the other one with the weird stuff.
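
One possible way to make that split, working on the response shape shown earlier. This is only a sketch under two assumptions: the system name is the last path segment of each applicableSystem IRI, and a unit goes into the second table only when every system it lists is one of IMPERIAL, PLANCK, or USCS. The second example IRI is a placeholder, not a real vocabulary entry.

```python
# Assumption: these names match the last path segment of the system IRIs.
EXCLUDED_SYSTEMS = {"IMPERIAL", "PLANCK", "USCS"}


def system_name(iri):
    """Return the last path segment of a system IRI, e.g. 'SI'."""
    return iri.rstrip("/").rsplit("/", 1)[-1]


def split_units(units):
    """Split units into (main, other) tables based on applicableSystem.

    A unit lands in `other` only if it lists at least one system and all
    of its systems are in EXCLUDED_SYSTEMS (one interpretation of the idea).
    """
    main, other = {}, {}
    for iri, pairs in units.items():
        systems = {
            system_name(value)
            for attribute, value in pairs
            if attribute == "applicableSystem"
        }
        if systems and systems <= EXCLUDED_SYSTEMS:
            other[iri] = pairs
        else:
            main[iri] = pairs
    return main, other


example = {
    "https://vocab.sentier.dev/qudt/unit/M-SEC": [
        ["applicableSystem", "http://qudt.org/vocab/sou/SI"],
        ["applicableSystem", "http://qudt.org/vocab/sou/CGS"],
    ],
    # Placeholder IRI for illustration only.
    "https://example.org/unit/HYPOTHETICAL-IMPERIAL": [
        ["applicableSystem", "http://qudt.org/vocab/sou/IMPERIAL"],
    ],
}

main_table, other_table = split_units(example)
print(list(main_table))
print(list(other_table))
```

Units that list no system at all stay in the main table here; whether that is the right default is an open question.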

We can place restrictions on the languages of the string literals returned, see the API docs. I think we need to do this as we can have more than one prefLabel for each concept, and then the UI doesn't know which one to display. In the above case, one has the language string en_GB and the other en_US (this was me being a bit pedantic 😛, but also trying to improve the search data).

My initial idea was that we would display different tables with the conversion factors, with a separate table for each alternative system, such as SimaPro, ecoinvent, etc. I now think that that is a bad idea. We want to support interoperability but also encourage harmonisation to a common standard. So instead I think we should only have a single table, like:

| Label | Synonyms | IRI (click to copy) | SimaPro | Ecoinvent | LCA Commons |
| --- | --- | --- | --- | --- | --- |
| Kilogram | kg, KGM | https://vocab.sentier.dev/qudt/unit/KiloGM | kg | kg | kg |
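
To make the single-table idea concrete, here is a minimal sketch of how one row could be assembled from the (attribute, value) pairs plus per-system labels. How the SimaPro, ecoinvent, and LCA Commons labels will be stored is not decided in this thread, so the `system_labels` dict and the `build_row` helper are purely hypothetical.

```python
SYSTEM_COLUMNS = ("SimaPro", "Ecoinvent", "LCA Commons")


def build_row(iri, pairs, system_labels):
    """Build one table row as a dict keyed by column name.

    `system_labels` is a hypothetical mapping of system name -> label used
    in that system; a missing system yields an empty cell.
    """
    labels = [value for attribute, value in pairs if attribute == "prefLabel"]
    notations = [value for attribute, value in pairs if attribute == "notation"]
    row = {
        "Label": labels[0] if labels else "",
        "Synonyms": ", ".join(notations),
        "IRI": iri,
    }
    for system in SYSTEM_COLUMNS:
        row[system] = system_labels.get(system, "")
    return row


row = build_row(
    "https://vocab.sentier.dev/qudt/unit/KiloGM",
    [["prefLabel", "Kilogram"], ["notation", "kg"], ["notation", "KGM"]],
    {"SimaPro": "kg", "Ecoinvent": "kg", "LCA Commons": "kg"},
)
print(row)
```

Taking the first prefLabel only works once the language restriction is applied; otherwise the label shown depends on the order of the response.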

This has implications for the database. I came to this conclusion because I was starting with ecoinvent data, and didn't want to create a separate system for them the way we did for SimaPro.

Please provide feedback so I am not yelling into the void 📢

janfeitkenhauer commented 15 hours ago

Good thing you added the JSON response. I will work with that until the server is up and running and the endpoint can be accessed from the frontend.

Also, I like the idea of displaying only one table. It is clean and easy for the user to grasp, regardless of which system they are using. If we find that information is missing, we can easily adapt. We should definitely add language restrictions!

janfeitkenhauer commented 4 hours ago

So, on my way home I thought about the data structure of the JSON response. The first positions should contain units of the metric system, i.e. the International System of Units, always starting with the reference unit. (Kudos to those who are not using the metric system for the 7 dimensions included. You exceed my level of skill and are therefore well able to look further down for your unit. 🤓)

| Symbol | Name | Quantity |
| --- | --- | --- |
| s | second | time |
| m | metre | length |
| kg | kilogram | mass |
| A | ampere | electric current |
| K | kelvin | thermodynamic temperature |
| mol | mole | amount of substance |
| cd | candela | luminous intensity |

All prefixed variants of the base units (mega, kilo, milli, etc.) should be displayed below the reference unit, unordered. Below them comes everything else, also unordered. Additional units that are not base units (e.g. for velocity) should follow the same principle. Please ask for clarification if necessary.
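
The ordering described here could be sketched as a three-level sort key over the response shape from the earlier comment. Two assumptions are baked in and are not confirmed anywhere in this thread: a unit counts as metric if it lists the SI system in applicableSystem, and the reference unit is the SI unit whose conversionMultiplier is 1.0. The example IRIs are placeholders.

```python
def has_system(pairs, name):
    """True if any applicableSystem IRI ends in the given system name."""
    return any(
        attribute == "applicableSystem"
        and value.rstrip("/").endswith("/" + name)
        for attribute, value in pairs
    )


def multiplier(pairs):
    """Return conversionMultiplier as a float, or None if absent."""
    for attribute, value in pairs:
        if attribute == "conversionMultiplier":
            return float(value)
    return None


def display_rank(pairs):
    """0 = reference unit, 1 = other SI units, 2 = everything else."""
    if has_system(pairs, "SI"):
        return 0 if multiplier(pairs) == 1.0 else 1
    return 2


def order_units(units):
    """Return unit IRIs in display order; sorted() is stable, so the
    original order is kept inside each group."""
    return sorted(units, key=lambda iri: display_rank(units[iri]))


# Placeholder IRIs and values for illustration only.
example = {
    "https://example.org/unit/FOOT": [
        ["applicableSystem", "http://qudt.org/vocab/sou/IMPERIAL"],
        ["conversionMultiplier", "0.3048"],
    ],
    "https://example.org/unit/KILOMETRE": [
        ["applicableSystem", "http://qudt.org/vocab/sou/SI"],
        ["conversionMultiplier", "1000.0"],
    ],
    "https://example.org/unit/METRE": [
        ["applicableSystem", "http://qudt.org/vocab/sou/SI"],
        ["conversionMultiplier", "1.0"],
    ],
}

ordered = order_units(example)
print(ordered)
```

Whether this ordering lives in the API or in the frontend is a separate decision; the sketch works on either side.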

To answer the question of how many unit pages we need, we agreed on a separate endpoint that provides an array of all base units to be considered, or something similar. With that response, the frontend should be able to render the unit pages dynamically.

That's it from me for today. In the upcoming days I will refine the frontend and also commit the code on GitHub. For the initial commit I'd like some support on where to put the client code, as you guys have already made the commits for the backend.

Cheers!