Documentation Questions

ThomasThelen commented 2 years ago

I've been looking at the Storybook documentation, which does a great job of explaining what the JS components are (I'm going to have to remember this doc framework for other projects), but I'm not seeing anything on how to configure Sampo to fit specific use cases. For example, I think that to customize Sampo I should be modifying the JSON files in src/client/configs/, but this isn't mentioned anywhere in the docs (that I've seen). I'm also a little confused about how much of the JavaScript in src/client/components I should be modifying. Since this is the central part of the documentation for users, I'm assuming that most of the customization work will be in this area, but the code looks fairly general, which makes me think twice. From the main Sampo paper it looks like I can create facets from the more general facet components?

In short, I may be looking at the wrong documentation. Are there any docs that go over the architecture of how the config files relate to the user interface, along with details on where the SPARQL queries should go? I'll also note that due to the size of the graph (>5 billion triples), I'm specifically looking at ClientFS rather than ServerFS.

edit: After some playing around, it looks like I do create facets based on the components, and then link them to text in the translations files. The SPARQL query gets linked through the ID and the propertiesQueryBlock in the perspective JSON file.

esikkala commented 2 years ago

Thanks for your interest! I heavily refactored the Sampo-UI codebase last year, and documenting all that work is still on the todo list. The Storybook documentation page is also somewhat outdated. I'll write some points here; maybe these can be expanded into proper documentation later. The main idea now is that new portals can be created by forking this repository and then editing only the following files:

main JSON config files in src/configs/<PORTAL_ID>/
translations in src/client/translations/<PORTAL_ID>/
SPARQL queries in src/server/sparql/<PORTAL_ID>/

So to use the current functionalities, there would be no need to touch any actual JavaScript files. Still, there are some portal-specific JS files (e.g. main page, footer), usually in src/client/components/perspectives/<PORTAL_ID>/

The example portal ID is sampo, so one way of creating a new portal is to re-create those 4 folders (the three listed above plus the portal-specific components folder) under a new portal ID. Most development work would then be carried out by modifying the portal-specific JSON configs and SPARQL queries.
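As a rough illustration of the folder re-creation described above, here is a minimal Python sketch that copies the four portal folders from the example ID to a new one. The names clone_portal and PORTAL_DIRS are hypothetical helpers for this example only and are not part of Sampo-UI; the folder paths are the ones mentioned in this comment and are assumed to match your checkout.

```python
import shutil
from pathlib import Path

# Folder templates as mentioned in the comment above; "{pid}" marks the portal ID.
# These paths are assumptions about the checkout, not verified against the repo.
PORTAL_DIRS = [
    "src/configs/{pid}",
    "src/client/translations/{pid}",
    "src/server/sparql/{pid}",
    "src/client/components/perspectives/{pid}",
]


def clone_portal(repo_root, new_id, template_id="sampo"):
    """Copy each portal-specific folder of template_id to a folder named new_id.

    Hypothetical helper: returns the list of folders that were created.
    """
    root = Path(repo_root)
    created = []
    for template in PORTAL_DIRS:
        src = root / template.format(pid=template_id)
        dst = root / template.format(pid=new_id)
        if not src.is_dir():
            continue  # skip folders that are missing in this checkout
        shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
        created.append(dst)
    return created
```

Calling clone_portal(".", "myportal") from the repository root would create the new folders. The sketch only copies files; any references to the old portal ID inside the copied files would still need to be checked by hand.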

A major caveat is that we have been using only Apache Jena Fuseki as a triplestore. This means that the code and queries related to full-text search, in particular, work only with the Jena Full Text Search index. Another caveat is that the whole Sampo-UI codebase is an experimental product of various research projects, so using it for implementing any production-level portals is definitely not recommended, although this is already happening. :sweat_smile:

esikkala commented 2 years ago

About the faceted search implementations (ClientFS and ServerFS): the ClientFS code was developed in a research project in 2018-2019, and it has not been developed further since then. In recent years, all development work has been related to ServerFS.

In ServerFS, which in our view is the "purist" way of doing faceted search over RDF graphs, the main thing to consider is how many instances (meaning possible search results) there are in the RDFS class (or similar) that you are targeting. There may be millions or billions of triples in the graph, but you have to split those into meaningful classes, which then act as a base for perspectives in a Sampo-UI based portal. In our experiments, the maximum number of instances in one RDFS class has been around 1 million. If you go well over that, this sort of faceted search based on SPARQL queries becomes too much for the triplestore to handle. We are running our Fusekis on virtual machines with 128GB RAM (multiple Fusekis on the same virtual machine).
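To make the per-class sizing check above concrete, here is a minimal Python sketch that builds a standard SPARQL count query for one class and compares a returned count against the roughly 1 million figure. The names count_query, fits_serverfs and INSTANCE_LIMIT are hypothetical and not part of Sampo-UI, and the limit is only a rough heuristic taken from this comment, not a hard boundary.

```python
# Rough ceiling per class reported in the comment above; treat it as a heuristic.
INSTANCE_LIMIT = 1_000_000


def count_query(class_iri):
    """Build a SPARQL query that counts the distinct instances of one class."""
    return (
        "SELECT (COUNT(DISTINCT ?instance) AS ?count) "
        f"WHERE {{ ?instance a <{class_iri}> }}"
    )


def fits_serverfs(instance_count, limit=INSTANCE_LIMIT):
    """Return True if a class of this size is within the reported range."""
    return instance_count <= limit
```

The query string would be sent to your own SPARQL endpoint with whatever client you already use, and the resulting number passed to fits_serverfs. For a class that turns out too large, the comment above suggests splitting it into smaller, meaningful classes.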

For example in the MMM portal you can try out how ServerFS works with a class of ~220 000 instances.

ThomasThelen commented 2 years ago

Wow, thank you so much for all of this information! I totally understand the WIP documentation and appreciate the overview of how to customize an instance. Given that we're doing text search with GraphDB's Elasticsearch connector, which is interacted with through SPARQL queries, we might be able to work around the text search, but I'll keep an eye out.

There's a good chance that I'll see if I can use this as a frontend for gnis-ld.org too (it's also backed by GraphDB). Thanks a million for this great software!

SemanticComputing / sampo-ui

Documentation Questions #33