Closed nichtich closed 3 years ago
PostgreSQL should have no problem indexing JSON blobs, so it should be possible to do this without changing the data structure. 👍
After experimenting a bit, I've found out that there's a bit difference between using jsonb[]
and jsonb
containing an array. With the former, I can't find a way to perform the query needed for this. So we might need to perform a database transition.
Okay, here's what I've found: It is possible to perform the query whether we use jsonb[]
or jsonb
(containing an array). Here are the two queries:
For jsonb[]
:
select *
from data
where '{ "notation": ["T1--09044"] }' <@ any("memberList");
For jsonb
(containing an array):
select *
from data
where "memberList" @> '[{ "notation": ["T1--09044"] }]';
However, with jsonb[]
, I have not found a way to index the data in a way that the query uses the index. With jsonb
(containing an array), we can use the following index:
CREATE INDEX data_memberList_gin_idx
ON data
USING gin ("memberList");
With this, the query above is able to use the index and perform in <1ms instead of ~20ms (with the current dataset of 600 rows).
Since this project is in a fairly early stage, I don't see an issue with breaking the current database structure since we can manually fix it on our server (which is very likely the only running instance of this).
What do you think, @nichtich?
Ok I'd propose query syntax analyze?memberNotation=T1--09044
and analyze?member=http://dewey.info/class/1--0904/e23/
.
I implemented this in dev
, but with a single query parameter member
that takes either a notation or a URI. I thought memberNotation
was too clunky.
We should watch how this performs with larger datasets, but in theory it should work well with the index.
Given a class such as
T1--09044
return all known build numbers that make use of this class (e.g.700.90440747471
). This might become large result sets and requires a proper database index. Maybe directly use SQL tables and create JSKOS from the pieces instead of storing JSKOS in the database?We could first do some analysis offline with converted files of decomposed DDC numbers (requires reduction via hierarchy as described in #17).