ec-doris / kohesio-backend

APIs serving Kohesio's frontend
https://kohesio.ec.europa.eu

Keyword search limited to 5000 results #78

Open madewild opened 2 years ago

madewild commented 2 years ago

For instance: https://dev.kohesio.eu/projects?keywords=great

Is this linked to the performance issues of the semantic search? It is annoying, because the map is then not representative of the overall situation...

madewild commented 2 years ago

Increasing the limit leads to performance issues, but currently the maps and filters are misleading... Need to think about it.
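To make the "misleading" part concrete, here is a minimal sketch of how a hard cap on ranked results can skew per-country counts. The data is synthetic and the country names `A`, `B`, `C`, the scores, and the helper `country_counts` are all made up for illustration; nothing here is taken from the Kohesio code or data.

```python
from collections import Counter


def country_counts(hits, limit=None):
    """Count hits per country, optionally keeping only the first `limit` hits."""
    kept = hits if limit is None else hits[:limit]
    return Counter(country for country, _score in kept)


# Synthetic hits as (country, relevance score), already sorted by score descending.
# Placeholder values only: they stand in for a keyword search result list.
hits = [("A", 0.9)] * 4 + [("B", 0.5)] * 3 + [("C", 0.1)] * 3

full = country_counts(hits)             # what the map should reflect
capped = country_counts(hits, limit=5)  # what a capped result list reflects

print("full:  ", dict(full))
print("capped:", dict(capped))
```

With the cap in place, country `C` disappears entirely and `B` is undercounted, which is the same kind of distortion the map and filters show once a search exceeds the limit.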

madewild commented 2 years ago

For instance, https://dev.kohesio.eu/projects?keywords=%22road%22&country=Sweden gives only 6 results, but we have at least 37: https://query.linkedopendata.eu/#select%20DISTINCT%20%3Fproject%20where%20%7B%0A%20%20%20%20%3Fproject%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fprop%2Fdirect%2FP35%3E%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fentity%2FQ9934%3E%20.%0A%20%20%20%20%3Fproject%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fprop%2Fdirect%2FP32%3E%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fentity%2FQ11%3E%20.%0A%20%20%20%20OPTIONAL%20%7B%7B%3Fproject%20rdfs%3Alabel%20%3Flabel%20filter%28lang%28%3Flabel%29%20%3D%20%27en%27%29%20%7D%7D%0A%20%20%20%20OPTIONAL%20%7B%7B%3Fproject%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fprop%2Fdirect%2FP836%3E%20%3Fsummary%20filter%28lang%28%3Fsummary%29%20%3D%20%27en%27%29%20%7D%7D%0A%20%20%20%20FILTER%20%28regex%28%3Flabel%2C%20%22%5C%5Cbroad%5C%5Cb%22%2C%20%22i%22%29%20%7C%7C%20regex%28%3Fsummary%2C%20%22%5C%5Cbroad%5C%5Cb%22%2C%20%22i%22%29%29%0A%7D
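The query-editor link is hard to read because the SPARQL query is percent-encoded in the URL fragment. A small sketch of how to decode such a link with the Python standard library follows; the helper name `decode_query_fragment` is hypothetical, and the sample URL is a shortened excerpt built from the FILTER clause of the link, not the full query.

```python
from urllib.parse import unquote, urlsplit


def decode_query_fragment(url: str) -> str:
    """Return the percent-decoded fragment (the part after '#') of a query-editor link."""
    return unquote(urlsplit(url).fragment)


# Shortened sample: only part of the FILTER clause, kept short for readability.
sample = (
    "https://query.linkedopendata.eu/#FILTER%20%28regex%28%3Flabel%2C%20"
    "%22%5C%5Cbroad%5C%5Cb%22%2C%20%22i%22%29%29"
)

print(decode_query_fragment(sample))
```

Applied to the full link, the decoded query selects distinct projects matching two fixed property/entity pairs (`P35` = `Q9934` and `P32` = `Q11`) whose English label or summary matches the word-boundary regex `\broad\b`, case-insensitively.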

madewild commented 2 years ago

The 5000 limit is configured in the SPARQL endpoint for the Lucene part. @D063520 @DiaZork @svili, any idea how we could overcome this limitation without sacrificing performance too much?

madewild commented 2 years ago

@svili this is more long term, but when you have time, could you look into this? There is no easy way out, but it would be important to improve the current situation at least a bit...

One idea would be to load the geo info of all projects in the background and continue computing the map while the top 15 paginated results are already displayed, but I am not sure how this would play with the UI.
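The background-loading idea could look roughly like the following sketch, written with `asyncio` purely for illustration. Everything in it is an assumption: `fetch_page` and `fetch_all_geo` are hypothetical stand-ins for the fast paginated query and the slow "all coordinates" query, the sleeps simulate latency, and the real backend and frontend may be structured quite differently.

```python
import asyncio

PAGE_SIZE = 15  # the thread mentions 15 paginated results per page


async def fetch_page(keyword, page):
    # Hypothetical stand-in for the fast, paginated keyword query.
    await asyncio.sleep(0.01)
    start = page * PAGE_SIZE
    return [f"{keyword}-project-{i}" for i in range(start, start + PAGE_SIZE)]


async def fetch_all_geo(keyword, total):
    # Hypothetical stand-in for the slow query collecting coordinates of every match.
    await asyncio.sleep(0.05)
    return [(float(i % 90), float(i % 180)) for i in range(total)]


async def search(keyword, total, events):
    # Start the slow geo query in the background without waiting for it.
    geo_task = asyncio.create_task(fetch_all_geo(keyword, total))
    # The first page is awaited on its own, so the list can be shown early.
    first_page = await fetch_page(keyword, 0)
    events.append("list")
    # The map is only filled in once the background task has finished.
    points = await geo_task
    events.append("map")
    return first_page, points


def run_demo():
    events = []
    first_page, points = asyncio.run(search("road", 37, events))
    return events, first_page, points
```

In this sketch the list is always available before the map, so the UI question raised above becomes how to signal that the map is still loading rather than complete.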

madewild commented 2 years ago

Now we have 112 790 results when searching for "youth": https://dev.kohesio.eu/projects?keywords=youth&sort=Total-Budget-(descending)

Very strange, and even the map tab has many projects! To be investigated...

madewild commented 1 year ago

@svili could you investigate this?