matthieu-fesselier opened 4 years ago
When fixing `process.nextTick` (as the root cause), I obtain identical performance when feeding Node.js the regular source code and the compiled browser version.
Unfortunately, this performance does not translate directly to the browser. It seems that the browser build still contains code that executes differently on Node. So we need to dig deeper.
Yet based on the above, I have already made some performance improvements. Please check out how this branch is working for you: https://github.com/solid/query-ldflex/tree/fix/browser-performance
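To illustrate the `process.nextTick` angle: browser bundles typically ship a slow shim for it, so a portable scheduling helper avoids depending on it at all. This is a sketch of the general technique, not LDflex's actual code:

```javascript
// Portable microtask scheduler (illustrative sketch, not LDflex's actual code):
// prefer the standard queueMicrotask, fall back to process.nextTick in Node.js,
// and only as a last resort use setTimeout, which is far slower per call.
const scheduleMicrotask =
  typeof queueMicrotask === 'function' ? queueMicrotask :
  typeof process !== 'undefined' && typeof process.nextTick === 'function'
    ? process.nextTick
    : callback => setTimeout(callback, 0);

scheduleMicrotask(() => console.log('runs before the next macrotask'));
```

In modern browsers and Node.js the first branch is taken, so Node and browser builds schedule work the same way instead of going through a bundler-provided `process` shim.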
In any case, we're definitely spending way too much time on SPARQL parsing (RubenVerborgh/SPARQL.js#94). It's a major contributor to time.
Instead of passing SPARQL queries to Comunica, LDflex could also just pass SPARQL algebra to Comunica directly. This would avoid the parsing overhead. (This is also how GraphQL-LD does it)
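The parse-once idea can be illustrated independently of Comunica: cache the parser's output per query string so repeated executions skip parsing entirely. The `parse` function below is a dummy stand-in for a real SPARQL parser such as SPARQL.js, not its actual API:

```javascript
// Sketch of caching parse results so each distinct query is parsed only once.
// `parse` is a dummy stand-in for a real SPARQL parser, not SPARQL.js's API.
function makeCachedParser(parse) {
  const cache = new Map();
  return query => {
    if (!cache.has(query))
      cache.set(query, parse(query));
    return cache.get(query);
  };
}

// Dummy "parser" that counts how often it actually runs
let parseCount = 0;
const parse = query => { parseCount += 1; return { type: 'query', text: query }; };

const cachedParse = makeCachedParser(parse);
cachedParse('SELECT * WHERE { ?s ?p ?o }');
cachedParse('SELECT * WHERE { ?s ?p ?o }'); // second call hits the cache
console.log(parseCount); // 1
```

Passing pre-built algebra to the engine takes this one step further: the expensive string-to-algebra step disappears from the query path altogether.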
Thanks Ruben for all this study. We're working on a workaround for now on our side. But it's a temporary measure, and as soon as we're done with it, we're gonna have to find a more definitive solution.
Definitely, and we know where to look already.
@sylvainlb @matthieu-fesselier @happy-dev `preload` is implemented on the latest LDflex master (https://github.com/RubenVerborgh/LDflex/issues/44). It makes repeated accesses go faster. (However, I still find browser query overhead to be significant: https://github.com/comunica/comunica/issues/561)
I tested it and it works as expected. It does not make the first call faster, but the next ones are faster.
Thanks so much for testing this. Will close this now, while we follow up in Comunica to make queries as a whole faster in browsers. We now have several pointers to look at the performance differences.
Thanks Ruben for all these efforts. We're still in the process of closing the crisis on our side, but I'll get back to you once we're done to see how we can proceed to integrate LDFlex in our work.
Thanks!
Hello @RubenVerborgh @rubensworks !
I'm following up here as it seems to be the most appropriate issue related to performance. I investigated the performance a bit more with the latest version of query-ldflex. (Not sure all the dependencies were updated as they should be; maybe you can confirm?)
Here is a real use case of an app made with Startin'blox. On load time:

I made some tests with LDflex as a replacement for our store. It does not load all the resources mentioned above; it needed more work to make all the components work. However, here are some results:

- `solid.data[something]` → the page freezes for 5 to 10 seconds (@RubenVerborgh, you told us about this on our last call; I confirm it also happens for us).
- `solid.data[something]` → freezes the page for 2-3 seconds
- `600 * 10ms * 2 = 12s`

I think that these accesses in the loops make the whole app quite slow. Here is a small test I made with our data:
```html
<script src="solid-auth-client.bundle.js"></script>
<script src="solid-query-ldflex.bundle.js"></script>
<script>
  document.addEventListener('DOMContentLoaded', async () => {
    const data = solid.data['https://api.community.hubl.world/skills/'];
    for await (const s of data['ldp:contains']) {
      const id = s.value;
      console.time(id);
      await s['rdfs:label'].value;
      await s['type'].value;
      console.timeEnd(id);
    }
  });
</script>
```
I tested with remote data (https://api.community.hubl.world/skills/) and with the same data in a local JSON-LD file, with the same results.
Each time shows between 15 and 20ms. As they are all executed sequentially, it takes around 12s to have the whole list loaded.
I hope it helps; don't hesitate to reach back if you need more information about our tests/use cases!
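The ~12s figure follows directly from awaiting each property one at a time. As a rough standalone sketch (the `fetchLabel` helper below simulates a 10ms property access; it is not LDflex's API), collecting the promises first and awaiting them together lets the waits overlap:

```javascript
// Sketch: N sequential awaits cost roughly N * latency, while firing all
// requests first and awaiting them together costs roughly one latency.
// `fetchLabel` is a hypothetical stand-in that simulates a 10ms access.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const fetchLabel = async id => { await delay(10); return `label-${id}`; };

async function sequential(ids) {
  const labels = [];
  for (const id of ids)
    labels.push(await fetchLabel(id)); // waits ~10ms per item
  return labels;
}

async function parallel(ids) {
  return Promise.all(ids.map(fetchLabel)); // all waits overlap
}
```

With 600 items, the sequential version takes on the order of 6 seconds while the parallel one stays close to a single 10ms wait, at the cost of having all requests in flight at once.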
Thanks @matthieu-fesselier, this is a very interesting case, which we will analyze in detail.
Quick thoughts:
- The freezing, I think we will need to tackle by making every 1 out of 100 `queueMicrotask` calls a `setTimeout` call instead (good old `DoEvents`).
- The `await` pattern seems to encourage actively waiting for things, whereas we probably want to dispatch the (unevaluated) LDflex expressions to other functions/components as much as possible, so they all `await` in their own time.

Reminder to self: the above remarks pertain to this hack rather than the `queueMicrotask`-based `AsyncIterator` implementation.
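The 1-in-100 idea can be sketched in plain JavaScript (a hypothetical helper, not the actual AsyncIterator code):

```javascript
// Sketch of the 1-in-100 scheduling idea (hypothetical helper, not the
// actual AsyncIterator code): schedule most continuations as microtasks
// for speed, but route every 100th through setTimeout so the browser gets
// a chance to render and handle input between batches.
let taskCount = 0;
function scheduleTask(callback) {
  if (++taskCount % 100 === 0)
    setTimeout(callback, 0);   // macrotask: yields to rendering (the DoEvents trick)
  else
    queueMicrotask(callback);  // microtask: fast, but can starve the event loop
}
```

Microtasks all drain before the browser repaints, which is exactly why a long chain of them freezes the page; the occasional macrotask breaks the chain and lets a frame through.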
A more precise example which illustrates what I said before:
```html
<pre id="test"></pre>
<script>
  // for freeze test
  setInterval(() => {
    document.getElementById('test').textContent = Math.random();
  }, 200);

  document.addEventListener('DOMContentLoaded', async () => {
    const skills = 'https://api.community.hubl.world/skills/';
    const context = {
      '@vocab': 'http://happy-dev.fr/owl/#',
      rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
      rdfs: 'http://www.w3.org/2000/01/rdf-schema#',
      ldp: 'http://www.w3.org/ns/ldp#',
      foaf: 'http://xmlns.com/foaf/0.1/',
      name: 'rdfs:label',
      acl: 'http://www.w3.org/ns/auth/acl#',
      permissions: 'acl:accessControl',
      mode: 'acl:mode',
      geo: 'http://www.w3.org/2003/01/geo/wgs84_pos#',
      lat: 'geo:lat',
      lng: 'geo:long'
    };
    await solid.data.context.extend(context);
    const data = solid.data[skills];

    console.time('iteration');
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) { }
    console.timeEnd('iteration');

    console.time('iteration + value');
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      const id = s.value;
    }
    console.timeEnd('iteration + value');

    console.time('iteration + 1 prop');
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      await s['rdfs:label'].value;
    }
    console.timeEnd('iteration + 1 prop');

    console.time('iteration + 2 props');
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      await s['rdfs:label'].value;
      await s['type'].value;
    }
    console.timeEnd('iteration + 2 props');
  });
</script>
```
With this, I can see that:

- once `s.value` is accessed, it's fast

> the freezing, I think we will need to tackle by making every 1 out of 100 `queueMicrotask` calls a `setTimeout` call instead (good old `DoEvents`)
FYI we have this now here: https://github.com/RubenVerborgh/AsyncIterator/commit/c0d8cac36362f305ba2192db974cf18560d271ea#diff-1a12957b96162e114d61ede68b100ab3R13-R21
Following up on the tests I made just above with the new version of AsyncIterator, I am facing a bug, which might be related to #71. It seems that if I loop twice over the same container, the second time it never enters the loop:
```js
document.addEventListener('DOMContentLoaded', async () => {
  const skills = 'https://api.community.hubl.world/skills/';
  const data = solid.data[skills];

  // 1st loop
  for await (const s of data['http://www.w3.org/ns/ldp#contains']) { }
  console.log('passes here...');

  // 2nd loop
  for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
    console.log('but not here');
    const id = s.value;
  }
  console.log('and here neither');
});
```
Did I miss something?
I suspect #71 indeed, we're investigating it.
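For what it's worth, the symptom matches a classic async-iteration pitfall: if the expression hands back the same underlying iterator both times, the second loop sees an already-exhausted stream. A minimal standalone demonstration with plain generators (no LDflex involved):

```javascript
// Demonstration of the pitfall: an object whose Symbol.asyncIterator always
// returns the SAME iterator is exhausted after the first for-await loop.
async function* numbers() { yield 1; yield 2; }

async function demo() {
  const it = numbers(); // a generator object is its own iterator
  const shared = { [Symbol.asyncIterator]: () => it };
  const first = [];
  const second = [];
  for await (const n of shared) first.push(n);
  for await (const n of shared) second.push(n); // already done: body never runs
  return { first, second };
}

demo().then(({ first, second }) => console.log(first, second)); // [ 1, 2 ] []
```

A correct async iterable returns a fresh iterator from each `Symbol.asyncIterator` call, which is what makes looping twice work.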
For reference, if it helps, I ran some additional tests based on the code I showed here, and here are the results:
| instructions | LDFlex (1) | LDFlex + Turtle file (2) | LDFlex + rdflib.js |
|---|---|---|---|
| iteration | 7600ms | 495ms | 3350ms |
| iteration + value | 315ms | 320ms | 45ms |
| iteration + 1 prop | 3100ms | 3150ms | 4700ms |
| iteration + 2 props | 6800ms | 7050ms | 9000ms |
(1) I used the `master` branch.
(2) I converted the skills JSON-LD document into one local Turtle file.
I face performance issues when accessing data in a container; it makes all our apps unusable with LDflex for now. For example, with the following code:

It takes ~3000ms to display the `img` property of a resource 70 times. For each resource, it takes between 10ms and 90ms to display its `img`. Is there something we can do to improve this?