LDflex / Query-Solid

Simple access to data in Solid pods through LDflex expressions
https://solid.github.io/query-ldflex/
MIT License

Browser performance issue on iterative property access #45

Open matthieu-fesselier opened 4 years ago

matthieu-fesselier commented 4 years ago

I'm facing performance issues when accessing data in a container. For now, this makes all our apps unusable with LDflex. For example, with the following code:

  // base_context: the app's JSON-LD context (defined elsewhere)
  await solid.data.context.extend(base_context);
  const container = solid.data["https://apiprod.happy-dev.fr/clients/"];
  console.time(container.toString());      // total time for the container
  for await (const r of container.ldp_contains) {
    console.time(r.toString());            // time for this one resource
    console.log((await r.img).toString());
    console.timeEnd(r.toString());
  }
  console.timeEnd(container.toString());

It takes ~3000 ms to display the img property of 70 resources. For each resource, it takes between 10 ms and 90 ms to display its img.

Is there something we can do to improve this?
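To collect per-resource numbers like these programmatically instead of reading them off `console.time` output, a small harness can record each duration. This is a minimal sketch: `timeEach` and the mock container are hypothetical, `performance.now()` is assumed to be available (it is global in browsers and in Node.js 16+), and only the callback is timed, not the iteration step that fetches the next item. In the test above, `iterable` would be `container.ldp_contains` and `fn` would await `r.img`.

```javascript
// Hypothetical helper: awaits `fn(item)` for every item of an (async)
// iterable, one after another, and returns timing statistics in ms.
async function timeEach(iterable, fn) {
  const durations = [];
  for await (const item of iterable) {
    const start = performance.now();
    await fn(item);
    durations.push(performance.now() - start);
  }
  return {
    count: durations.length,
    min: durations.length ? Math.min(...durations) : 0,
    max: durations.length ? Math.max(...durations) : 0,
    total: durations.reduce((sum, d) => sum + d, 0),
  };
}

// Usage with a mock container whose items each take `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function* mockContainer() {
  yield* [10, 30, 20];
}

timeEach(mockContainer(), sleep).then(({ count, min, max, total }) => {
  console.log(`${count} items, ${min.toFixed(1)}-${max.toFixed(1)} ms each, ${total.toFixed(0)} ms total`);
});
```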

RubenVerborgh commented 4 years ago

After fixing process.nextTick (the root cause), I obtain identical performance whether I feed Node.js the regular source code or the compiled browser version.

Unfortunately, this performance does not translate directly to the browser. It seems that the browser build still contains code that executes differently than on Node. So we need to dig deeper.

Yet based on the above, I have already made some performance improvements. Please check how this branch works for you: https://github.com/solid/query-ldflex/tree/fix/browser-performance
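Some background on why `process.nextTick` can matter so much in the browser: browser bundles typically substitute a shim for it, and if that shim defers work with `setTimeout`, every deferred step waits for a timer turn (and after several levels of nesting, browsers clamp timeouts to at least 4 ms). Below is a minimal sketch of a microtask-based replacement; `nextTickShim` is a hypothetical name, and this is not the change made on the branch above.

```javascript
// Hypothetical nextTick replacement for a browser bundle. It prefers
// queueMicrotask and falls back to a resolved promise; both run the
// callback as a microtask, i.e. before any pending timer callback,
// instead of waiting for a setTimeout turn.
const nextTickShim =
  typeof queueMicrotask === 'function'
    ? (fn, ...args) => queueMicrotask(() => fn(...args))
    : (fn, ...args) => { Promise.resolve().then(() => fn(...args)); };

// Usage: the timer is scheduled first, but the shimmed callback runs first.
const order = [];
setTimeout(() => order.push('setTimeout'), 0);
nextTickShim((label) => order.push(label), 'nextTickShim');
setTimeout(() => console.log(order.join(' -> ')), 10);
// prints "nextTickShim -> setTimeout"
```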

rubensworks commented 4 years ago

In any case, we're definitely spending way too much time on SPARQL parsing (RubenVerborgh/SPARQL.js#94). It's a major contributor to the total time.

Instead of passing SPARQL query strings to Comunica, LDflex could also pass SPARQL algebra to Comunica directly, which would avoid the parsing overhead. (This is also how GraphQL-LD does it.)
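The difference between the two approaches can be sketched as follows. Both functions describe the same single-property lookup; the second produces an object that an engine could evaluate without running a SPARQL parser first. The object layout is a simplified, hypothetical one loosely modeled on SPARQL algebra (a projection over a basic graph pattern) and does not reproduce the exact types Comunica consumes; the example IRI is made up.

```javascript
const RDFS_LABEL = 'http://www.w3.org/2000/01/rdf-schema#label';

// Path 1 (current): serialize the lookup as a SPARQL string; the engine
// then has to parse it back into a structured query.
function toSparql(subject, predicate) {
  return `SELECT ?value WHERE { <${subject}> <${predicate}> ?value. }`;
}

// Path 2 (proposed): build the structure directly, so no parsing is
// needed. Hypothetical, simplified shape: a projection over a basic graph
// pattern with a single triple pattern. Not Comunica's actual types.
function toAlgebra(subject, predicate) {
  return {
    type: 'project',
    variables: [{ termType: 'Variable', value: 'value' }],
    input: {
      type: 'bgp',
      patterns: [{
        subject: { termType: 'NamedNode', value: subject },
        predicate: { termType: 'NamedNode', value: predicate },
        object: { termType: 'Variable', value: 'value' },
      }],
    },
  };
}

const subject = 'https://example.org/skills/1';  // hypothetical IRI
console.log(toSparql(subject, RDFS_LABEL));
console.log(JSON.stringify(toAlgebra(subject, RDFS_LABEL), null, 2));
```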

sylvainlb commented 4 years ago

Thanks Ruben for all this investigation. For now, we're working on a workaround on our side. But it's a temporary measure, and as soon as we're done with it, we'll have to find a more definitive solution.

RubenVerborgh commented 4 years ago

Definitely, and we know where to look already.

RubenVerborgh commented 4 years ago

@sylvainlb @matthieu-fesselier @happy-dev preload is implemented on the latest LDflex master (https://github.com/RubenVerborgh/LDflex/issues/44). It makes repeated accesses go faster. (However, I still find browser query overhead to be significant: https://github.com/comunica/comunica/issues/561)

matthieu-fesselier commented 4 years ago

I tested it and it works as expected. It does not make the first call faster, but the subsequent ones are faster.
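The behavior reported here (first access unchanged, later accesses faster) is what any cache placed in front of the query engine would produce. A minimal sketch of that general idea follows; it is not LDflex's actual preload implementation, and `createPropertyCache` and the mock fetcher are hypothetical.

```javascript
// Hypothetical sketch of a cache in front of the query engine: each
// (subject, predicate) pair is fetched once, and later accesses reuse the
// stored promise. Storing the promise rather than the value also merges
// concurrent first accesses into a single query.
function createPropertyCache(fetchProperty) {
  const cache = new Map();
  return function getProperty(subject, predicate) {
    const key = `${subject} ${predicate}`;  // IRIs cannot contain spaces
    if (!cache.has(key)) cache.set(key, fetchProperty(subject, predicate));
    return cache.get(key);
  };
}

// Usage with a mock fetcher that counts how often it really runs.
let queries = 0;
const getProperty = createPropertyCache(async (subject, predicate) => {
  queries += 1;
  return `${predicate} of ${subject}`;
});

(async () => {
  await getProperty('https://example.org/a', 'label');  // first access: runs a query
  await getProperty('https://example.org/a', 'label');  // repeat: served from the cache
  console.log(`queries run: ${queries}`);  // prints "queries run: 1"
})();
```

One design caveat: as written, a rejected lookup stays cached forever, so a real implementation would need to evict failures and invalidate entries when the underlying data changes.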

RubenVerborgh commented 4 years ago

Thanks so much for testing this. I will close this now, while we follow up in Comunica to make queries as a whole faster in browsers. We now have several leads for investigating the performance differences.

sylvainlb commented 4 years ago

Thanks Ruben for all these efforts. We're still in the process of resolving the crisis on our side, but I'll get back to you once we're done to see how we can proceed with integrating LDflex into our work.

Thanks!

matthieu-fesselier commented 4 years ago

Hello @RubenVerborgh @rubensworks !

I'm following up here as this seems to be the most appropriate issue about performance. I investigated performance a bit further with the latest version of query-ldflex (I'm not sure all the dependencies were updated as they should be; maybe you can confirm?).

Here is a real use case: an app built with Startin'blox. At load time:

I ran some tests with LDflex as a replacement for our store. It does not load all the resources mentioned above, because making all the components work would have needed more work. However, here are some results:

Here is a small test I made with our data:

  <script src="solid-auth-client.bundle.js"></script>
  <script src="solid-query-ldflex.bundle.js"></script>
  <script>
  document.addEventListener('DOMContentLoaded', async () => {
    const data = solid.data['https://api.community.hubl.world/skills/'];
    // Iterate over the container's members one by one
    for await (const s of data['ldp:contains']) {
      const id = s.value;
      console.time(id);
      // Two property lookups per member, awaited one after the other
      await s['rdfs:label'].value;
      await s['type'].value;
      console.timeEnd(id);
    }
  });
  </script>

I tested with remote data (https://api.community.hubl.world/skills/) and with the same data in a local JSON-LD file, with the same results.

Each timer shows between 15 and 20 ms. As the iterations all run sequentially, it takes around 12 s to load the whole list.

I hope it helps; don't hesitate to reach out if you need more information about our tests and use cases!
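The arithmetic behind "executed sequentially" can be made concrete with a mock. When every property access is awaited before the next one starts, per-item latencies add up; starting independent lookups together lets them overlap. The sketch below uses a hypothetical `fetchLabel` with an artificial delay. It only illustrates the arithmetic and does not claim that LDflex or Comunica would handle many concurrent queries efficiently in the browser.

```javascript
// Mock lookup with a fixed 20 ms latency; `fetchLabel` is a hypothetical
// stand-in for something like `await s['rdfs:label'].value`.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function fetchLabel(id) {
  await sleep(20);
  return `label-${id}`;
}

// Sequential: each lookup starts only after the previous one finished,
// so total time is roughly (number of items) x (latency per item).
async function loadSequential(ids) {
  const labels = [];
  for (const id of ids) labels.push(await fetchLabel(id));
  return labels;
}

// Concurrent: all lookups are started up front and awaited together,
// so with independent lookups the total approaches a single latency.
function loadConcurrent(ids) {
  return Promise.all(ids.map((id) => fetchLabel(id)));
}

(async () => {
  const ids = Array.from({ length: 10 }, (_, i) => i);
  for (const load of [loadSequential, loadConcurrent]) {
    const start = performance.now();
    await load(ids);
    console.log(`${load.name}: ${(performance.now() - start).toFixed(0)} ms`);
  }
})();
```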

RubenVerborgh commented 4 years ago

Thanks @matthieu-fesselier, this is a very interesting case, which we will analyze in detail.

Quick thoughts:

RubenVerborgh commented 4 years ago

Reminder to self: the above remarks pertain to this hack rather than the queueMicrotask-based AsyncIterator implementation.

matthieu-fesselier commented 4 years ago

A more precise example which illustrates what I said before:

<pre id="test"></pre>
<script>
  // Freeze test: update the page every 200 ms so that a blocked UI is visible
  setInterval(() => {
    document.getElementById('test').textContent = Math.random();
  }, 200);

  document.addEventListener('DOMContentLoaded', async () => {
    const skills = 'https://api.community.hubl.world/skills/';
    const context = {
      '@vocab': 'http://happy-dev.fr/owl/#',
      rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
      rdfs: 'http://www.w3.org/2000/01/rdf-schema#',
      ldp: 'http://www.w3.org/ns/ldp#',
      foaf: 'http://xmlns.com/foaf/0.1/',
      name: 'rdfs:label',
      acl: 'http://www.w3.org/ns/auth/acl#',
      permissions: 'acl:accessControl',
      mode: 'acl:mode',
      geo: "http://www.w3.org/2003/01/geo/wgs84_pos#",
      lat: "geo:lat",
      lng: "geo:long"
    };

    await solid.data.context.extend(context);
    const data = solid.data[skills];

    console.time('iteration')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) { }
    console.timeEnd('iteration')

    console.time('iteration + value')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      const id = s.value;
    }
    console.timeEnd('iteration + value')

    console.time('iteration + 1 prop')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      await s['rdfs:label'].value;
    }
    console.timeEnd('iteration + 1 prop')

    console.time('iteration + 2 props')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      await s['rdfs:label'].value;
      await s['type'].value;
    }
    console.timeEnd('iteration + 2 props')
  });
</script>

With this, I can see that:

RubenVerborgh commented 4 years ago

As for the freezing, I think we will need to tackle it by turning 1 out of every 100 queueMicrotask calls into a setTimeout call (good old DoEvents).

FYI we have this now here: https://github.com/RubenVerborgh/AsyncIterator/commit/c0d8cac36362f305ba2192db974cf18560d271ea#diff-1a12957b96162e114d61ede68b100ab3R13-R21
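The idea of turning 1 in 100 microtasks into a timer callback can be sketched as a small scheduler. Long chains of microtasks never give the browser a chance to render, whereas a `setTimeout` callback runs only after the event loop has had a turn. This is an illustration under those assumptions, not the code from the linked commit; `createYieldingScheduler` and its options are hypothetical, and the injectable `microtask`/`macrotask` hooks exist only to make the behavior observable.

```javascript
// Hypothetical scheduler in the spirit of the proposal above: tasks
// normally run as microtasks (cheap), but every `yieldEvery`-th task goes
// through setTimeout, which lets the browser render and handle input in
// between. This is an illustration, not the AsyncIterator source code.
function createYieldingScheduler({
  yieldEvery = 100,
  microtask = (task) => queueMicrotask(task),
  macrotask = (task) => setTimeout(task, 0),
} = {}) {
  let count = 0;
  return function schedule(task) {
    count += 1;
    if (count % yieldEvery === 0) macrotask(task);
    else microtask(task);
  };
}

// Usage: schedule 250 tasks and count which path each one took.
let viaMicrotask = 0;
let viaTimeout = 0;
let completed = 0;
const schedule = createYieldingScheduler({
  microtask: (task) => { viaMicrotask += 1; queueMicrotask(task); },
  macrotask: (task) => { viaTimeout += 1; setTimeout(task, 0); },
});
for (let i = 0; i < 250; i++) schedule(() => { completed += 1; });
setTimeout(() => console.log(viaMicrotask, viaTimeout, completed), 10);
// prints "248 2 250"
```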

matthieu-fesselier commented 4 years ago

Following up on the tests above with the new version of AsyncIterator, I am facing a bug, which might be related to #71. If I loop twice over the same container, the second time it never enters the loop body:

document.addEventListener('DOMContentLoaded', async () => {
  const skills = 'https://api.community.hubl.world/skills/';
  const data = solid.data[skills];
  // 1st loop
  for await (const s of data['http://www.w3.org/ns/ldp#contains']) { }
  console.log('passes here...');

  // 2nd loop
  for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
    console.log('but not here');
    const id = s.value;
  }
  console.log('and here neither');
});

Did I miss something?

RubenVerborgh commented 4 years ago

I suspect #71 indeed, we're investigating it.
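For readers hitting the same symptom: a loop body that never runs on the second pass is exactly what happens when a single async iterator is shared between loops, because the first loop drains it. The sketch below reproduces that symptom with plain async generators. `makeCachedPath` and `makeFreshPath` are hypothetical, and nothing in this thread confirms that this is what #71 describes.

```javascript
// Illustration of one plausible mechanism (not confirmed as the cause of
// #71): a path object that caches a single async iterator. The first loop
// drains it; the second loop gets `done` immediately and skips its body.
function makeCachedPath(items) {
  async function* generate() { yield* items; }
  const shared = generate();  // created once, reused by every loop
  return { [Symbol.asyncIterator]: () => shared };
}

// Variant that creates a fresh iterator for every loop.
function makeFreshPath(items) {
  async function* generate() { yield* items; }
  return { [Symbol.asyncIterator]: () => generate() };
}

// Loops over the same path twice and reports how many items each loop saw.
async function countTwice(path) {
  const counts = [];
  for (let round = 0; round < 2; round++) {
    let n = 0;
    for await (const _item of path) n += 1;
    counts.push(n);
  }
  return counts;
}

countTwice(makeCachedPath(['a', 'b', 'c'])).then((c) => console.log('cached:', c));  // counts: [3, 0]
countTwice(makeFreshPath(['a', 'b', 'c'])).then((c) => console.log('fresh:', c));    // counts: [3, 3]
```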

matthieu-fesselier commented 3 years ago

For reference, in case it helps, I ran some additional tests based on the code I showed above, and here are the results:

Test                  LDflex (1)   LDflex + Turtle file (2)   LDflex + rdflib.js
iteration             7600 ms      495 ms                     3350 ms
iteration + value     315 ms       320 ms                     45 ms
iteration + 1 prop    3100 ms      3150 ms                    4700 ms
iteration + 2 props   6800 ms      7050 ms                    9000 ms

(1) I used the master branch.
(2) I converted the skills JSON-LD document into a single local Turtle file.
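Subtracting rows isolates what each additional property lookup adds over the whole list. The sketch below is plain arithmetic on the numbers reported above. The bare "iteration" row is left out of the baseline because it is much slower than "iteration + value" in two of the three setups, which suggests (without confirming) that the first loop also pays for loading the document.

```javascript
// Timings (ms) copied from the table above.
const results = {
  'LDflex (1)': { value: 315, oneProp: 3100, twoProps: 6800 },
  'LDflex + Turtle file (2)': { value: 320, oneProp: 3150, twoProps: 7050 },
  'LDflex + rdflib.js': { value: 45, oneProp: 4700, twoProps: 9000 },
};

// Incremental cost of each property lookup over the whole list:
//   first  = (iteration + 1 prop)  - (iteration + value)
//   second = (iteration + 2 props) - (iteration + 1 prop)
function propertyCosts({ value, oneProp, twoProps }) {
  return { first: oneProp - value, second: twoProps - oneProp };
}

for (const [setup, times] of Object.entries(results)) {
  const { first, second } = propertyCosts(times);
  console.log(`${setup}: first property +${first} ms, second property +${second} ms`);
}
```

In all three setups the second property adds a cost comparable to the first (roughly 2.8 to 4.7 s per property over the list), which is consistent with a per-query overhead, rather than the iteration itself, dominating the total.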