Browser performance issue on iterative property access

matthieu-fesselier commented 4 years ago

I am facing performance issues when accessing data on a container. For now, this makes all our apps unusable with LDflex. For example, with the following code:

  await solid.data.context.extend(base_context)
  const container = solid.data["https://apiprod.happy-dev.fr/clients/"];
  console.time(container.toString())
  for await (const r of container.ldp_contains) {
    console.time(r.toString())
    console.log((await r.img).toString());
    console.timeEnd(r.toString())
  }
  console.timeEnd(container.toString())

It takes ~3000ms to display the img property of 70 resources. For each resource, it takes between 10ms and 90ms to display its img.

Is there something we can do to improve this?

RubenVerborgh commented 4 years ago

When fixing process.nextTick (as root cause), I obtain identical performance when feeding Node.js the regular source code and the compiled browser version.

Unfortunately, this performance does not translate directly to the browser. It seems that the browser build still contains code that executes differently on Node. So we need to dig deeper.

Yet based on the above, I have already made some performance improvements. Please check out how this branch is working for you: https://github.com/solid/query-ldflex/tree/fix/browser-performance

rubensworks commented 4 years ago

In any case, we're definitely spending way too much time on SPARQL parsing (RubenVerborgh/SPARQL.js#94). It's a major contributor to time.

Instead of passing SPARQL queries to Comunica, LDflex could also pass SPARQL algebra to Comunica directly. This would avoid the parsing overhead. (This is also how GraphQL-LD does it.)

sylvainlb commented 4 years ago

Thanks Ruben for all this study. We're working on a workaround for now on our side. But it's a temporary measure, and as soon as we're done with it, we're gonna have to find a more definitive solution.

RubenVerborgh commented 4 years ago

Definitely, and we know where to look already.

RubenVerborgh commented 4 years ago

@sylvainlb @matthieu-fesselier @happy-dev preload is implemented on the latest LDflex master (https://github.com/RubenVerborgh/LDflex/issues/44). It makes repeated accesses go faster. (However, I still find browser query overhead to be significant: https://github.com/comunica/comunica/issues/561)

matthieu-fesselier commented 4 years ago

I tested it and it works as expected. It does not make the first call faster, but the subsequent ones are.
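
The behaviour reported here (first access unchanged, repeated accesses faster) is what any cache-on-first-use scheme produces. The sketch below is a generic illustration only, not LDflex's preload implementation: `slowLookup`, `cachedLookup`, and the 20ms delay are hypothetical stand-ins for a real query.

```javascript
// Hypothetical slow lookup standing in for a query; counts how often it runs.
let lookups = 0;
function slowLookup(resource, property) {
  lookups++;
  return new Promise(resolve =>
    setTimeout(() => resolve(`${property} of ${resource}`), 20));
}

// Cache the promise per (resource, property) pair so repeated accesses reuse it.
// The first access still pays the full cost; only later ones are served from the cache.
const cache = new Map();
function cachedLookup(resource, property) {
  const key = `${resource} ${property}`;
  if (!cache.has(key))
    cache.set(key, slowLookup(resource, property));
  return cache.get(key);
}
```

Caching the promise rather than the resolved value means that two accesses started at the same time also share a single lookup.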

RubenVerborgh commented 4 years ago

Thanks so much for testing this. Will close this now, while we follow up in Comunica to make queries as a whole faster in browsers. We now have several pointers to look at the performance differences.

sylvainlb commented 4 years ago

Thanks Ruben for all these efforts. We're still in the process of closing the crisis on our side, but I'll get back to you once we're done to see how we can proceed to integrate LDFlex in our work.

Thanks!

matthieu-fesselier commented 4 years ago

Hello @RubenVerborgh @rubensworks !

I am following up here, as this seems to be the most appropriate issue for performance. I investigated performance a bit more with the latest version of query-ldflex (I am not sure all the dependencies were updated as they should be; maybe you can confirm?).

Here is a real use case of an app made with Startin'blox. On load time:

We load ~150 different resources used in ~400 components (most of the time, 1 component is responsible for 1 node only)
We access ~1.2k properties on these resources (~700 different properties)

I ran some tests with LDflex as a replacement for our store. They do not load all the resources mentioned above, since more work would be needed to make all the components work. However, here are some results:

On load time, we first make 15 calls to solid.data[something] → the page freezes for 5 to 10 seconds (@RubenVerborgh you told us about this on our last call; I confirm it also happens for us).
Then, we make some iterations on containers → it seems pretty fast
Then, we make ~40 calls to solid.data[something] → freezes the page for 2-3 seconds
Then we access some properties on the resources → it seems a bit slow (~8 to 10ms per property). On a list of 600 resources, when we need to show 2 properties of each resource, we have 600 * 10ms * 2 = 12s. I think that these accesses in loops make the whole app quite slow.

Here is a small test I made with our data:

  <script src="solid-auth-client.bundle.js"></script>
  <script src="solid-query-ldflex.bundle.js"></script>
  <script>
  document.addEventListener('DOMContentLoaded', async () => {
    const data = solid.data['https://api.community.hubl.world/skills/'];
    for await (const s of data['ldp:contains']) {
      const id = s.value;
      console.time(id)
      await s['rdfs:label'].value;
      await s['type'].value;
      console.timeEnd(id)
    }
  });
  </script>

I tested with remote data (https://api.community.hubl.world/skills/) and with the same data in a local JSON-LD file, with the same results.

Each timer reports between 15 and 20ms. Since the accesses all execute sequentially, it takes around 12s to load the whole list.

I hope it helps. Don't hesitate to reach out if you need more information about our tests or use cases!

RubenVerborgh commented 4 years ago

Thanks @matthieu-fesselier, this is a very interesting case, which we will analyze in detail.

Quick thoughts:

the sequentiality will likely have to be tackled with parallelization
the freezing, I think we will need to tackle by making every 1 out of 100 queueMicrotask calls a setTimeout call instead (good old DoEvents)
In general, the await pattern seems to encourage actively waiting for things, whereas we probably want to dispatch the (unevaluated) LDflex expressions to other functions/components as much as possible, so they all await in their own time.
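
To make the first point concrete, here is a minimal sketch of sequential versus parallel property access. It does not use LDflex: `fakePropertyAccess` is a hypothetical stand-in that simply resolves after a fixed delay, so it only models time spent waiting, not the CPU-bound freezing discussed above.

```javascript
// Hypothetical stand-in for a property access: resolves after `ms` milliseconds.
function fakePropertyAccess(resource, property, ms = 10) {
  return new Promise(resolve =>
    setTimeout(() => resolve(`${resource}/${property}`), ms));
}

// Sequential: each access waits for the previous one to finish,
// so total time grows with resources * properties * latency.
async function readSequentially(resources, properties) {
  const rows = [];
  for (const resource of resources) {
    const row = [];
    for (const property of properties)
      row.push(await fakePropertyAccess(resource, property));
    rows.push(row);
  }
  return rows;
}

// Parallel: start every access first, then await them all together,
// so the waits overlap instead of adding up.
function readInParallel(resources, properties) {
  return Promise.all(resources.map(resource =>
    Promise.all(properties.map(property =>
      fakePropertyAccess(resource, property)))));
}
```

Both functions return the same rows in the same order. With 20 resources and 2 properties, the sequential version waits for 40 delays in a row, while the parallel version waits for roughly one. Whether a real workload gains as much depends on how much of its time is spent waiting rather than computing.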

RubenVerborgh commented 4 years ago

Reminder to self: the above remarks pertain to this hack rather than the queueMicrotask-based AsyncIterator implementation.

matthieu-fesselier commented 4 years ago

A more precise example which illustrates what I said before:

<pre id="test"></pre>
<script>
  // for freeze test
  setInterval(() => {
    document.getElementById('test').textContent = Math.random();
  }, 200);

  document.addEventListener('DOMContentLoaded', async () => {
    const skills = 'https://api.community.hubl.world/skills/';
    const context = {
      '@vocab': 'http://happy-dev.fr/owl/#',
      rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
      rdfs: 'http://www.w3.org/2000/01/rdf-schema#',
      ldp: 'http://www.w3.org/ns/ldp#',
      foaf: 'http://xmlns.com/foaf/0.1/',
      name: 'rdfs:label',
      acl: 'http://www.w3.org/ns/auth/acl#',
      permissions: 'acl:accessControl',
      mode: 'acl:mode',
      geo: "http://www.w3.org/2003/01/geo/wgs84_pos#",
      lat: "geo:lat",
      lng: "geo:long"
    };

    await solid.data.context.extend(context);
    const data = solid.data[skills];

    console.time('iteration')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) { }
    console.timeEnd('iteration')

    console.time('iteration + value')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      const id = s.value;
    }
    console.timeEnd('iteration + value')

    console.time('iteration + 1 prop')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      await s['rdfs:label'].value;
    }
    console.timeEnd('iteration + 1 prop')

    console.time('iteration + 2 props')
    for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
      await s['rdfs:label'].value;
      await s['type'].value;
    }
    console.timeEnd('iteration + 2 props')
  });
</script>

With this, I can see that:

iteration ~5400ms -> first iteration on the container, no property access
iteration + value ~260ms -> second time, even if s.value is accessed, it's fast
iteration + 1 prop ~5400ms -> when a property is accessed, it's slow again
iteration + 2 props ~10800ms -> with a second property access, it is 2x slower
during the whole process (~20s), the setInterval callback, which should show a new random number on the page every 200ms, freezes. I'm not sure whether this is comparable to what happens in real life, but it seems very similar

RubenVerborgh commented 4 years ago

> the freezing, I think we will need to tackle by making every 1 out of 100 queueMicrotask calls a setTimeout call instead (good old DoEvents)

FYI we have this now here: https://github.com/RubenVerborgh/AsyncIterator/commit/c0d8cac36362f305ba2192db974cf18560d271ea#diff-1a12957b96162e114d61ede68b100ab3R13-R21
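
For readers who do not want to follow the link, the idea can be sketched as follows. This is an independent illustration of the "every 1 out of 100" approach, not the code from the AsyncIterator commit; the name `createTaskScheduler` and the `interval` parameter are assumptions made for this example.

```javascript
// Returns a function that schedules tasks asynchronously, mostly as microtasks.
// Every `interval`-th call goes through setTimeout instead, which gives the
// event loop a chance to run timers (and, in a browser, rendering).
function createTaskScheduler(interval = 100) {
  let count = 0;
  let queue = null; // tasks held back while a macrotask is pending

  return task => {
    if (queue !== null) {
      // A macrotask is pending: hold the task so execution order is preserved
      queue.push(task);
    } else if (++count < interval) {
      queueMicrotask(task);
    } else {
      // Time for an interruption: defer this task behind a macrotask
      count = 0;
      queue = [task];
      setTimeout(() => {
        const pending = queue;
        queue = null;
        for (const held of pending)
          queueMicrotask(held);
      }, 0);
    }
  };
}
```

A long chain of tasks in which each task schedules the next would otherwise stay inside the microtask queue and starve timers such as the setInterval in the freeze test above. With the periodic setTimeout, such a chain is interrupted once per `interval` tasks.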

matthieu-fesselier commented 4 years ago

Following up on the tests I made just above, now with the new version of AsyncIterator, I am facing a bug that might be related to #71. It seems that if I loop twice over the same container, the second loop never executes its body:

document.addEventListener('DOMContentLoaded', async () => {
  const skills = 'https://api.community.hubl.world/skills/';
  const data = solid.data[skills];
  // 1st loop: completes normally
  for await (const s of data['http://www.w3.org/ns/ldp#contains']) { }
  console.log('passes here...');

  // 2nd loop over the same container: the body is never entered
  for await (const s of data['http://www.w3.org/ns/ldp#contains']) {
    console.log('but not here');
    const id = s.value;
  }
  // This line is never reached either
  console.log('and here neither');
});

Did I miss something?

RubenVerborgh commented 4 years ago

I suspect #71 indeed, we're investigating it.

matthieu-fesselier commented 3 years ago

For reference, in case it helps, I ran some additional tests based on the code I showed here, and here are the results:

Test                 LDflex (1)  LDflex + Turtle file (2)  LDflex + rdflib.js
iteration            7600ms      495ms                     3350ms
iteration + value    315ms       320ms                     45ms
iteration + 1 prop   3100ms      3150ms                    4700ms
iteration + 2 props  6800ms      7050ms                    9000ms

(1) I used the master branch.
(2) I converted the skills JSON-LD document into a single local Turtle file.

LDflex / Query-Solid

Browser performance issue on iterative property access #45