KhronosGroup / SPIRV-Tools

Apache License 2.0

Use O(log n) table lookup algorithms for opcodes #502

Open dneto0 opened 7 years ago

dneto0 commented 7 years ago

Currently the opcode table is just an array, and we do opcode and opcode-string lookups with straight linear search.

See spvOpcodeTableNameLookup and spvOpcodeTableValueLookup in source/opcode.cpp

The underlying tables are generated via the "spvtools_core_tables" invocations in source/CMakeLists.txt. There's a python script that parses the JSON and then generates .inc files in the build output directory. Those are then #included in source/opcode.cpp into static tables.

We can do much better by generating the underlying tables in sorted order. Then use std::lower_bound to find the first matching entry.

But then we need two tables: one sorted by opcode, and the other sorted by opcode string name. I'd like the sorted-by-opcode table to be the primary table, and the sorted-by-string-name table to just have an index into the other one. The reason is that the fast path should be processing a binary module as in a GPU driver, as opposed to doing name lookups for disassembly or assembly as occurs during development.

A wrinkle: we have to allow for the same opcode to be used with two different string names. This is almost certain to happen over time, as Khronos extensions to SPIR-V get incorporated into the core of a future SPIR-V version. If an extension describes a new instruction, e.g. OpFooKHR, then it drops the KHR suffix to become OpFoo when it goes into core.

So the work is to:

ratchetfreak commented 7 years ago

Extensions should use the OpExtInst to introduce new opcodes. Once they get into core then they will get promoted to proper opcodes.

dneto0 commented 7 years ago

@ratchetfreak That's a design choice.

There are already KHR extensions that introduce instructions as new (direct) opcodes. E.g. see OpSubgroupBallotKHR in https://www.khronos.org/registry/spir-v/extensions/KHR/SPV_KHR_shader_ballot.html

The reasoning is that the working group feels there is a strong chance that the functionality will be pulled into core at a future date with unchanged semantics. And if that's the case, then tooling that just deals with the numbers of opcodes won't have to change.

dneto0 commented 6 years ago

cc: @antiagainst

jcaraban commented 6 years ago

@dneto0 I can attempt this issue if @antiagainst didn't get started already

antiagainst commented 6 years ago

Yep, please feel free to take it. I haven't touched this one yet.

jcaraban commented 6 years ago

Given that the opcodes are unique, why aren't we using a hash table, to begin with? We could employ a regular array where the opcode is the index and get O(1).

A problem is the gaps between OpDecorateId=332 and OpFragmentFetchAMD=5012, which would bloat the executable. But we could use some hybrid approach. From 0 to ~400, regular array, from ~400 to ~5000, linear or sorted search.

Of course, this only works for opcode lookups. String lookups need a hash table or a trie. However, I'm not sure how these two structures could be generated at compile time... Generating them at runtime, every time the executable runs, is definitely not an option.

dneto0 commented 6 years ago

We could have two different opcode entries for the same opcode number. This for the case where we publish a KHR extension that is then later pulled into the core spec. If the semantics remain the same then we can reuse the opcode number, but still support two different spellings for the instruction.

I forgot some constraints:

The segmented array approach would work .... until it doesn't. The addition of opcodes (and other enums) is not under our control. SPIRV-Tools has to respond gracefully to the data presented by the SPIRV-Headers updates.

I think the binary search over sorted arrays is a good simple solution. The .inc files are generated by python scripts during build time. We have complete freedom to rework those scripts.

dneto0 commented 6 years ago

About opcode entries with the same opcode number:

jcaraban commented 6 years ago

The big cost in the opcode string lookup is the strlen() in the if condition (third line of the snippet below). Can we precompute the length of the opcode names and store it in the opcode table?

  const size_t nameLength = strlen(name);
  for (uint64_t opcodeIndex = 0; opcodeIndex < table->count; ++opcodeIndex) {
    if (nameLength == strlen(table->entries[opcodeIndex].name) &&
        !strncmp(name, table->entries[opcodeIndex].name, nameLength)) {
      // NOTE: Found the opcode!
      *pEntry = &table->entries[opcodeIndex];
      return SPV_SUCCESS;
    }
  }

I precomputed the lengths, then tested spirv-as in release mode (-O3), and times went down from 22ms to 16ms for a large ~250kb input. Now spvOpcodeTableNameLookup no longer shows up in the profiler.

Rather than generating and dealing with a second table sorted by opcode name, I think it would pay off to add a nameLength entry to the existing table.

ratchetfreak commented 6 years ago

Actually, nameLength == strlen(table->entries[opcodeIndex].name) doesn't need to be done explicitly once you have the bound for one string and know the other is null-terminated: !strncmp(name, table->entries[opcodeIndex].name, nameLength) will stop when the first byte differs. That saves an iteration over the opcode name.

jcaraban commented 6 years ago

If you simply remove the nameLength == strlen(table->entries[opcodeIndex].name) check, tests will start to fail. Without the strlen, the call to strncmp will match overlapping names, e.g. Constant would match ConstantTrue.

On the other hand, using strcmp (without an upper bound) works fine, but then the compiler issues a call to __strcmp_sse2_unaligned (on my Intel machine), and the performance is back to 22ms, vs 16ms for the precomputed-length version.

ratchetfreak commented 6 years ago

You can fix that by adding 1 to the length: !strncmp(name, table->entries[opcodeIndex].name, nameLength+1). That way the terminating null of name is included in the comparison.

jcaraban commented 6 years ago

@ratchetfreak true that! Unfortunately, when I tried it, the compiler issues __strncmp_sse42 and times got worse: from 22ms originally (with strlen) to 30ms on my Intel Skylake i7-6700 with gcc 5.2.1. The solution that precomputes the opcode name length is still the fastest.

ratchetfreak commented 6 years ago

If we are precomputing the length, it makes sense to also precompute a hash (it doesn't need to be a perfect hash, just good enough).

That will cut the number of strncmp calls far more than comparing lengths alone.

  const size_t nameLength = strlen(name);
  const size_t hash = opCodeHash(name);
  for (uint64_t opcodeIndex = 0; opcodeIndex < table->count; ++opcodeIndex) {
    if (hash == table->entries[opcodeIndex].hash &&
        !strncmp(name, table->entries[opcodeIndex].name, nameLength+1)) {
      // NOTE: Found the opcode!
      *pEntry = &table->entries[opcodeIndex];
      return SPV_SUCCESS;
    }
  }

dj2 commented 6 years ago

The original PR was merged which handled the id lookup. The missing piece was doing the name table, is that change still worth it or can this be closed?

dneto0 commented 6 years ago

> The original PR was merged which handled the id lookup. The missing piece was doing the name table, is that change still worth it or can this be closed?

Good question. The critical path is the one used for SPIR-V binaries, which is the value lookup. The string lookups are used in more developer-oriented settings (assembly and disassembly), so they're lower priority.

I'd like to keep this open but at reduced priority. The second table, sorted by opcode name, should use less space where possible, e.g. it's a table of {string, index} pairs where the index points to the corresponding entry in the full table sorted by value.