chipsalliance / firrtl

Flexible Intermediate Representation for RTL
https://www.chisel-lang.org/firrtl/
Apache License 2.0

Improve Namespace/Uniquification Performance for Vectors #1838

Open seldridge opened 4 years ago

seldridge commented 4 years ago

Currently, the performance of uniquification and namespace insertion/queries for vectors is pretty bad. The vectors are flattened into the namespace.

Consider:

foo: UInt<1>[4]

For a namespace, this is a scala.collection.mutable.HashSet[String]. For uniquification, this is a flattened aggregate type.

This is all sound, but can result in long construction times when the vectors are large and/or deeply nested, e.g., for something like:

foo: UInt<1>[4][4][4][4]

This gets flattened in the namespace into:

foo_0_0_0_0
foo_0_0_0_1
foo_0_0_0_2
...
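
For scale, the current scheme amounts to something like the following sketch (hypothetical helper, not the actual firrtl code); a single 4x4x4x4 declaration turns into 4^4 = 256 individual HashSet insertions:

```scala
import scala.collection.mutable

// Sketch of the current behavior (hypothetical code): every element of a
// nested vector gets its own flattened name, and each name is inserted into
// the namespace's HashSet one at a time.
def flattenedNames(prefix: String, dims: Seq[Int]): Seq[String] = dims match {
  case Seq()     => Seq(prefix)
  case d +: rest => (0 until d).flatMap(i => flattenedNames(s"${prefix}_$i", rest))
}

val namespace = mutable.HashSet.empty[String]
namespace ++= flattenedNames("foo", Seq(4, 4, 4, 4)) // 256 entries for one declaration
```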

If you then have something like foo_5 and want to know if it's in the namespace, you shouldn't be comparing against a set of flattened strings, but asking whether 5 falls within one of the known intervals under the prefix foo_, here ([0, 3]). If the numeric portion of the namespace were set up to use an interval tree, this would be O(n log n) construction, O(n) memory consumption, and O(log n) query time (because the intervals in the tree are non-overlapping), where n is the number of intervals. Currently, construction/insertion time is O(m), where m is the sum of the interval sizes. This effectively caps the size of vectors that the FIRRTL compiler can handle.

While query time technically goes up with an interval tree (from O(1) to O(log n)), the number of intervals isn't huge, and intervals could be dynamically merged to keep n small. E.g., _T_1 and _T_2 should be stored as ([1, 2]) and not ([1], [2]). Sparseness of temporaries in optimized FIRRTL IR may become an issue for this merging.

At a high level, it seems like this could all be cleaned up by using a namespace with the following properties:

  1. The namespace is a hierarchy with different levels split by delimiter (one or more _)
  2. The namespace uses a different storage mechanism for strings and numbers (strings can be a set or a trie, numbers can be an interval tree); a rough sketch follows below
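
As a rough sketch of what that could look like (hypothetical names and API, not the existing firrtl Namespace; assuming a single _ delimiter and eager merging of adjacent intervals):

```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable

// Sketch only: intervals keyed by their start, kept non-overlapping, with
// adjacent intervals merged so _T_1 and _T_2 become the single interval [1, 2].
final case class IntervalSet(intervals: TreeMap[Int, Int] = TreeMap.empty[Int, Int]) {
  // Is `x` covered by some stored interval [start, end]? One ordered lookup,
  // O(log n) in the number of intervals.
  def contains(x: Int): Boolean =
    intervals.to(x).lastOption.exists { case (_, end) => x <= end }

  // Insert [lo, hi], merging with any overlapping or adjacent intervals.
  def insert(lo: Int, hi: Int): IntervalSet = {
    val touching = intervals.filter { case (s, e) => s <= hi + 1 && e >= lo - 1 }
    val start = (lo +: touching.keys.toSeq).min
    val end = (hi +: touching.values.toSeq).max
    IntervalSet((intervals -- touching.keys) + (start -> end))
  }
}

// Names with a numeric suffix are routed to an IntervalSet under their prefix;
// everything else falls back to an ordinary string set.
class SketchNamespace {
  private val strings = mutable.HashSet.empty[String]
  private val numeric = mutable.HashMap.empty[String, IntervalSet]
  private val Suffixed = raw"(.*_)(\d+)".r

  // A whole vector under `prefix` is a single interval insertion,
  // not one insertion per element.
  def addRange(prefix: String, lo: Int, hi: Int): Unit =
    numeric(prefix) = numeric.getOrElse(prefix, IntervalSet()).insert(lo, hi)

  def add(name: String): Unit = name match {
    case Suffixed(prefix, digits) => addRange(prefix, digits.toInt, digits.toInt)
    case _                        => strings += name
  }

  def contains(name: String): Boolean = name match {
    case Suffixed(prefix, digits) if numeric.get(prefix).exists(_.contains(digits.toInt)) => true
    case _ => strings.contains(name)
  }
}

// foo: UInt<1>[4] contributes one interval, [0, 3], under the prefix "foo_".
val ns = new SketchNamespace
ns.addRange("foo_", 0, 3)
ns.contains("foo_2") // true
ns.contains("foo_5") // false: 5 is outside every interval stored under "foo_"
```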

Note: this does not solve the issue of the number of signals after LowerTypes. E.g., the 4-dimensional vector foo above will eventually get lowered to 4^4 = 256 wires. However, a more efficient namespace would enable earlier transforms to quickly reason about allowable names (which is not possible now). It would also let FIRRTL handle large vectors in higher forms and convert them to memories in lower forms.

Checklist

Feature Description

See above.

Type of Feature

Related Features

The existing namespace implementation is the source of the current slow behavior.

External Information

This issue was seeded by thoughts here: https://github.com/freechipsproject/firrtl/pull/1549#issuecomment-672025842.

ekiwi-sifive commented 4 years ago

However, a more efficient namespace would enable earlier transforms to quickly reason about allowable names (which is not possible now).

I don't know if I understand what you are trying to say here. The only point where you should have to worry about uniquifying names is LowerTypes. In any "earlier transform" it should be enough not to collide with the top-level namespace; there is no need to worry about bundle or vector types, since their fields/vector entries are still in their own namespaces. Only once we lower types do we "import" the field and vector entries into the top-level namespace.

seldridge commented 4 years ago

Yeah, fair point. I guess the only place this matters is uniquify (and your work on unifying uniquification and lower types removes the need for repeated uniquification; it's also questionable why so many transforms depended on uniquify other than to preserve the old ordering). I was confusing the namespace that uniquify constructs with the namespace that you get for a high/mid form circuit. These aren't the same thing, as you point out.

Constructing the namespace after lower types is still going to be slow, regardless of the data structure, as it's the same number of insertions. Memory consumption would go down with native range support, but that may not be enough motivation.

ekiwi-sifive commented 4 years ago

Constructing the namespace after lower types is still going to be slow, regardless of the data structure, as it's the same number of insertions. Memory consumption would go down with native range support, but that may not be enough motivation.

Ideally we would be able to emit vectors natively to Verilog. Currently, LowerTypes essentially converts vectors into bundles right before flattening them. Instead, it could convert all "arrays of structs" into "structs of arrays" and leave the vectors in place while flattening only the bundles. I would also add a conversion from N-dimensional vectors to 1-dimensional vectors. Then it should (hopefully) be relatively straightforward to generate Verilog code that uses 1-D arrays of bitvectors. If there are any arrays on the ports of a module, though, we would have to convert them into a bitvector around the module boundaries.
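
For the N-dimensional to 1-dimensional part, the index arithmetic is just row-major flattening; a hypothetical helper (not part of LowerTypes, and assuming indices are listed outermost dimension first):

```scala
// Hypothetical sketch of lowering an N-dimensional index to a 1-D index via
// row-major flattening; for a 4x4x4x4 vector, (i, j, k, l) maps to
// ((i*4 + j)*4 + k)*4 + l.
def flattenIndex(dims: Seq[Int], indices: Seq[Int]): Int = {
  require(dims.length == indices.length)
  dims.zip(indices).foldLeft(0) { case (acc, (dim, idx)) => acc * dim + idx }
}

flattenIndex(Seq(4, 4, 4, 4), Seq(1, 2, 3, 0)) // ((1*4 + 2)*4 + 3)*4 + 0 = 108
```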

seldridge commented 4 years ago

Emission of vectors in Verilog (and structs in SystemVerilog) is definitely needed, though this is kind of supported already if you have a memory that isn't replaced by ReplSeqMem. One approach compatible with the existing low form spec would be to go from 1-D vectors to memories.

Arrays of structs to structs of arrays would also help enable collapsing one-bit vectors.

ekiwi-sifive commented 4 years ago

One approach compatible with the existing low form spec would be to go from 1-D vectors to memories.

I am skeptical of that because it would essentially amount to memory inference, the brittleness of which (I assume) informed the decision to have a built-in memory construct.