YosysHQ / nextpnr

nextpnr portable FPGA place and route tool
ISC License
1.27k stars 240 forks source link

Smarter initial placement for ice40 #116

Open smunaut opened 5 years ago

smunaut commented 5 years ago

I have a design that fails initial placement. This is mostly because of incompatible CE lines for the flipflops and it is for a LP384 which doesn't have a lot of resources and I'm trying to pack 330 LCs in there.

icecube2 does produce a valid and working bitstream, so that design can definitely fit. But looking at the output, it seems that the synthesizer of icecube 2 also produces a slightly smaller result, so this might help as well.

Anyway, what I've been toying with is to add a int Arch::scoreBelForCell(CellInfo *cell, BelId bel) const method that returns a "score" of how good a BEL is for a CELL. and then I use that score in the initial placement instead of random.

Currently my attempt looks like this :

int Arch::scoreBelForCell(CellInfo *cell, BelId bel) const
{       
    /* Only process LC */
    if (cell->type != id_ICESTORM_LC)
        return 0;

    NPNR_ASSERT(getBelType(bel) == id_ICESTORM_LC);

    /* If the cell doesn't have any FF, no preferences */
    if (!cell->lcInfo.dffEnable)
        return 8;

    /* If the cell has a FF, then count how many bels would be used in that slice */
    size_t num_cells = 0; 

    Loc bel_loc = getBelLocation(bel);
    for (auto bel_other : getBelsByTile(bel_loc.x, bel_loc.y)) {
        CellInfo *ci_other = getBoundBelCell(bel_other);
        if (ci_other != nullptr && bel_other != bel)
            num_cells++;
    }

    return 8 - num_cells;
}

Basically if the cell has flipflops, it will try to pack it in the same slice as other cells that have the same constraints. If the cell doesn't have a FF, then doesn't matter, it can get ripped up and replaced somewhere else if needed.

This does help. I went from finding placement for only ~ 230 LCs to about 320 LCs. Still not enough for my design though.

eddiehung commented 5 years ago

@smunaut Can you share a testcase please?

smunaut commented 5 years ago

This is the code I'm working with : https://github.com/smunaut/nextpnr/tree/ice40_initial_placement

And here's the json for a design that's exhibiting issues :

phase_addon.json.gz phase_addon.pcf.gz

nextpnr-ice40  --lp384 --package qn32 --json phase_addon.json --pcf phase_addon.pcf --asc phase_addon.asc
eddiehung commented 5 years ago

Thanks! I just had a quick look, and it looks like initial placement requires legality of cen/sr/clk/clk-polarity/num-inputs. In a super congested design like yours, the current algorithm fails to find a valid location because it cannot find a compatible (or empty) logic block to be placed into. You can see this in the GUI: image Let us rethink what (and take a peek at what arachne does) to do in this case...

smunaut commented 5 years ago

Yes, I know that's why it has troubles. And that's also why I was trying to "guide" it to be a bit smarter instead of just placing things randomly.

The current rip-up code during initial_place will only rip up cells that are used but valid. But in this case, to make space, it would need to rip-up other cells to free up entire logic blocks.

So what I tried to do is to make it do is to use as few logic blocks as possible during initial placement by preferring to place compatible DFFs together in the same logic block rather than use up a new one.

Something else that helps is to make the initial placement in two pass : first place all the cells that use the DFF (and pack them as described above). Then place all the cells that only use the LUTs. But I couldn't really find a good way to do that without putting ice40 specific code in the generic part of the code.

(note that arachne fails as well ... I think that just by luck it works on this particular example, but that's just really luck)

eddiehung commented 5 years ago

Indeedly! My understanding is that inside the call to place_initial() here:

https://github.com/YosysHQ/nextpnr/blob/9472b6d78f68544d430feeae6d75dbd2dc43019d/common/placer1.cc#L326

it first checks legality for isValidBelForCell(), before considering if it is occupied and then scoring it for rip-up. In your testcase, it cannot find any legal bels, let alone rip anything up.

My guess is that changing the scoreBelForCell function guides the placer to choose a different placement pattern that leads to it getting further, but eventually fails too.

This could be quite a spicy problem. There exists many different cen/sr/clk/negedge "control sets", and not all members of this set can be placed onto the same tile.

smunaut commented 5 years ago

Well, it can find some valid ones actually (those with the same control set), but ripping it up doesn't help because when comes time to re-place the ripped-up one, well ... there is no space for it either ...

Which is exactly what the code in my tree improves. Packing as many of the cells with the same control sets in as little tiles as possible, to leave as many tiles free as possible for other control sets. Just an heuristic of course.

eddiehung commented 5 years ago

Sorry! You're absolutely right, it runs out of rip up iterations [1]. I misread iters as being the number of iterations it has taken, and not a countdown to giving up.

One possibility is that it has exhausted all tiles. For example, let's say control set "A" occupies one tile, and control set "B" occupies all other tiles. The fact you have 9 cells to place onto 8 bels inside that one "A" tile means however many times you rip-up, you'll never find a solution.

The issue then is, how do we not make "B" bel-spread itself across all the other tiles (when we know it doesn't need to) to begin with?

[1]:

Catchpoint 1 (exception thrown), 0x00007ffff5231c3d in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff5231c3d in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x000055555574dcc8 in nextpnr_ice40::logv_error (format=<optimized out>, ap=ap@entry=0x7fffffffd370)
    at /home/eddie/nextpnr/master/common/log.cc:167
#2  0x000055555574dd79 in nextpnr_ice40::log_error (
    format=format@entry=0x55555596be10 "failed to place cell '%s' of type '%s'\n")
    at /home/eddie/nextpnr/master/common/log.cc:218
#3  0x000055555575e0ea in nextpnr_ice40::SAPlacer::place_initial (cell=0x5555612d6850, this=0x7fffffffd6d0)
    at /home/eddie/nextpnr/master/common/placer1.cc:355
#4  nextpnr_ice40::SAPlacer::place (this=this@entry=0x7fffffffd6d0) at /home/eddie/nextpnr/master/common/placer1.cc:152
#5  0x0000555555758867 in nextpnr_ice40::placer1 (ctx=ctx@entry=0x555560d372a0, cfg=...)
    at /home/eddie/nextpnr/master/common/placer1.cc:522
#6  0x00005555557ac795 in nextpnr_ice40::Arch::place (this=0x555560d372a0)
    at /home/eddie/nextpnr/master/ice40/arch.cc:610
#7  0x000055555573f03d in nextpnr_ice40::CommandHandler::executeMain (this=this@entry=0x7fffffffdd10, 
    ctx=std::unique_ptr<nextpnr_ice40::Context> = {...}) at /home/eddie/nextpnr/master/common/command.cc:230
#8  0x000055555573f4ce in nextpnr_ice40::CommandHandler::exec (this=0x7fffffffdd10)
    at /home/eddie/nextpnr/master/common/command.cc:281
#9  0x000055555573a814 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/eddie/nextpnr/master/ice40/main.cc:161
(gdb) f 3
#3  0x000055555575e0ea in nextpnr_ice40::SAPlacer::place_initial (cell=0x5555612d6850, this=0x7fffffffd6d0)
    at /home/eddie/nextpnr/master/common/placer1.cc:355
355                     log_error("failed to place cell '%s' of type '%s'\n", cell->name.c_str(ctx), cell->type.c_str(ctx));
(gdb) print iters
$1 = 0
smunaut commented 5 years ago

A couple of things :

eddiehung commented 5 years ago

Sorry @smunaut , I should have studied your solution a little more before making redundant comments!

On revisiting this today (on 3435691), I see things have changed so that this gets further, at least for me:

Info: Placed 15 cells based on constraints.
Info: Creating initial placement for remaining 330 cells.
Info:   initial placement placed 330/330 cells
Info: Running simulated annealing placer.
Info:   at iteration #1: temp = 10000.000000, cost = 2345, est tns = -29.54ns
Info:   at iteration #5: temp = 2401.000000, cost = 2336, est tns = -24.39ns
Info:   at iteration #10: temp = 403.536072, cost = 2323, est tns = -25.65ns
Info:   at iteration #15: temp = 67.822304, cost = 2284, est tns = -26.90ns
Info:   at iteration #20: temp = 23.263050, cost = 2129, est tns = -24.50ns
Info:   at iteration #25: temp = 11.398895, cost = 2116, est tns = -26.49ns
Info:   at iteration #30: temp = 10.259006, cost = 2036, est tns = -23.33ns
Info:   at iteration #35: temp = 7.478815, cost = 1929, est tns = -25.48ns
Info:   at iteration #40: temp = 6.394387, cost = 1770, est tns = -22.48ns
Info:   at iteration #45: temp = 5.770934, cost = 1732, est tns = -20.95ns
Info:   at iteration #50: temp = 4.947855, cost = 1636, est tns = -22.05ns
Info:   at iteration #55: temp = 4.700462, cost = 1481, est tns = -20.54ns
Info:   at iteration #60: temp = 4.465439, cost = 1411, est tns = -18.44ns
Info:   at iteration #65: temp = 4.465439, cost = 1382, est tns = -18.37ns
Info:   at iteration #70: temp = 4.242167, cost = 1305, est tns = -18.13ns
Info:   at iteration #75: temp = 3.828556, cost = 1392, est tns = -19.89ns
Info:   at iteration #80: temp = 3.282508, cost = 1268, est tns = -17.55ns
Info:   at iteration #85: temp = 2.962464, cost = 1230, est tns = -15.98ns
Info:   at iteration #90: temp = 2.814340, cost = 1162, est tns = -13.88ns
Info:   at iteration #95: temp = 2.539942, cost = 1167, est tns = -14.80ns
Info:   at iteration #100: temp = 2.068799, cost = 1117, est tns = -14.88ns
Info:   at iteration #105: temp = 1.655039, cost = 1068, est tns = -13.80ns
Info:   at iteration #110: temp = 1.257830, cost = 1043, est tns = -12.80ns
Info: Legalising relative constraints...
Info:     moved 40 cells, 32 unplaced (after legalising chains)
Info:        average distance 1.450000
Info:        maximum distance 3.000000
ERROR: failed to place cell '$abc$4116$auto$blifparse.cc:492:parse_blif$4211_LC' of type 'ICESTORM_LC' (ripup iteration limit exceeded)
ERROR: Placing design failed.

but appears to be failing after ripping up cells, legalising carry chains, and finding it can't re-place those that it ripped up...

Maybe this deserves a new issue?

smunaut commented 5 years ago

In the current version of the code, it's highly dependent on the random generator basically ... the scoring I suggested just makes it much more reliable in finding an initial valid placement.

However as you see in your attempt, just because it finds an initial placement doesn't mean it will actually manage to place and route it.

smunaut commented 4 months ago

Still an issue FWIW ... I've been revisiting that old project and I can't get it to build at all anymore with any recent toolchain ... (HeAP or SA, both fails).