JulianKemmerer / PipelineC

A C-like hardware description language (HDL) adding high level synthesis(HLS)-like automatic pipelining as a language construct/compiler feature.
https://github.com/JulianKemmerer/PipelineC/wiki
GNU General Public License v3.0
606 stars 50 forks source link

Clock enables / feedback inputs do not work as expected for autopipelined logic #122

Open JulianKemmerer opened 2 years ago

JulianKemmerer commented 2 years ago

Clock enables and feedback ready signals look like any other data input and are delayed/registered to maintain alignment with other inputs in the data flow graph of pipelines - ex. CE is not not wired directly register CE ports as expected in a pipeline

Things work fine for not-pipelines lines like single cycle fsms and such where single cycle comb logic clock enables and ready do work as expected...

JulianKemmerer commented 2 years ago

Work around is to have a FIFO after the autopipelined area (or otherwise be always ready for output)...

#include "uintN_t.h"
#include "axi/axis.h"
#include "fifo.h"
#include "compiler.h"

#pragma MAIN_MHZ the_pipeline 300.0
#pragma MAIN_MHZ io_and_ctrl 300.0
#pragma PART "xcu50-fsvh2104-2-e"

// Wires into and out of autopipelined region
uint32_t pipeline_in_data;
uint1_t pipeline_in_valid;
uint32_t pipeline_out_data;
uint1_t pipeline_out_valid;
// The autopipelined function (no state)
void the_pipeline()
{
    pipeline_out_data = pipeline_in_data + 1;
    pipeline_out_valid = pipeline_in_valid;
}

// FIFO for holding outputs from pipeline
// Pipeline shouldn't be more than 128 cycles latency/deep
#define FIFO_DEPTH 128
FIFO_FWFT(outfifo, uint32_t, FIFO_DEPTH) // 128 deep of u32s

// Stateful FIFO + counting + AXIS signaling
DECL_OUTPUT(uint32_t, out_data)
DECL_OUTPUT(uint1_t, out_valid)
DECL_OUTPUT(uint1_t, in_ready)
void io_and_ctrl(uint32_t in_data, uint1_t in_valid, uint1_t out_ready)
{
    // Keep count of how many words in FIFO
    static uint16_t fifo_count;

    // Default not ready for inputs
    in_ready = 0;

    // Signal ready for input if room in fifo
    if(fifo_count < FIFO_DEPTH){
        in_ready = 1;
    }

    // Logic for input side of pipeline
    // Gate valid with ready signal
    pipeline_in_valid = in_valid & in_ready;
    pipeline_in_data = in_data;

    // Free flow of data out of pipeline into fifo
    uint1_t fifo_wr_en = pipeline_out_valid;
    uint32_t fifo_wr_data = pipeline_out_data;
    // Dont need to check for not full/overflow since count used for ready

    // Read side of FIFO connected to top level outputs
    uint1_t fifo_rd_en = out_ready;

    // The FIFO instance connected to outputs
    outfifo_t fifo_o = outfifo(fifo_rd_en, fifo_wr_data, fifo_wr_en);
    out_valid = fifo_o.data_out_valid;
    out_data = fifo_o.data_out;

    // Count input writes and output reads from fifo
    if(in_valid & in_ready){
        fifo_count += 1;
    }
    if(out_valid & out_ready){
        fifo_count -= 1;
    }
}
JulianKemmerer commented 2 years ago

Whoops :facepalm:

JulianKemmerer commented 2 years ago

Feels similar to 'these wires are not normal autopipelined w/ registers wires' like https://github.com/JulianKemmerer/PipelineC/issues/65 or https://github.com/JulianKemmerer/PipelineC/issues/66 might be getting at...

JulianKemmerer commented 2 years ago

You can think of the inputs (clock enables, resets, feedback ready signals) being sampled during the pipelines first cycle And the outputs being driven from the last cycle The same logic applies to global vars/wires read and driven by the function In comb logic the 'first cycle' and 'last cycle' are the same thing - so it works But for pipelines its stretched over n cycles and inputs are no longer time aligned with outputs

JulianKemmerer commented 2 years ago

Could maybe do something like

// A global wire that is the clock enable
// for the entire axis_test pipeline
uint1_t the_clock_enable;

typedef struct axis_test_t{
  uint32_t axis_out_data;
  uint1_t axis_out_valid;
}axis_test_t;
#pragma MAIN axis_test
#pragma MAIN_CLOCK_ENABLE axis_test the_clock_enable
axis_test_t axis_test(
  uint32_t axis_in_data,
  uint1_t axis_in_valid
)
{
  // Simple demo doing +1 on each element
  axis_test_t o;
  o.axis_out_data = axis_in_data + 1;
  o.axis_out_valid = axis_in_valid;
  return o;
}

#pragma MAIN the_ctrl
void the_ctrl()
{
 ... drive the_clock_enable based on axis tready each clock
}

But I dont like that its something in global ~MAIN scope Not something that you can re-instance like simply calling axis_test() again