cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
24 stars 1 forks source link

Possibly verbose behavior in our .data files #24

Closed anshumanmohan closed 1 year ago

anshumanmohan commented 1 year ago

In our .data JSON files, we create a path_ids{i} entry and a paths_to_consider{i} entry for every node in the graph. This makes sense to me.* However, we also create these for nodes not in the graph. For instance, the graph note5.gfa has four nodes, but we create path_ids{1} through to path_ids{16} because 16 is a magic-number constant. Just wondering if this is a slip-up.

@susan-garry please let me know, and I'll make the fixes if appropriate!

*sidebar: my understanding is that the paths_to_consider entry will be identical for all the nodes, but we want this duplication for performance reasons.

anshumanmohan commented 1 year ago

Once this is sorted out, I'll add flags to the JSON-generating python script and deal with with larger graphs that exceed size 16!

susan-garry commented 1 year ago

Generating path_ids and paths_to_consider for nodes not in a graph is an intentional, if sloppy, design decision. The accelerator generator needs some default parameters to determine its size, and I just set 16 (chosen pretty much at random) as the default.

I also assumed that we could reuse the same accelerator for multiple inputs of varying sizes, but that all cells marked @external need to be initialized. This would mean that input data for graphs whose dimensions are smaller than the accelerator's must necessarily to be padded with 0s. Running some tests now, it seems that verilator and the calyx interpreter don't need every cell marked as @external to be specified, but icarus verilog produces weird behavior unless all path_id and paths_to_consider arrays are initialized. For example, sometimes it fails to produce output for nodes for which path_id{i} and paths_to_consider{i} are defined, and outputs x instead.

I don't know how this affects verilator, but I think the accelerators should be able to work with all of the calyx simulators.

anshumanmohan commented 1 year ago

Update after meeting: this is okay! Not overly verbose at all, but what the hardware needs.

In the distant future, we can imagine splitting this into two steps:

  1. Generate just what the graph means
  2. Pad with 0s, make duplicates of lists, etc.

But that's really just a nice-to-have and it's not even clear that it helps anyone. It's not as though the .data will be particularly readable after step 1.

At the end of the day, we do need the full monty.