averycrespi / yolk

Numerical computing for Yolol
https://yolk.crespi.dev
Apache License 2.0

Parallelism Proposal #78

Open averycrespi opened 5 years ago

averycrespi commented 5 years ago

TL;DR

Refocus Yolk on pipelined, parallel computation.

Abstract

Yolk currently focuses on single-chip numerical computing. This proposal would allow Yolk to seamlessly spread its Yolol output over n chips, making use of m data fields as registers. A scheduler would be implemented to manage statement execution.

Scheduling Algorithm

1) Build a DAG of variable dependencies. Yolk's SSA form ensures that the dependency graph is acyclic.

2) Linearize the DAG into stages. Statements within the same stage may be executed in any order.

3) Assign statements to chips. All statements in stage k must execute before any statement in stage k+1. The assignment algorithm will minimize the following criteria, in priority order:

   a) Maximum number of lines on a single chip

   b) Number of chips used

   c) Number of register saves and loads

4) Add register saves and loads to share values between stages. Register positions have already been marked by the previous step.

5) Map stages to physical lines on chips. Some stages may take more than one physical line. Insert // Sync comments on empty lines.
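Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not Yolk's implementation: the `deps` input format, the function names `build_stages` and `assign_to_chips`, and the round-robin placement are all assumptions. Round-robin only balances the number of statements per chip within each stage; it does not try to minimize the chip count or the number of register saves and loads.

```python
from collections import defaultdict


def build_stages(deps):
    """Group SSA statements into stages by longest dependency chain.

    deps maps each assigned variable to the set of variables it reads
    (hypothetical input format). Variables that are read but never
    assigned are treated as external inputs. The graph is assumed to be
    acyclic, which SSA form guarantees.
    """
    stage_of = {}

    def depth(var):
        if var not in deps:  # external input, available before stage 0
            return -1
        if var not in stage_of:
            stage_of[var] = 1 + max((depth(d) for d in deps[var]), default=-1)
        return stage_of[var]

    for var in deps:
        depth(var)

    stages = defaultdict(list)
    for var, stage in stage_of.items():
        stages[stage].append(var)
    return [sorted(stages[s]) for s in sorted(stages)]


def assign_to_chips(stages, n_chips):
    """Return one schedule per chip: a list of per-stage statement lists.

    Statements in each stage are dealt out round-robin (an assumed,
    simplistic placement policy).
    """
    chips = [[[] for _ in stages] for _ in range(n_chips)]
    for s, stage in enumerate(stages):
        for i, var in enumerate(stage):
            chips[i % n_chips][s].append(var)
    return chips


# Hypothetical program: c needs a and b, d needs a, e needs c and d.
deps = {"a": {"x"}, "b": {"y"}, "c": {"a", "b"}, "d": {"a"}, "e": {"c", "d"}}
stages = build_stages(deps)
schedule = assign_to_chips(stages, n_chips=2)
```

An empty per-stage list on a chip corresponds to a line where that chip has nothing to do, which is where step 5 would place a // Sync comment.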

averycrespi commented 5 years ago

Currently not possible due to non-determinism within a network. [Screenshot: Screenshot_20190905_174342]

Azurethi commented 4 years ago

Personally, I think this "bug" should be left in, as it creates the challenge of adding your own synchronisation methods. Also, forcing synchronisation in systems that don't need it would waste processing time.

It's not a bug, it's a feature!

averycrespi commented 4 years ago

Yolk focuses on performant vectorized code, so adding synchronization checks would introduce unreasonably high overhead. For this particular kind of supercomputing, line determinism is extremely important.

Other approaches may suit non-deterministic systems better, such as mapping independent function units onto chips.
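One possible reading of "independent function units" is a group of statements that never consumes a value computed by another group, so each group can run on its own chip without relying on cross-chip timing. A minimal sketch under that assumption (the `deps` format and the `independent_units` name are hypothetical, not part of Yolk):

```python
def independent_units(deps):
    """Split statements into groups that share no computed values.

    deps maps each assigned variable to the set of variables it reads
    (hypothetical input format). Reads of external inputs do not link
    statements together; only reads of computed values do.
    """
    parent = {v: v for v in deps}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for var, reads in deps.items():
        for r in reads:
            if r in deps:  # r is computed by another statement
                parent[find(var)] = find(r)

    units = {}
    for v in deps:
        units.setdefault(find(v), []).append(v)
    return sorted(sorted(u) for u in units.values())


# Hypothetical program: b depends on a, d depends on c; x and y are inputs.
units = independent_units({"a": {"x"}, "b": {"a"}, "c": {"x"}, "d": {"c", "y"}})
```

Each resulting unit could then be placed on a separate chip, at the cost of losing parallelism inside a unit.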