Xtra-Computing / ThunderGP

HLS-based Graph Processing Framework on FPGAs
Apache License 2.0

Possible bug with specific application, CModel gives correct results #9

Closed: pf-un closed this issue 3 years ago

pf-un commented 3 years ago

I believe there may be a bug running this specific application, which is meant to double the property of all vertices (via edge properties).

When executing a superstep, the vertex properties remain unaltered. CModel reports what I would expect to be the correct result, but additionally seems to think the vertex properties are all 0 when they are not (since they are unaltered!).

Note that I set the vertex property to i+1 (so vertex 0 has property 1).

See below the results of a run. After the CModel verification I print the first 10 entries for MEM_ID_PUSHIN_PROP and MEM_ID_PUSHIN_PROP_MAPPED. Note that some vertices are also missing from MEM_ID_PUSHIN_PROP_MAPPED as per #8 but this is not the issue at hand here. None of the vertices have had their property doubled.

I am running on a U250 and using SDAccel 2018.3.

$ ./host_graph_fpga_dc1 xclbin_dc1/*.xclbin dataset/rmat-14-32.txt 
Found 1 platforms!
Found 1 devices!
file is xclbin_dc1/graph_fpga.hw.xilinx_u250_xdma_201830_2.xclbin xclbin_dc1/graph_fpga.hw.xilinx_u250_xdma_201830_2.xclbin 
INFO: Importing xclbin_dc1/graph_fpga.hw.xilinx_u250_xdma_201830_2.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
Graph dataset/rmat-14-32.txt is loaded.
vertex num: 16384
edge num: 524288

unpad edge_tuple_range 524288
ratio 12501 / 16384 is 0.763000 
ratio 12501 / 16384 is 0.763000 
ratio 12501 / 16384 is 0.763000 
ratio 12501 / 16384 is 0.763000 
[EST]: 1 is expected to exe in 0.401833ms
[EST]: 0 is expected to exe in 0.401825ms
[EST]: 2 is expected to exe in 0.401783ms
[EST]: 3 is expected to exe in 0.401734ms
[EST]: finalOrder 3 total exe: 0.401734ms
[EST]: finalOrder 2 total exe: 0.401783ms
[EST]: finalOrder 0 total exe: 0.401825ms
[EST]: finalOrder 1 total exe: 0.401833ms
[SIZE] 526336 cur_edge_num sub 131072
[SIZE] 526336 cur_edge_num sub 131072
[SIZE] 526336 cur_edge_num sub 131072
[SIZE] 526336 cur_edge_num sub 131072

----------------------------------------------------------------------------------
[PART] subPartitions 0 info :
[PART]   edgelist from 0 to 131072
[PART]   dst. vertex from 0 to 12500
[PART]   src. vertex from 9600 to 12500
[PART]   dump: 9600 - 12500
[PART] scatter cache ratio 0.000000 
[PART] v/e 0.095367 
[PART] v: 12500 e: 131072 
[PART] est. efficient 10.484921
[PART] compressRatio 0.763000 

[SCHE] 0 with 526336 @ 0 
transfer base mem start
transfer base mem
transfer subPartitions mem
transfer cu mem
data transfer 5.678447 
cmodel error 0 0x00000004 hw: 0x00000000  diff 0x00000004 !!!!
cmodel error 1 0x00000006 hw: 0x00000000  diff 0x00000006 !!!!
cmodel error 2 0x0000000a hw: 0x00000000  diff 0x0000000a !!!!
cmodel error 3 0x0000000e hw: 0x00000000  diff 0x0000000e !!!!
cmodel error 5 0x00000012 hw: 0x00000000  diff 0x00000012 !!!!
cmodel error 6 0x00000016 hw: 0x00000000  diff 0x00000016 !!!!
cmodel error 7 0x00000018 hw: 0x00000000  diff 0x00000018 !!!!
cmodel error 8 0x0000001a hw: 0x00000000  diff 0x0000001a !!!!
cmodel error 9 0x0000001c hw: 0x00000000  diff 0x0000001c !!!!
cmodel error 10 0x0000001e hw: 0x00000000  diff 0x0000001e !!!!
cmodel error 11 0x00000020 hw: 0x00000000  diff 0x00000020 !!!!
cmodel error 12 0x00000022 hw: 0x00000000  diff 0x00000022 !!!!
cmodel error 15 0x0000002a hw: 0x00000000  diff 0x0000002a !!!!
cmodel error 16 0x0000002c hw: 0x00000000  diff 0x0000002c !!!!
cmodel error 17 0x0000002e hw: 0x00000000  diff 0x0000002e !!!!
cmodel error 19 0x00000032 hw: 0x00000000  diff 0x00000032 !!!!
cmodel error 20 0x00000034 hw: 0x00000000  diff 0x00000034 !!!!
cmodel error 22 0x0000003a hw: 0x00000000  diff 0x0000003a !!!!
cmodel error 23 0x0000003c hw: 0x00000000  diff 0x0000003c !!!!
cmodel error 24 0x00000040 hw: 0x00000000  diff 0x00000040 !!!!
cmodel error 26 0x00000044 hw: 0x00000000  diff 0x00000044 !!!!
cmodel error 27 0x00000048 hw: 0x00000000  diff 0x00000048 !!!!
cmodel error 28 0x0000004a hw: 0x00000000  diff 0x0000004a !!!!
cmodel error 29 0x0000004c hw: 0x00000000  diff 0x0000004c !!!!
cmodel error 31 0x00000052 hw: 0x00000000  diff 0x00000052 !!!!
cmodel error 32 0x00000054 hw: 0x00000000  diff 0x00000054 !!!!
cmodel error 34 0x00000058 hw: 0x00000000  diff 0x00000058 !!!!
cmodel error 35 0x0000005a hw: 0x00000000  diff 0x0000005a !!!!
cmodel error 36 0x0000005c hw: 0x00000000  diff 0x0000005c !!!!
cmodel error 37 0x0000005e hw: 0x00000000  diff 0x0000005e !!!!
cmodel error 38 0x00000060 hw: 0x00000000  diff 0x00000060 !!!!
cmodel error 39 0x00000062 hw: 0x00000000  diff 0x00000062 !!!!
cmodel error 40 0x00000064 hw: 0x00000000  diff 0x00000064 !!!!
cmodel error 41 0x00000068 hw: 0x00000000  diff 0x00000068 !!!!
cmodel error 42 0x0000006a hw: 0x00000000  diff 0x0000006a !!!!
cmodel error 43 0x0000006c hw: 0x00000000  diff 0x0000006c !!!!
cmodel error 44 0x0000006e hw: 0x00000000  diff 0x0000006e !!!!
cmodel error 45 0x00000072 hw: 0x00000000  diff 0x00000072 !!!!
cmodel error 47 0x0000007a hw: 0x00000000  diff 0x0000007a !!!!
cmodel error 48 0x0000007c hw: 0x00000000  diff 0x0000007c !!!!
cmodel error 50 0x00000080 hw: 0x00000000  diff 0x00000080 !!!!
cmodel error 51 0x00000082 hw: 0x00000000  diff 0x00000082 !!!!
cmodel error 52 0x00000084 hw: 0x00000000  diff 0x00000084 !!!!
cmodel error 53 0x00000086 hw: 0x00000000  diff 0x00000086 !!!!
cmodel error 54 0x00000088 hw: 0x00000000  diff 0x00000088 !!!!
cmodel error 55 0x0000008c hw: 0x00000000  diff 0x0000008c !!!!
cmodel error 56 0x0000008e hw: 0x00000000  diff 0x0000008e !!!!
cmodel error 57 0x00000090 hw: 0x00000000  diff 0x00000090 !!!!
cmodel error 58 0x00000092 hw: 0x00000000  diff 0x00000092 !!!!
total cmodel error: 11352
Property for "first 10" vertices:
1
2
3
4
5
6
7
8
9
10
Number of vertices with non-null property:16384

Property for "first 10" vertices:
2
3
5
7
8
9
11
12
13
14

pf-un commented 3 years ago

Just to be clear, acceleratorQueryProperty(0) gives the same results as MEM_ID_PUSHIN_PROP_MAPPED.

pf-un commented 3 years ago

I just found out that if I run acceleratorCModelSuperStep() twice I get non-null values for hw:

    acceleratorSuperStep(0, &graphDataInfo);
    acceleratorCModelSuperStep(0, &graphDataInfo);
    acceleratorCModelSuperStep(1, &graphDataInfo);

cmodel error 0 0x00000008 hw: 0x00000002  diff 0x00000006 !!!!
cmodel error 1 0x0000000c hw: 0x00000003  diff 0x00000009 !!!!
cmodel error 2 0x00000014 hw: 0x00000005  diff 0x0000000f !!!!
cmodel error 3 0x0000001c hw: 0x00000007  diff 0x00000015 !!!!
cmodel error 4 0x00000000 hw: 0x00000008  diff 0xfffffff8 !!!!
cmodel error 5 0x00000024 hw: 0x00000009  diff 0x0000001b !!!!
[...]

So this might be an issue with the ping-pong mechanism, no? Regardless, the issue of the vertex properties not doubling remains.

pf-un commented 3 years ago

Actually, regarding my last comment, vertex 7 (with value 8) has value 0 in the CModel, when it should be 8*2*2=0x20. I checked the dataset and vertex 7 has in-degree 0, which might explain that. Nonetheless, this is a different issue.
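
The in-degree check mentioned above can be sketched in a few lines. This is a minimal illustration only: `inDegrees` and `verticesWithoutUpdates` are hypothetical helper names, not part of ThunderGP, and the edge list is assumed to be a plain vector of (source, destination) pairs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not a ThunderGP API): counts the incoming edges of
// every vertex from a (src, dst) edge list.
std::vector<uint32_t> inDegrees(uint32_t vertexNum,
                                const std::vector<std::pair<uint32_t, uint32_t>> &edges)
{
    std::vector<uint32_t> deg(vertexNum, 0);
    for (const auto &e : edges) {
        assert(e.second < vertexNum); // destination must be a valid vertex id
        ++deg[e.second];
    }
    return deg;
}

// Hypothetical helper: returns the ids of vertices that no edge points to.
// Such vertices never receive an update during the gather phase.
std::vector<uint32_t> verticesWithoutUpdates(const std::vector<uint32_t> &deg)
{
    std::vector<uint32_t> ids;
    for (uint32_t v = 0; v < deg.size(); ++v) {
        if (deg[v] == 0) {
            ids.push_back(v);
        }
    }
    return ids;
}
```

Running this over the dataset's edge list would list every vertex that, like vertex 7, cannot be reached by any update in a superstep.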

pf-un commented 3 years ago

@HongshiTan, I've uploaded a compressed archive of the xclbin folder here.

I also tried:

inline prop_t scatterFunc(prop_t srcProp, prop_t edgeProp)
{
    return (edgeProp);
}

inline prop_t gatherFunc(prop_t ori, prop_t update)
{
    return (update);
}

and

inline prop_t scatterFunc(prop_t srcProp, prop_t edgeProp)
{
    return 0+edgeProp;
}

inline prop_t gatherFunc(prop_t ori, prop_t update)
{
    return 0+update;
}

The results are the same.

HongshiTan commented 3 years ago

After analyzing your code and the simulation waveform, I think the problem lies in gatherFunc. Can you confirm that you intend to gather only the latest update for a vertex (a single update from one neighbour) rather than the sum of all updates from its neighbours?

pf-un commented 3 years ago

Exactly. For this test, I'm only interested in the last update/last neighbour to be processed.

The expected outcome is to double the property of each vertex, so I set all edge properties to 2 and only propagate the edge properties during the scatter phase.
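
For reference, the intended behaviour can be modelled in plain software. The sketch below is not the framework's CModel: `prop_t` as `int`, the `Edge` struct, `softwareSuperStep`, and in particular `applyFunc` (multiplying the old property by the gathered value) are all assumptions made for illustration. The accumulator starts at zero, which is why a vertex with no incoming edges ends up with 0.

```cpp
#include <cstddef>
#include <vector>

using prop_t = int; // assumption: the real typedef lives in the framework headers

// Hypothetical edge record for this sketch only.
struct Edge {
    std::size_t src;
    std::size_t dst;
    prop_t prop;
};

// Scatter: propagate only the edge property.
inline prop_t scatterFunc(prop_t srcProp, prop_t edgeProp)
{
    (void)srcProp; // intentionally unused in this test application
    return edgeProp;
}

// Gather: keep only the most recent update.
inline prop_t gatherFunc(prop_t ori, prop_t update)
{
    (void)ori; // intentionally unused; fine in software
    return update;
}

// Assumed apply step: scale the old property by the gathered value.
inline prop_t applyFunc(prop_t oldProp, prop_t gathered)
{
    return oldProp * gathered;
}

// One software superstep over an edge list (hypothetical helper).
std::vector<prop_t> softwareSuperStep(const std::vector<prop_t> &props,
                                      const std::vector<Edge> &edges)
{
    std::vector<prop_t> gathered(props.size(), 0); // accumulators start at zero
    for (const Edge &e : edges) {
        gathered[e.dst] = gatherFunc(gathered[e.dst], scatterFunc(props[e.src], e.prop));
    }
    std::vector<prop_t> next(props.size());
    for (std::size_t v = 0; v < props.size(); ++v) {
        next[v] = applyFunc(props[v], gathered[v]);
    }
    return next;
}
```

With every edge property set to 2, each vertex that has at least one incoming edge is doubled, while a vertex with in-degree 0 is multiplied by the untouched zero accumulator.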

HongshiTan commented 3 years ago

OK, I changed the gatherFunc as follows (that may not be the final solution):

inline prop_t gatherFunc(prop_t ori, prop_t update) { return update > ori ? update : ori; }

The problem comes from the HLS compiler: if ori is not used, the entire function and the logic that relies on it are over-simplified by LLVM, which leads to a wrong result (e.g., the update is wired directly to the initial register, which is zero).

The above code only works for cases where all edge properties have the same value.

Some details you might be interested in: in fpga_application.h, you can see that gatherFunc is used in three modules: the RAW solver, gather, and property merging. The compiler cannot handle your code well in the RAW solver and property merging modules.
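
To see why the max-based workaround only matches "keep the last update" when all edge properties are equal, the two gather variants can be compared in software. This is a sketch under stated assumptions: `gatherLast`, `gatherMax`, and `foldUpdates` are hypothetical names, `prop_t` is assumed to be `int`, and the accumulator is assumed to start at zero with positive updates.

```cpp
#include <vector>

using prop_t = int; // assumption: the real typedef lives in the framework headers

// Original intent: keep only the most recent update (ori is unused).
inline prop_t gatherLast(prop_t ori, prop_t update)
{
    (void)ori;
    return update;
}

// Workaround from this thread: ori is used, so it cannot be optimised away.
inline prop_t gatherMax(prop_t ori, prop_t update)
{
    return update > ori ? update : ori;
}

// Hypothetical helper: folds a sequence of updates into one accumulator,
// starting from zero.
template <typename Gather>
prop_t foldUpdates(Gather gather, const std::vector<prop_t> &updates)
{
    prop_t acc = 0;
    for (prop_t u : updates) {
        acc = gather(acc, u);
    }
    return acc;
}
```

For updates {2, 2, 2} both variants yield 2. For updates {5, 3} the last-update variant yields 3 while the max variant yields 5, so the workaround stands in for the original only when every update carries the same positive value.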

pf-un commented 3 years ago

I suspected as much but didn't come up with an adequate test for the scenario unfortunately! Thanks for looking into it. For future reference: beware of over-optimisation/optimising out by the compiler!

P.S. I'll probably update #8 soon with an example application (SSSP) showing the discrepancy between vertex properties accessed via acceleratorQueryProperty and what I'd expect are the correct vertex properties -- hopefully you can have a look at that as well.

HongshiTan commented 3 years ago

OK, we can provide an API to access the property as you expect.