accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

gpgpu_l1_cache_write_ratio deadlocks #139

Open rodhuega opened 1 year ago

rodhuega commented 1 year ago

This new config parameter can cause deadlocks. For example, gpgpu_l1_cache_write_ratio can be set to 25 while every line in the cache is below that threshold, so no line is ever selected for replacement and the access always results in a reservation fail. The core hitting this condition never makes any forward progress.

The affected lines are 288, 289, 294, 295 of gpu-cache.cc.

JRPan commented 1 year ago

So the idea is to prioritize clean cache lines over dirty lines, because replacing a dirty line is more expensive. This is probably an implementation bug; some corner cases were not fully tested. Could you please share your workload? I'll take a look whenever I can. For now, you can set this config to 0 to effectively disable the feature. You are also welcome to fix it yourself if you have something in mind; I'd be happy to review and merge it.

Thanks, Junrui

rodhuega commented 1 year ago

I have had the parameter set to 0 since I found the bug. I can't share the workload because it only happens in a proposal that is still a work in progress, but the behavior was also happening in the DeepBench GEMM tensor inference benchmark.

I can fix the bug in a few weeks when I have time.

What would the idea be? Prioritize clean lines as you say, but if no line qualifies, fall back to, for example, the first or last line found? If I'm not wrong, the original behavior with the parameter at 0 would be the last line found, because the code runs the full loop.

JRPan commented 1 year ago

After thinking more about this, I think the bug could be somewhere else. In that function we iterate through all cache lines. First we check whether a line is reserved; if not, we check whether it is eligible to be replaced. If none of the non-reserved lines is eligible, we treat it as a reservation fail, because no line can be replaced and the request has to wait. At some point in the future, the reserved lines should be freed. When the cache iterates through all the lines again, the lines that were reserved should now be eligible for replacement, a miss is returned, and the cache continues making progress. Based on your description, it looks like the cache does not progress at all and deadlocks. I'm wondering why the reservation fails are never cleared.

I could be completely wrong. Any comments are welcome.

Thanks

rodhuega commented 1 year ago

But it can be the case that a program reaches a region of code whose store pattern dirties all the lines a little, so every line ends up with a small dirty_line_percentage. When a request then needs a new line, every line has a dirty_line_percentage smaller than m_wr_percent, so no candidate is ever chosen for replacement.

JRPan commented 1 year ago

dirty_line_percentage is for the entire tag array, not per cache line; it tracks how many cache lines are dirty. The value is computed as total dirty lines / total cache lines, where total cache lines is sets * associativity. So if many lines are dirty, dirty_line_percentage should be large, large enough that dirty_line_percentage >= m_config.m_wr_percent.

rodhuega commented 1 year ago

Ohh, I see. If I remember correctly, after the simulation had been running for a long time without any warp progressing, I set a breakpoint, and the problem was the one I pointed out in the first message. So maybe it is the case you are describing.

JRPan commented 1 year ago

Okay. So just to confirm: with m_wr_percent=25 the simulator deadlocks, and setting it to 0 fixes the deadlock?

rodhuega commented 1 year ago

Yes, that is the behavior

JRPan commented 1 year ago

Okay, I found the problem. I track the total dirty cache lines at the bank level, but in some cases a set can be completely dirty while the total dirty lines of the bank are still below the threshold. Then no cache line in that set can ever be selected as a victim, every access returns a reservation fail, and the simulator cannot make progress at all.

I'm just documenting it here. The solution would be to track the dirty ratio per set. If anyone is interested, feel free to go ahead and fix it; if not, I'll fix it ~next month and merge it into our next release.

Thanks, Junrui

rodhuega commented 1 year ago

I will wait for your fix. Thanks.