[Open] yannickl96 opened this issue 4 years ago
Yes, that would be awesome. However, we actually had exactly this implemented before and it turned out way too pessimistic. For instance, we couldn't build US+ bitstreams on our 32GB development machines.
If you can improve on that it would certainly be very useful. You might be able to find the old commits in the log.
Did you do it similarly to the naive calculation I proposed in my original post, or did you use a smarter solution? Another option could be to first do a run with batch size 1, look at the LUT/CLB utilization (since that is the main culprit of memory consumption), and then increase the batch size according to the utilization and the estimated memory needed.
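A minimal sketch of that calibration idea (everything here is an assumption for illustration; none of these functions exist in Tapasco, and the linear scaling with LUT utilization is just the heuristic described above):

```python
# Hypothetical sketch: derive a batch size from one batch-size-1 calibration run.
# Assumes per-job memory scales roughly linearly with LUT utilization.
def estimate_batch_size(avail_bytes, calib_peak_bytes, calib_lut_util,
                        next_lut_util, safety=1.5):
    # Scale the measured peak by the relative LUT utilization and add head-room.
    per_job = calib_peak_bytes * (next_lut_util / calib_lut_util) * safety
    return max(1, int(avail_bytes // per_job))
```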
To be honest, I don't remember. The implementation back then was from Jens and I was just the one that made him delete it again...
I just checked for the commits and I fear they're lost to the big "squashening". It seems like Jens used actual numbers from the Lichtenberg:
commit 5029ff2f7851727f93a95f31e316c9fee5411193
Author: Jens Korinth <jk@esa.cs.tu-darmstadt.de>
Date: Thu Jul 6 18:18:30 2017 +0200
Remove resource check for memory
* memory usage estimates are based on numbers reported by the
Lichtenberg cluster, but are too conservative for normal users
* removed check for memory entirely
I can give you access to the old TPC repository if you're interested.
I don't have time right now, but maybe @tkay94 can take a look when he's finished his current task.
I doubt it's possible to get an estimation which is precise enough to be useful. From my observations (mainly US+ devices), the memory consumption does not only depend on the number of LUTs/CLBs but also changes drastically with a lot of other factors. Just to give an example: increasing the "freedom" Vivado has for placing and routing (e.g. by adding registers to some connections) seems to have a big impact on memory usage. I think there are way too many factors like this with a big impact to reliably estimate the memory usage. And you'd probably also need to tune this for every supported platform...
@jojowi But wouldn't you be able to get this information by monitoring the memory usage of one run with Vivado configured to the DOFs that you mentioned? I think the main difficulty here would be to just aggregate the memory consumption of Vivado and all its child processes.
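For reference, aggregating over the whole process tree is fairly easy; a sketch using psutil (this is an illustration, not something Tapasco does today):

```python
# Illustrative sketch: sum the resident memory of one Vivado run, i.e. the
# parent process plus all of its descendants.
import psutil

def vivado_tree_rss(pid):
    root = psutil.Process(pid)
    total = 0
    for p in [root] + root.children(recursive=True):
        try:
            total += p.memory_info().rss
        except psutil.NoSuchProcess:
            pass  # a child exited between listing and querying it
    return total
```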
> As Tapasco by default starts as many Vivado processes as there are CPU cores available
Another problem is that Tapasco does not consider hyper-threading. I opened an issue (#52), which was closed. I still think a sane default would be to only use physical cores.
This would also reduce the probability that people (who have never used the DSE from Tapasco before) crash servers, etc.
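Detecting the physical core count is cheap, for what it's worth; a sketch in Python (psutil here is just for illustration, not a suggestion for how Tapasco should implement it):

```python
# Illustrative sketch: prefer physical cores over logical (hyper-threaded) ones.
import psutil

physical = psutil.cpu_count(logical=False) or 1  # may be None on some platforms
default_parallel_jobs = physical                 # instead of all logical cores
```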
Well, another approach could be to monitor the free memory on the system and just let Tapasco kill Vivado (and cancel the current job) once free memory drops below a lower limit. At least that's what I did (using a Bash script) in order to avoid crashing machines :)
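Roughly what such a watchdog could look like (a sketch in Python rather than Bash, not the actual script; the threshold and polling interval are made up):

```python
# Illustrative watchdog: poll available memory and kill the Vivado process
# tree once it drops below a limit.
import time
import psutil

def watchdog(vivado_pid, min_free_bytes=4 * 1024**3, interval_s=10):
    while psutil.pid_exists(vivado_pid):
        if psutil.virtual_memory().available < min_free_bytes:
            root = psutil.Process(vivado_pid)
            for p in root.children(recursive=True) + [root]:
                p.terminate()  # cancels the current job
            break
        time.sleep(interval_s)
```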
Guessing the memory usage of a Vivado run looks impossible to me, especially once OOC synthesis is used (Vivado then also exceeds the limits listed on the Xilinx page).
You could probably get a worst-case estimate like that (at least if you know all factors with an impact on memory usage). However, the number you get this way will be much bigger than what you typically need. An estimate which is reasonably precise is (in my opinion) not possible/feasible.
By using the information from https://www.xilinx.com/products/design-tools/vivado/memory.html it would be possible to implement a pessimistic estimate of how many parallel jobs a machine can handle for a DSE. As Tapasco by default starts as many Vivado processes as there are CPU cores available on the machine, it is quite easy to fill up the RAM when doing a DSE, especially on larger chips. This would also reduce the probability that people (who have never used the DSE from Tapasco before) crash servers, etc.
Implementation would be something like: read the available memory, look up the worst-case memory per Vivado instance, then #jobs = available memory / worst-case memory per job.
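For illustration, a sketch of that calculation (the worst-case table below is a placeholder; real numbers would have to come from the Xilinx page linked above):

```python
# Illustrative sketch of the pessimistic job-count estimate.
import psutil

WORST_CASE_BYTES = {              # hypothetical values, bytes per Vivado run
    "zynq-7000": 8 * 1024**3,
    "virtex-us+": 60 * 1024**3,
}

def max_parallel_jobs(device):
    available = psutil.virtual_memory().available
    return max(1, available // WORST_CASE_BYTES[device])
```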