GaloisInc / dismantle

A library of assemblers and disassemblers derived from LLVM TableGen data

Exhausted file descriptors when building the ARM32 disassembler #36

Closed travitch closed 6 months ago

travitch commented 2 years ago

The ARM disassembler loads all of the XML files from the ASL specs at build time. We've had reports of the compilation running out of file descriptors while doing so. We should investigate. It could be due to lazy IO, and we may need to be much more careful about closing files.

RyanGlScott commented 9 months ago

This error is especially common when building on AArch64 Macs. If you encounter it, try running ulimit -n 10240 before building to raise the limit on open file descriptors.

RyanGlScott commented 6 months ago

The error message typically looks like:

    Exception when trying to run compile-time code:
      data/ISA_v85A_AArch32_xml_00bet9/stlh.xml: openBinaryFile: resource exhausted (Too many open files)

RyanGlScott commented 6 months ago

I can reproduce this on Linux by running:

    $ ulimit -n 256 # I believe this is the default on AArch64 Macs
    $ cabal build dismantle-arm-xml

RyanGlScott commented 6 months ago

The following patch fixes the issue:

    diff --git a/dismantle-arm-xml/src/Dismantle/ARM/TH.hs b/dismantle-arm-xml/src/Dismantle/ARM/TH.hs
    index 6368db2..6caefd0 100644
    --- a/dismantle-arm-xml/src/Dismantle/ARM/TH.hs
    +++ b/dismantle-arm-xml/src/Dismantle/ARM/TH.hs
    @@ -31,6 +31,7 @@ import qualified Codec.Compression.GZip as CCG
     import qualified Control.Monad.Fail as Fail
     import qualified Data.Binary as DB
     import qualified Data.BitMask as BM
    +import qualified Data.ByteString as BS
     import qualified Data.ByteString.Lazy as LBS
     import qualified Data.List as List
     import qualified Data.Map as M
    @@ -75,7 +76,7 @@ genISA isa xmldirPath encIndexFile aslInstrs parseTables logFile = do
         armISADesc isa xmldirPath encIndexFile aslInstrs handle
       TH.runIO $ putStrLn "Successfully generated ISA description."
       aslMapDesc <- mkASLMap encodingops
    -  inputBytes <- TH.runIO $ mapM LBS.readFile files
    +  inputBytes <- TH.runIO $ mapM (fmap LBS.fromStrict . BS.readFile) files
       let hash = DTL.computeHash inputBytes
       let asMaybeString :: DB.Get (Maybe String)
           asMaybeString = DB.get

There are 593 files under dismantle-arm-xml/data/ISA_v85A_AArch32_xml_00bet9 alone. LBS.readFile reads lazily and keeps each file's handle open until its contents have been fully consumed, so mapping it over all of these files in this function will quickly exceed a file descriptor limit of 256. This patch causes GHC to read the entire contents of each file into memory and close it before moving on to the next file, which avoids having too many files open.
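
For anyone who wants to see the strict-read pattern outside of Template Haskell, here is a minimal standalone sketch. The name readAllStrict, the demo file names, and the count of 600 files are made up for illustration and are not part of dismantle; only the mapM expression matches the patch.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import           System.Directory (removeFile)

-- | Read every file strictly, one at a time. 'BS.readFile' reads the whole
-- file and closes its handle before returning, so only one descriptor is
-- open at any point during the traversal. (Hypothetical helper name.)
readAllStrict :: [FilePath] -> IO [LBS.ByteString]
readAllStrict = mapM (fmap LBS.fromStrict . BS.readFile)

main :: IO ()
main = do
  -- Hypothetical inputs, created here only so the sketch runs on its own.
  let paths = [ "strict-read-demo-" ++ show n ++ ".tmp" | n <- [1 :: Int .. 600] ]
  mapM_ (\p -> BS.writeFile p (BS.pack [1, 2, 3])) paths
  contents <- readAllStrict paths
  -- Total number of bytes read across all files.
  print (sum (map LBS.length contents))
  mapM_ removeFile paths
```

Running this under ulimit -n 256 should succeed, whereas swapping readAllStrict for mapM LBS.readFile would be expected to hit the same "Too many open files" error, since none of the lazy reads are consumed until after the mapM finishes.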

There is another call to LBS.readFile here:

https://github.com/GaloisInc/dismantle/blob/1599d9707498bff1fee9ed1a37804675c79eee08/dismantle-arm-xml/src/Dismantle/ARM/TH.hs#L82

But it isn't necessary to make this readFile strict in order to fix the issue. In fact, doing so might incur a significant amount of extra space when compiling the library, since the parseTables file is quite a bit larger than the other 593 files above. (dismantle-arm-xml/data/Parsed/arm_instrs.sexpr, which is one particular instantiation of parseTables, is 22M in size, whereas most of the other 593 files are around 15K apiece.)
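
To illustrate why a single lazy read is harmless from a file descriptor point of view, here is a small sketch. It is not the code in TH.hs: sizeLazily and the demo file are hypothetical, and computing the length merely stands in for whatever consumes the parseTables contents.

```haskell
module Main (main) where

import           Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as LBS
import           Data.Int (Int64)
import           System.Directory (removeFile)

-- | Lazily read one file and compute its length. Only one descriptor is
-- open, and it is released once the lazy stream reaches end of file.
-- The chunks are consumed one after another rather than being held in
-- memory as a single strict buffer. (Hypothetical helper name.)
sizeLazily :: FilePath -> IO Int64
sizeLazily path = do
  contents <- LBS.readFile path
  -- Force the length inside IO so the whole file is consumed here.
  evaluate (LBS.length contents)

main :: IO ()
main = do
  -- Hypothetical 4 MiB input, created only so the sketch runs on its own.
  let path = "lazy-read-demo.tmp"
  LBS.writeFile path (LBS.replicate (4 * 1024 * 1024) 0)
  size <- sizeLazily path
  print size
  removeFile path
```

The trade-off is the one described above: a lazy read of one large file costs a single descriptor and streams its contents, while a strict read would hold the full 22M in memory at once.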