calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License

Resource estimator is producing segfault #1174

Closed susan-garry closed 1 year ago

susan-garry commented 1 year ago

Running

fud e --to resource-estimate examples/futil/dot-product.futil

from the root directory of the calyx repo produces a segfault. Attached is the error log.

re-error.log

I have run cd fud && flit install -s --deps all and tried scrubbing my local copy of the calyx repository and re-cloning it, but to no avail. Also, @calebmkim has tried these commands on his setup and was unable to reproduce the error. Any ideas what the underlying issue could be?

sampsyo commented 1 year ago

Huh, I'm interested in trying to reproduce the problem, but these errors really jumped out at me:

ERROR: [#UNDEF] /scratch/opt/Xilinx/Vivado/2020.2/scripts/rt/data/unisim_comp.lib:17972, The bus_typeØd‘ã attribute is not allowed in the bus group in this context. (Syntax Error Encountered)
ERROR: [#UNDEF] /scratch/opt/Xilinx/Vivado/2020.2/scripts/rt/data/unisim_comp.lib:17971, The 'bus_type' attribute is missing in bus 'IFPSCPMCHANNEL9XPIPETXPOSTCURSOR'. (Syntax Error Encountered)

Namely, Vivado seems to be saying:

I expected an attribute called bus_type here. I didn't find that, but I sure did find an attribute called bus_typeØd‘ã.

…which is, uh, interesting.

Is there any chance this could be related to some filesystem or other OS-related weirdness? Like, randomly, could it be running on an sshfs mount on a Windows machine that's using a different encoding??? Anything at all in that ballpark, involving transferring files from your client machine to havarti?
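One way to probe the encoding hypothesis is to check whether the library file Vivado complains about actually contains non-ASCII bytes on disk. The following is a minimal Python sketch, not part of fud or Vivado; the function names are made up for illustration, and the path and line numbers in the usage note are simply the ones quoted in the error messages above.

```python
def non_ascii_lines(lines, start=1, end=None):
    """Return (line_number, raw_bytes) for lines in [start, end] that
    contain any byte outside the 7-bit ASCII range.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    hits = []
    for lineno, raw in enumerate(lines, start=1):
        if lineno < start:
            continue
        if end is not None and lineno > end:
            break
        if any(b > 0x7F for b in raw):
            hits.append((lineno, raw.rstrip(b"\r\n")))
    return hits


def scan_file(path, start=1, end=None):
    """Open `path` in binary mode (so nothing is decoded) and scan it."""
    with open(path, "rb") as f:
        return non_ascii_lines(f, start, end)
```

For example, scan_file("/scratch/opt/Xilinx/Vivado/2020.2/scripts/rt/data/unisim_comp.lib", 17965, 17975) would look at the lines around the reported error. An empty result would suggest the file on disk is clean and the garbled attribute name appears only when Vivado reads or parses it; a non-empty result would point at the file itself.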

rachitnigam commented 1 year ago

@susan-garry can you provide us with your fud configuration and how exactly you're using the tool? For example, are you running this command on a server (gorgonzola/havarti)? I think @sampsyo's diagnosis is correct: something is going wrong in the file encoding step.

susan-garry commented 1 year ago

I am running the commands on havarti, and here is my fud configuration:

config.txt

Running fud check verifies that vivado is installed.

Additionally, I have not transferred any files onto havarti, and I'm not (knowingly) using an sshfs mount...

rachitnigam commented 1 year ago

Can you try deleting the calyx repo and starting over with freshly installed tools?

susan-garry commented 1 year ago

Yes, that is what I meant when I said that I scrubbed the repo and re-cloned it. Sorry I wasn't more clear

rachitnigam commented 1 year ago

Hm, that's troubling because I don't know how to reproduce it. I'm currently setting up remote tools myself and running the same command to see if it works but it seems like something else is going wrong. One other thing to try is using gorgonzola instead of havarti and seeing if that works

sampsyo commented 1 year ago

Just gave it a shot here too, and everything seemed to be in order. This is absolutely crazy, so I also sudo'd to impersonate @susan-garry on havarti and, to my utter shock, the error still didn't occur!! It seems like there must be some uncontrolled variable here.

To document in excruciating detail, here's what I did, according to my tmux scrollback:

  1. sudo -u shg64 -s to impersonate Susan
  2. cd /scratch/susan/calyx
  3. fud e -vv --to resource-estimate examples/futil/dot-product.futil

That's literally it; I didn't even export any environment variables for Vivado (looks like Susan's .bashrc has the right source incantations).

I know this is crazy, @susan-garry, but can you give this a shot "from scratch" in a newly-spawned shell?
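Since the same command behaves differently in two shells owned by the same user, one candidate for the "uncontrolled variable" is the environment itself (for instance, locale settings such as LANG, which would fit the encoding theory). Below is a small Python sketch for comparing two environment dumps, assuming each was captured by running env > some-file in the respective shell; the function names are hypothetical and this is not an existing fud feature.

```python
def parse_env(text):
    """Parse the output of `env` into a dict.

    Assumption: one NAME=value pair per line; multi-line values are not handled.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def diff_env(a_text, b_text):
    """Return {name: (value_in_a, value_in_b)} for every variable whose value
    differs between the two dumps. None means the variable is unset."""
    a, b = parse_env(a_text), parse_env(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Reading both dump files and passing their contents to diff_env lists only the variables that differ, which keeps the comparison short even when the environments are large.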

susan-garry commented 1 year ago

@sampsyo I tried creating a new shell on my computer and signing into havarti again from the new shell, but the error still occurs. However, I know this sounds crazy, but I executed the command a second time so that I could see the full error message to make sure they matched, and the second time I executed the command, no error occurred. I executed the same command 3 more times and the error came back for each of the subsequent executions.

For completeness, here are the commands that I ran:

  1. cd /scratch/susan/calyx
  2. fud e -vv --to resource-estimate examples/futil/dot-product.futil
  3. fud e -vv --to resource-estimate examples/futil/dot-product.futil 2> re-error.log (no error)
  4-6. fud e -vv --to resource-estimate examples/futil/dot-product.futil 2> re-error.log (with error)
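Because the failure seems to come and go between identical runs, it may help to run the command several times and tally the outcomes instead of comparing logs by hand. This is a rough Python sketch under a few assumptions: the command is copied from the list above, the "bus_type" marker comes from the errors quoted earlier in the thread, and the "Segmentation fault" marker is a guess at what the log says when the crash occurs, so adjust it to match the real log.

```python
import subprocess
from collections import Counter

# Command copied from the thread; run it from the root of the calyx repo.
FUD_CMD = ["fud", "e", "-vv", "--to", "resource-estimate",
           "examples/futil/dot-product.futil"]


def classify(returncode, stderr, markers=("bus_type", "Segmentation fault")):
    """Label one run by the marker strings found in stderr, else by exit code.

    The default markers are assumptions about the log text, not fud output
    that has been verified.
    """
    hits = [m for m in markers if m in stderr]
    if hits:
        return "error:" + "+".join(hits)
    return "ok" if returncode == 0 else f"exit:{returncode}"


def repeat(cmd, runs=5):
    """Run `cmd` `runs` times and count how often each outcome occurs."""
    outcomes = Counter()
    for _ in range(runs):
        # errors="replace" keeps garbled bytes in the log from raising
        # a decoding exception.
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              errors="replace")
        outcomes[classify(proc.returncode, proc.stderr)] += 1
    return outcomes
```

Calling repeat(FUD_CMD, runs=10) from the repo root returns a Counter of outcome labels; a mix of "ok" and "error:..." entries for the identical command would confirm that the failure is nondeterministic.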

Edit: After running rm re-error.log and trying the last command again (to be clear, I ran the exact same command), I got partial results. The error log still indicated that a segfault occurred, but it didn't produce the bus_type errors. In case it becomes useful, here is the new error log:

re-error2.log

Additionally, does this seem like a reasonable output for the command? (I'm curious if the resource estimate was able to run without any issue).

{
  "lut": 193,
  "dsp": 3,
  "meet_timing": 1,
  "registers": 18,
  "muxes": 20,
  "clb_registers": 102,
  "carry8": 6,
  "f7_muxes": 0,
  "f8_muxes": 0,
  "f9_muxes": 0,
  "clb": 301,
  "cell_lut1": -1,
  "cell_lut2": 151,
  "cell_lut3": 40,
  "cell_lut4": 66,
  "cell_lut5": 10,
  "cell_lut6": 17,
  "cell_fdre": 102
}

I will give gorgonzola a go; I'm just having a bit of trouble setting up all of the fud dependencies, but I will work on this in the next few days.

Also, now that I've thought about it more, I did download files for the odgi python binding from anaconda, and may have copied them onto the server (as opposed to using something like curl). Could this somehow be affecting vivado?

sampsyo commented 1 year ago

Wow!!!!!! That is so incredibly weird!!!! Now I want to try this again, just to see if it fails/succeeds nondeterministically. I can't emphasize enough—this is truly strange, dark-matter stuff going on here.

That output does indeed look pretty reasonable. Not sure why cell_lut1 has the value -1, but generally speaking, the bottom-line lut and dsp numbers look exactly right for something as small as this.
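To spot odd values like that -1 automatically, a quick check over the JSON report can flag any negative count. This is a minimal Python sketch; treating a negative number as "suspicious" is an assumption made here for illustration, since the thread does not establish what -1 means in the estimator's output. The sample is a subset of the report posted above.

```python
import json

# Subset of the resource estimate posted earlier in the thread.
SAMPLE = '{"lut": 193, "dsp": 3, "registers": 18, "cell_lut1": -1, "cell_lut2": 151}'


def suspicious_fields(report):
    """Return the sorted names of fields whose numeric value is negative."""
    return sorted(
        name for name, value in report.items()
        if isinstance(value, (int, float)) and value < 0
    )


print(suspicious_fields(json.loads(SAMPLE)))  # prints ['cell_lut1']
```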

rachitnigam commented 1 year ago

@susan-garry if this is resolved, can we close the issue?

susan-garry commented 1 year ago

I think that I was able to circumvent the issue by sshing into gorgonzola to use vivado, as I can now get resource estimates for both of the files I was using before (examples/futil/dot-product.futil and a generated node depth accelerator).

So the underlying issue is not solved, but a workaround has been found. Is that good enough to warrant closing this issue (for now at least)?

rachitnigam commented 1 year ago

Yeah, I think so since we can’t reproduce it