f4pga / f4pga-arch-defs

FOSS architecture definitions of FPGA hardware useful for doing PnR device generation.
https://f4pga.org
ISC License

CI is broken #1768

Closed HackerFoo closed 3 years ago

HackerFoo commented 3 years ago

It looks like the build has been broken since https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1760

HackerFoo commented 3 years ago

Simple designs such as counter and buttons are unroutable. It could be a problem with VPR.

mithro commented 3 years ago

@litghost / @acomodi - Can you please look into this?

acomodi commented 3 years ago

@HackerFoo you have seen this for the vtr+Symbiflow CI, correct? In that case this PR solves the issue: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1711. I have forced a Kokoro run as there were some infrastructure failures.

litghost commented 3 years ago

If you are speaking about the vtr+Symbiflow CI, that is a result of in-progress master+wip updates. To be clear, there are 3 CIs on arch-defs: Travis, kokoro vtr+Symbiflow and kokoro arch-defs. Travis and kokoro arch-defs should both be green right now.

HackerFoo commented 3 years ago

Then VtR+SymbiFlow doesn't need to pass?

The top level "Architecture Definitions (Presubmit)" is also failing on https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1735, but I think it's because of the QuickLogic test.

Another strange thing is that several of the tests have not yet completed for the last 2 PRs.

litghost commented 3 years ago

The top level "Architecture Definitions (Presubmit)" is also failing on #1735, but I think it's because of the QuickLogic test.

If you rebase onto master, then the QuickLogic test will pass trivially. Kokoro should be running as "presubmit as merged", but because it is part of the kokoro configuration it may be more restrictive.

Then VtR+SymbiFlow doesn't need to pass?

No. VtR + SymbiFlow is a CI that is intended to show when the current master+wip on https://github.com/symbiflow/vtr-verilog-to-routing or master on https://github.com/verilog-to-routing/vtr-verilog-to-routing/ doesn't work with the particular revision of https://github.com/SymbiFlow/symbiflow-arch-defs/ . Given that upstream master VTR and master+wip VTR can make changes that manifest issues with arch-defs (e.g. XML changes or https://github.com/verilog-to-routing/vtr-verilog-to-routing/issues/1571 ), it is possible for the VtR + SymbiFlow CI to be red as expected. The signal is to investigate why the newer version of VTR no longer works. In this particular case, changes in how the virtual RR graph was generated (because of upstream PR https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/1448) are why the CI is red. @acomodi has filed issues about this and has a PR working on the solution to the change.

This is why arch-defs uses environment.yml to decouple updates to yosys and VTR.

The key here is what the various CIs are for.

  1. Travis and kokoro arch-defs test whether the current revision of arch-defs works. These are the CIs to pay attention to and make sure are green for the purposes of PR review and checking the health of master.
  2. kokoro VtR master+wip + SymbiFlow and kokoro VtR master + SymbiFlow test whether the upcoming versions of VtR work with master arch-defs. Ideally this stays green, but it will go red when an incompatible change arises on upstream. They provide early signals of integration issues. I believe @acomodi for example discovered https://github.com/verilog-to-routing/vtr-verilog-to-routing/issues/1571 from the VtR master+wip + SymbiFlow CI.
litghost commented 3 years ago

Another strange thing is that several of the tests have not yet completed for the last 2 PRs.

Do you mean that you don't see the kokoro arch-defs CI on the latest master commits? That is because the previous continuous job is still running. I investigated it, and it is due to a routing job taking a while to complete.

HackerFoo commented 3 years ago

When I check the merge details for #1762 or #1760, there are several either failed or "Tool failed" tests.

litghost commented 3 years ago

When I check the merge details for #1762 or #1760, there are several either failed or "Tool failed" tests.

I see that too, but if you look at the failing tests, they are passing at the script level. For example:

https://source.cloud.google.com/results/invocations/8ecd20df-54dc-4d60-820f-20a9b7224a24/log

[ID: 7673017] Build finished after 17563 secs, exit value: 0

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
[14:24:56] Collecting build artifacts from build VM
[14:27:05] Kokoro builder finished

We've been seeing some issues around artifact collection, so I suspect a Kokoro hiccup on those.

litghost commented 3 years ago

For reference, the master xc7 build for #1760 is green here: https://source.cloud.google.com/results/invocations/ba1043e9-80b6-4508-ac34-3bc488cc96f6/targets

litghost commented 3 years ago

When I check the merge details for #1762 or #1760, there are several either failed or "Tool failed" tests.

I see that too, but if you look at the failing tests, they are passing at the script level. For example:

https://source.cloud.google.com/results/invocations/8ecd20df-54dc-4d60-820f-20a9b7224a24/log

[ID: 7673017] Build finished after 17563 secs, exit value: 0

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
[14:24:56] Collecting build artifacts from build VM
[14:27:05] Kokoro builder finished

We've been seeing some issues around artifact collection, so I suspect a Kokoro hiccup on those.

So the issue is likely that Kokoro ran out of disk space when moving the workspace for artifact collection. Kokoro applies the collection filtering after copying all files from the workspace (don't ask me), so with the addition of RapidWright and some more third_party submodules, we were likely scraping the upper disk limit. This neatly explains why it is intermittent, because the working directory size may vary.
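
For anyone who wants to check this theory on a build VM, here is a rough sketch of the comparison: total up the workspace and see whether a full, unfiltered copy would still fit on the destination disk. The function names and the 10% headroom are illustrative assumptions, not anything taken from the Kokoro configuration.

```python
import os
import shutil


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def copy_would_fit(workspace, destination, headroom=0.10):
    """Return True if a full copy of workspace fits on destination's filesystem.

    headroom is the fraction of total capacity that should stay free after
    the copy (an arbitrary safety margin chosen for this sketch).
    """
    needed = directory_size_bytes(workspace)
    usage = shutil.disk_usage(destination)
    return needed <= usage.free - headroom * usage.total
```

Running `copy_would_fit` on the workspace just before artifact collection, on a few passing and a few failing runs, would show whether the failures line up with the larger working directories.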

HackerFoo commented 3 years ago

The "Xilinx Series 7 - Install (Presubmit)" test is now failing with:

ChecksumMismatchError: Conda detected a mismatch between the expected content and downloaded content
for url 'https://conda.anaconda.org/litex-hub/linux-64/gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2'.
  download saved to: /tmpfs/src/github/symbiflow-arch-defs-presubmit-install/env/downloads/conda-pkgs/gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2
  expected md5: 98bd7a66867ab138fdb104c53df1db44
  actual md5: 1e303a894909bad44b81fb1910adff22
pgielda commented 3 years ago

I've just downloaded this file manually

# md5sum gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2 
98bd7a66867ab138fdb104c53df1db44  gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2

It seems to have the expected md5
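
The same check can be done from Python, which is handy if `md5sum` is not available on the machine in question. This is only a sketch using the standard `hashlib` module; the helper names are made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 against the expected hex digest, ignoring case."""
    return md5_of_file(path) == expected_md5.lower()
```

Since the file on the server matches the expected md5, a mismatch seen only in CI would point at a truncated or corrupted download rather than a bad package.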

HackerFoo commented 3 years ago

Now I get this from architecture presubmit tests:

Traceback (most recent call last):
  File "/tmpfs/src/github/symbiflow-arch-defs-presubmit-install/xc/common/utils/prjxray_routing_import.py", line 1566, in <module>
    main()
  File "/tmpfs/src/github/symbiflow-arch-defs-presubmit-install/xc/common/utils/prjxray_routing_import.py", line 1534, in main
    node_remap = create_node_remap(capnp_graph.graph.nodes, channels_obj)
  File "/tmpfs/src/github/symbiflow-arch-defs-presubmit-install/xc/common/utils/prjxray_routing_import.py", line 1333, in create_node_remap
    coord = tuple(hilbert_curve.coordinates_from_distance(h))
AttributeError: 'HilbertCurve' object has no attribute 'coordinates_from_distance'
HackerFoo commented 3 years ago

It looks like that dependency just updated breaking compatibility: https://pypi.org/project/hilbertcurve/#history Ugh.
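
One possible stopgap, besides pinning the older release, is a small shim that calls whichever lookup method the installed version provides. This is a sketch under an assumption: as far as I recall the 2.x release renamed the method to `point_from_distance`, but that name should be checked against the release notes. The `hilbert_point` helper is hypothetical and not part of arch-defs.

```python
def hilbert_point(curve, distance):
    """Return the coordinates at `distance` along `curve` as a tuple.

    Tries the assumed newer method name first, then the older one that
    appears in the traceback above.
    """
    for name in ("point_from_distance", "coordinates_from_distance"):
        method = getattr(curve, name, None)
        if callable(method):
            return tuple(method(distance))
    raise AttributeError("curve object exposes neither known lookup method")
```

With this, the failing line in `create_node_remap` could read `coord = hilbert_point(hilbert_curve, h)` and would work with either version, assuming the rename is the only incompatibility.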

HackerFoo commented 3 years ago

The vendor tool test also fails with:

/tmpfs/src/github/symbiflow-arch-defs-presubmit-xc7-vendor/env/RapidWright/bin/rapidwright_classpath.sh was not found, check if RapidWright has been built.

link

litghost commented 3 years ago

The vendor tool test also fails with:

/tmpfs/src/github/symbiflow-arch-defs-presubmit-xc7-vendor/env/RapidWright/bin/rapidwright_classpath.sh was not found, check if RapidWright has been built.

link

I've been on the lookout for this cropping up. The relevant PR on the upstream RapidWright repo was merged before the closed portion of RapidWright was updated. I've asked Xilinx to cut a new release of the closed portion to fix this, but that won't be for a bit. I've proposed tracking a fork of RapidWright for now: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1772

litghost commented 3 years ago

Continuous CI has been green for the last 6 runs, closing.