Dialyzer is intermittently not installed

SamuelWillis commented 1 year ago

The bug

Our GitHub Actions are intermittently failing because Dialyzer is reported as not installed. This started after we updated Elixir to 1.15.4 and OTP to 26.0.2.

Running mix dialyzer --plt results in the following error:

DEPENDENCY MISSING
------------------------
If you are reading this message, then Elixir and Erlang are installed but the
Erlang Dialyzer is not available. Probably this is because you installed Erlang
with your OS package manager and the Dialyzer package is separate.

On Debian/Ubuntu:

  `apt-get install erlang-dialyzer`

Fedora:

   `yum install erlang-dialyzer`

Arch and Homebrew include Dialyzer in their base erlang packages. Please report a Github
issue to add or correct distribution-specific information.

Software versions

erlef/setup-beam@v1
Erlang/OTP OTP-26.0.2 - built on ubuntu-22.04

How to replicate

I haven't been able to replicate it reliably because the bug is intermittent, but it often seems to happen when two Actions run close together.

I've been trying to keep an eye on conditions that cause it to fail but no luck so far.

Expected behaviour

Dialyzer is available whenever Elixir and Erlang are installed.

Additional context

If I delete all of the Actions caches, the Actions begin to work again for a while.

paulo-ferraz-oliveira commented 1 year ago

Is it possible you're sharing caches (in which e.g. you have and don't have dialyxir at the same time) and that's somehow creating the issue?

This doesn't seem action-related, though, as we just download pre-packaged Elixir and serve it.

Tagging @ericmj, in any case, since he might have other thoughts on this.

If you could share more details (e.g. your .yml file) maybe we could dig deeper.

SamuelWillis commented 1 year ago

Great question! I am not sure if it's a caching thing but maybe it is!?

For a little more context, I bumped to Elixir 1.15.4 on a personal project and have begun to see the same failures in my GitHub Actions.

In the following Action I have seen the CI job run successfully and the CD job fail. Then, when I make the fix for the CD job and re-run the Action, the CI job fails because Dialyzer is missing.

That seems odd, because the job passes on the first run and then fails on the second. I'm honestly a little stumped!

Here's my .yml.

name: CI/CD

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
env:
  MIX_ENV: test

jobs:
  ci:
    name: CI
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Set up Elixir
        uses: erlef/setup-beam@v1
        with:
          version-file: ".tool-versions"
          version-type: "strict"

      # Step: Define how to cache deps. Restores existing cache if present.
      - name: Cache deps
        id: cache-deps
        uses: actions/cache@v3
        env:
          cache-name: cache-elixir-deps
        with:
          path: deps
          key: ${{ runner.os }}-mix-${{ env.cache-name }}-${{ hashFiles('**/mix.lock') }}
          restore-keys: |
            ${{ runner.os }}-mix-${{ env.cache-name }}-

      # Step: Define how to cache the `_build` directory. After the first run,
      # this speeds up test runs a lot. This includes not re-compiling our
      # project's downloaded deps every run.
      - name: Cache compiled build
        id: cache-build
        uses: actions/cache@v3
        env:
          cache-name: cache-compiled-build
        with:
          path: _build
          key: ${{ runner.os }}-mix-${{ env.cache-name }}-${{ hashFiles('**/mix.lock') }}
          restore-keys: |
            ${{ runner.os }}-mix-${{ env.cache-name }}-
            ${{ runner.os }}-mix-

      # Step: Conditionally bust the cache when job is re-run.
      # Sometimes, we may have issues with incremental builds that are fixed by
      # doing a full recompile. In order to not waste dev time on such trivial
      # issues (while also reaping the time savings of incremental builds for
      # *most* day-to-day development), force a full recompile only on builds
      # that are retried.
      - name: Clean to rule out incremental build as a source of flakiness
        if: github.run_attempt != '1'
        run: |
          mix deps.clean --all
          mix clean
        shell: sh

      - name: Install deps, compile deps, and build PLTs
        run: mix do deps.get, deps.compile, dialyzer --plt

      - name: Run Check
        run: mix check
        env:
          SECRET_KEY_BASE: ${{ secrets.SECRET_KEY_BASE }}

  cd:
    name: Deploy app
    needs: [ci]
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

paulo-ferraz-oliveira commented 1 year ago

ci and cd aren't sharing a cache, but:

- your deps and _build caches use overlapping key prefixes: the ${{ runner.os }}-mix- fallback under restore-keys for _build is also a prefix of the deps keys (your key and restore-keys arguments should probably not overlap)
- you're not using the Elixir and/or Erlang/OTP version as part of your cache identifiers, which means you'll probably run into problems when running for multiple Elixir and/or Erlang/OTP versions
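
A minimal sketch of both suggestions, in case it helps. It assumes the otp-version and elixir-version outputs that setup-beam exposes. The id: beam, the step names, and the deps/build key prefixes are illustrative and not taken from your workflow.

```yaml
      - name: Set up Elixir
        id: beam  # illustrative id, used below to read the action's outputs
        uses: erlef/setup-beam@v1
        with:
          version-file: ".tool-versions"
          version-type: "strict"

      # deps cache: its own "deps" prefix plus the resolved OTP and Elixir versions.
      - name: Cache deps
        uses: actions/cache@v3
        with:
          path: deps
          key: ${{ runner.os }}-deps-${{ steps.beam.outputs.otp-version }}-${{ steps.beam.outputs.elixir-version }}-${{ hashFiles('**/mix.lock') }}
          restore-keys: |
            ${{ runner.os }}-deps-${{ steps.beam.outputs.otp-version }}-${{ steps.beam.outputs.elixir-version }}-

      # _build cache: a separate "build" prefix, so no restore key here
      # can match a key written by the deps cache.
      - name: Cache compiled build
        uses: actions/cache@v3
        with:
          path: _build
          key: ${{ runner.os }}-build-${{ steps.beam.outputs.otp-version }}-${{ steps.beam.outputs.elixir-version }}-${{ hashFiles('**/mix.lock') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ steps.beam.outputs.otp-version }}-${{ steps.beam.outputs.elixir-version }}-
```

With this layout, a version bump in .tool-versions produces fresh keys for both caches, and the fallback keys only ever match entries built with the same Erlang/OTP and Elixir versions.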

SamuelWillis commented 1 year ago

Hmm, interesting. The other workflow that has been running for a while without these issues is somewhat similar. It's only been since updating to 1.15.4 that we've begun to see these failures.

      - uses: actions/checkout@v3
      - name: Cache Elixir dependencies
        uses: actions/cache@v3
        with:
          path: |
            deps
            _build
          key:
            ${{ runner.os }}-mix-${{ secrets.CACHE_KEY }}-${{
            hashFiles('.tool-versions') }}-${{ hashFiles(format('{0}{1}',
            github.workspace, '/mix.lock')) }}
          restore-keys: |
            ${{ runner.os }}-mix-${{ secrets.CACHE_KEY }}-${{ hashFiles('.tool-versions') }}

I think I'm stumbling over why this has begun happening with the bump to 1.15.4 when we haven't seen these failures for the last while.

I'll take a look at adjusting the keys so they do not overlap and see if that resolves things.

paulo-ferraz-oliveira commented 1 year ago

In any case, it doesn't seem action-specific: we only install the base system Erlang/OTP + Elixir, and in this case it seems your issue relates to dialyxir or something similar.

Also, this is what caught my attention previously:

Probably this is because you installed Erlang
with your OS package manager and the Dialyzer package is separate.

In the case of Erlang/OTP + setup-beam that's not true, because Dialyzer IS shipped with the base system we install. (I don't even know what "the Dialyzer package" is - and I'm not sure you can install Erlang without it, either)
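
If you want to confirm that on the runner, here is a hypothetical diagnostic step (not part of your workflow) that you could place right after the setup-beam step. It asks the Erlang code server where the dialyzer application lives. code:lib_dir/1 returns the library path when the application is present and {error,bad_name} when it is not.

```yaml
      # Hypothetical diagnostic step: prints the path of the dialyzer
      # application bundled with the installed OTP, or {error,bad_name}
      # if the runtime cannot find it.
      - name: Check that Dialyzer ships with the installed OTP
        run: |
          erl -noshell -eval 'io:format("~p~n", [code:lib_dir(dialyzer)]), halt().'
```

If this prints a path on a run where mix dialyzer --plt still fails, the problem is more likely in how the Mix task detects Dialyzer than in the Erlang/OTP install itself.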

SamuelWillis commented 1 year ago

Yeah, 100%. It's thrown me off a little bit as well... I may move this to the Elixir forums and see if anyone is seeing anything similar or has some other insights!

paulo-ferraz-oliveira commented 1 year ago

👍 feel free to return here for more, or close if you feel there's no more you want to explore.

SamuelWillis commented 1 year ago

To follow up: this appears to be an issue with dialyxir for Elixir. There's a PR open to resolve it.

Thanks for the suggestions here! I appreciate it.
