Difference in build success across branches?

doctor-phil commented 6 months ago

It seems like very frequently, test/preview builds will succeed on alternative branches, but fail when they are merged into main.

Furthermore, when I draft a release/publish from main, more build issues appear even if the preview build succeeded after the latest change on main.

This forces a debug routine of three or more steps for any change, which is not ideal. @mmcky do you have any idea what might be causing this disconnect?

mmcky commented 6 months ago

@doctor-phil thanks for the heads up. We don't experience those issues. I will take a look at the workflows and see what might be causing the issues.

Some things to check:

[x] Is the same environment being used for all build workflows?
[x] Is a build cache implemented for merges to main to speed up PRs?

mmcky commented 6 months ago

@doctor-phil it looks like many of the preview build failures are caused by new versions of pandas or matplotlib that require updates to the code inside the lectures, for example:

'Legend' object has no attribute 'legendHandles'
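For reference, a version-tolerant way to get at the legend handles might look like the sketch below. It assumes the lecture code only needs the list of handles (here, to adjust their alpha); the plotted lines are placeholders, not code from the lecture. `legend_handles` is the name matplotlib introduced in 3.7, and the old `legendHandles` attribute is gone in 3.9, which is what produces the error above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as on a CI runner
import matplotlib.pyplot as plt

# Placeholder plot standing in for the lecture's figure.
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="line A")
ax.plot([0, 1], [1, 0], label="line B")
legend = ax.legend()

# Prefer the new attribute name (matplotlib >= 3.7) and fall back to the
# old one only on versions that predate the rename.
handles = getattr(legend, "legend_handles", None)
if handles is None:
    handles = legend.legendHandles

# Example use of the handles: make the legend markers semi-transparent.
for handle in handles:
    handle.set_alpha(0.5)
```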

doctor-phil commented 6 months ago

@mmcky yeah that's why I made the environment so restrictive. Without pinning the specific versions, every update causes some of the lectures to break. In this specific case, the preview builds on PR #242 succeeded on the branch, but failed the preview build when merged to main because of a typo in the changes.

Then, once I got the preview build working on main, I drafted a publish/release, and now there are a bunch of errors in the lecture code that weren't caught by either of the previous preview builds.

mmcky commented 6 months ago

@doctor-phil one way this can happen is that the CI uses a build cache to improve performance, so only the lecture(s) that change get executed. When it runs on the main branch, a fresh run is done that rebuilds the cache, so if any software has changed, existing lectures that are incompatible with the new versions can fail. Have you updated any software versions?
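A minimal sketch of how a cache like this can decide what to re-execute, assuming (hypothetically) that it keys each lecture on a hash of its code cells only. The names `code_hash` and `lectures_to_execute` are illustrative and are not taken from this project's CI. The point it shows: installed package versions are not part of the key, so a pandas or matplotlib upgrade does not mark unchanged lectures as stale, and their breakage only surfaces when a run starts without the cache.

```python
import hashlib


def code_hash(code_cells):
    """Hash only the source of a lecture's code cells (hypothetical scheme)."""
    digest = hashlib.sha256()
    for cell in code_cells:
        digest.update(cell.encode("utf-8"))
        digest.update(b"\0")  # separator so cell boundaries affect the hash
    return digest.hexdigest()


def lectures_to_execute(lectures, cache):
    """Return the lectures whose code changed since the cached run.

    `lectures` maps lecture name -> list of code-cell sources.
    `cache` maps lecture name -> hash recorded on the last successful run.
    Note that nothing about the software environment enters the comparison.
    """
    stale = []
    for name, cells in sorted(lectures.items()):
        if cache.get(name) != code_hash(cells):
            stale.append(name)
    return stale


# Placeholder lectures, not the real ones.
lectures = {
    "pandas_intro": ["import pandas as pd", "df = pd.DataFrame({'a': [1, 2]})"],
    "plotting": ["import matplotlib.pyplot as plt", "leg = plt.legend()"],
}
cache = {name: code_hash(cells) for name, cells in lectures.items()}

# Edit one lecture: only that lecture is re-executed on a PR.
lectures["plotting"][1] = "leg = plt.legend(loc='upper left')"
print(lectures_to_execute(lectures, cache))  # ['plotting']

# An empty cache (a fresh run) re-executes everything.
print(lectures_to_execute(lectures, {}))  # ['pandas_intro', 'plotting']
```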

mmcky commented 6 months ago

Also, re: https://github.com/QuantEcon/lecture-datascience.myst/pull/242, it doesn't look like this was actually built on the PR, as it originated from a fork. Unfortunately, when a PR originates from a fork, GitHub requires you to authorise the run (and sometimes even that isn't available). Did you see a Netlify preview link?

doctor-phil commented 6 months ago

@mmcky I haven't updated any software versions (intentionally). I should have time the next couple days to check out the current build errors since that might give some more insights.

Also, I didn't realize that the preview build doesn't run for PRs from a fork. I just saw the green check and assumed it had.

Still, it seems odd that once it was merged to main I got the preview link successfully, but the release build failed.

doctor-phil commented 5 months ago

@mmcky For context, this most recent build failure was due to the deprecation of cm.get_cmap and legendHandles in matplotlib. I pinned matplotlib <= 3.8.4 as a potential fix.
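An alternative to the pin, sketched here as an assumption rather than what the lectures currently do, is to move off the removed call: the colormap registry `matplotlib.colormaps` (available since matplotlib 3.5) returns the same colormap objects `cm.get_cmap` used to, and works on both sides of the 3.8.4 pin. The colormap name and the number of samples are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as on a CI runner
import numpy as np

# Old, removed in matplotlib 3.9:  cmap = matplotlib.cm.get_cmap("viridis")
# Registry lookup that works on both older and newer versions:
cmap = matplotlib.colormaps["viridis"]

# Sample five evenly spaced colours as an (N, 4) array of RGBA values.
colors = cmap(np.linspace(0, 1, 5))
```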

Having such a strict environment is definitely not ideal, but without it there's a "house of cards" effect where a single deprecation in matplotlib causes 3 lecture builds to fail, etc. This happened before, especially with pandas, which is why I set up these pins in the first place.
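One way to make the pins fail loudly, instead of partway through a lecture build, is a small pre-build check. The sketch below assumes the pins are simple upper bounds such as `matplotlib <= 3.8.4`; `check_pins` and its helpers are hypothetical, and the parser only understands plain numeric releases (`packaging.version` is the more robust choice where it is installed).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release(v):
    """Leading numeric release segment: '3.8.4' -> (3, 8, 4), '3.9.0rc1' -> (3, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def exceeds(installed, upper):
    """True if `installed` is newer than `upper`, padding so '3.8' == '3.8.0'."""
    a, b = release(installed), release(upper)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


def check_pins(upper_bounds, get_version=version):
    """Return a list of problems for packages missing or above their upper bound."""
    problems = []
    for pkg, upper in upper_bounds.items():
        try:
            installed = get_version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed")
            continue
        if exceeds(installed, upper):
            problems.append(f"{pkg}: {installed} > {upper}")
    return problems
```

Calling `check_pins({"matplotlib": "3.8.4"})` at the start of each workflow (PR preview, main, and publish) would also show quickly whether all three builds really use the same environment. Comparing parsed tuples rather than strings matters here: as strings, "3.10.0" sorts before "3.8.4".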

Re: the cache, it does seem like only the altered lectures are built in the preview builds (at least as far as I can tell), and the errors from deprecations etc. don't show up in them. Oddly, though, the preview builds still take ~30 minutes to complete, which would be the runtime if all of the lectures were being built. (Most of the build time comes from working_with_text.md, which I think does web scraping, and classification.md, which trains a NN.)

Edit to ref: #650

doctor-phil commented 2 weeks ago

Closing this because I think the issue is fixed by the new environment updates.

QuantEcon / lecture-datascience.myst

Difference in build success across branches? #243