facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0
3.33k stars 194 forks source link

Use transitive set for HaskellLibraryInfo #617

Closed avdv closed 1 month ago

avdv commented 1 month ago

This PR changes the fields of HaskellLibraryProvider to contain a transitive set of HaskellLibraryInfo instead of using a list.

facebook-github-bot commented 1 month ago

@ndmitchell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ndmitchell commented 1 month ago

Thanks for the changes! This looks great. I'm heading off on holiday until Tuesday, so have sent it off for internal testing, but will probably land it once I'm back.

facebook-github-bot commented 1 month ago

This pull request has been reverted by 10096f7bbcb7694fa08d0881eae976c35824c984.

ndmitchell commented 1 month ago

@avdv afraid I had to revert this as it breaks in some circumstances. Just debugging which circumstances.

ndmitchell commented 1 month ago

The issue is when you have a Haskell library with deps = [":a", ":a"] - so the same dependency twice. That then generates an entry for the Haskell package database which is malformed and GHC complains. I've now added such an instance to our internal test suite (which we're working on open sourcing slowly...) so just trying to reapply.

ndmitchell commented 4 weeks ago

79d74b500f35be4819a636e4d7901c4c29eee9de was the magic to make this work, so relanded as 340e7df97c2027e6e2376365662de23a4c35c239

ndmitchell commented 4 weeks ago

Bad news - there was a significant performance regression from this code. It generates command lines that upset GHC. I don't yet know why, but investigating. Reverted once more.

ndmitchell commented 4 weeks ago

There are two places this code causes a significant performance regression. In both cases it comes from the fact that multiple packages might exist within the same package_db. Before, since we deduplicated everything by value, you might have 100 packages and 20 package_dbs. If you give GHC 100 package dbs (even if there are only 20 unique ones) the performance crashes dramatically. That's especially true if you have a package_db representing all of stackage, which both occurs a lot and is slow to load. I've spun up #624 as a version with your two patches and my deduplication code in. Aiming to land that next week, but being cautious given this has gone wrong twice now 😄

avdv commented 3 weeks ago

There are two places this code causes a significant performance regression. In both cases it comes from the fact that multiple packages might exist within the same package_db. Before, since we deduplicated everything by value, you might have 100 packages and 20 package_dbs. If you give GHC 100 package dbs (even if there are only 20 unique ones) the performance crashes dramatically.

Oh, that's interesting. Thank you for pushing this forward!

ndmitchell commented 1 week ago

To update, there ended up being a third place, which I fixed too. All sorted now and seems to have had no regressions for a while. Thanks for the contribution!