[Feature Request] Separate run-time and test-time dependencies

Shopify / packwerk

Good things come in small packages.

MIT License

1.65k stars 113 forks source link

[Feature Request] Separate run-time and test-time dependencies #354

Open percysnoodle opened 1 year ago

percysnoodle commented 1 year ago

We have a monolithic codebase which we're trying to modularise into engines, using Packwerk to define and enforce the dependencies between them.

Moving our app code into the engines is usually pretty straightforward, but the tests are a problem. In a lot of places, the model code works generically without reference to the types of input it gets, but the tests define the behaviour using concrete examples of inputs using classes from outside the engine. We work with non-technical domain experts to write these tests, and they work almost exclusively with concrete examples, so rewriting them with dummy classes isn't a long-term solution.

What I'd really like is to be able to define one set of dependencies for the app code in each engine, and have Packwerk enforce those for the code under engines/whatever/app; and another looser set of dependencies for the spec code, and have Packwerk enforce those for the code under engines/whatever/spec. That way, we could move our test files across unaltered. Ideally, I'd do this in a single package.yml file:

enforce-dependencies: true

dependencies:
  - app
    - engines/base
    - engines/engine1
  - spec
    - engines/base
    - engines/engine1
    - engines/engine2

This could then be simplified using YAML anchors etc...

Hope that makes sense! Apologies if this is already possible and I've missed how to do it.

alexevanczuk commented 1 year ago

Hey @percysnoodle!

Can you share a bit more about what you are expecting packwerk to do differently with this distinction? As it stands now, this seems like a client-specific categorization of dependencies, but I don't see how packwerk would do anything differently (i.e. it would still create/detect dependency violations for unspecified dependencies, it would still validate, etc.).

If you just want to be able to categorize them, you could always just drop a comment in:

dependencies:
  # app
  - engines/base
  - engines/engine1
  # spec
  - engines/base
  - engines/engine1
  - engines/engine2

If you want to treat test and prod differently in terms of dependencies (although I'm not sure this is always recommended, as it often simplifies things to think of prod and its tests as having the same set of dependencies so they can make the same assumptions), you could always put your tests into a separate package, e.g. engines/whatever_test.

Sorry if this doesn't answer your question – getting more of a sense of what you're trying to achieve here I hope will help me be more helpful!

percysnoodle commented 1 year ago

If you want to treat test and prod differently in terms of dependencies [...] you could always put your tests into a separate package

That is something we've considered, and it does give us pretty much the behaviour we want, so I think you've understood the situation exactly with this suggestion. I guess what I'm hoping for is a way to achieve that without having to put everything in pairs of testless and test-only engines.

although I'm not sure this is always recommended, as it often simplifies things to think of prod and its tests as having the same set of dependencies so they can make the same assumptions

Yeah, I can see that this use-case wouldn't be recommended for people who are trying to build reusable modules that stand alone to be include in multiple projects. In prod, all our code will be run with all of the engines present; we're breaking it up into modules not so we can reuse parts but so that we can make it more explicit what is coupled to what, so engineers don't need to know the whole codebase before they're confident making a change, and so we can stop wasting time running unnecessary tests.

alexevanczuk commented 1 year ago

That makes sense @percysnoodle , we do the same with most of our gems/engines/packs (i.e. we don't distribute them).

I think I'm still trying to understand what behavior you want to change. Can you state how a specific command in packwerk would work differently than it does today if spec and prod dependencies were separated? To put it another way – what do you get from this that you don't get from just leaving comments indicating which dependencies are spec only?

percysnoodle commented 1 year ago

Suppose we have...

# engines/common/app/models/base_model.rb
class BaseModel
end

# engines/things/app/models/thing.rb
class Thing < BaseModel
  def description
    "a thing"
  end
end

# engines/describers/app/models/describer.rb
class Describer < BaseModel
  def describe(describable)
    describable.description
  end
end

# engines/describers/spec/models/describer_spec.rb
describe Describer do
  let(:thing) { Thing.new }
  subject { described_class.new.describe(thing) }
  it { is_expected.to eq("a thing") }
end

At present, if I want to use packwerk check, the describers engine has to include engines/things in its package.yml:

# engines/describers/package.yml
dependencies:
  - engines/common
  - engines/things

Thereby indicating that describers has a dependency on things. But describers has no run-time dependency on things, and adding this dependency would stop me from writing new code in things that depended on describers.

If instead I could write (for example):

# engines/describers/package.yml
dependencies:
  - app
    - engines/common
  - test 
    - engines/common
    - engines/things

then I could use packwerk check and have it pass, because the application code only uses the common engine at runtime but it uses the things engine at test-time. If I removed the engines/common line it would fail because BaseModel wasn't part of its run-time dependencies, and if I removed the engines/things line it would fail because things wasn't part of its test-time dependencies. If I added a reference to Thing in Describer, it would fail because things wasn't part if its test-time dependencies.

Apologies for the noddy example. By the way, I'd be just as happy with a different syntax; perhaps

# engines/describers/package.yml
dependencies:
  - engines/common

test_dependencies:
  - engines/things

might make more sense.

Alternatively, perhaps it would be possible to run packwerk twice with different configurations to achieve this outcome?

alexevanczuk commented 1 year ago

Apologies if I'm being dense, but I'm still not seeing what behavior is specifically different here. Are you mostly just asking for certain sets of dependencies to be permitted to have cycles? I'll try to get on the same page by responding to each point:

Thereby indicating that describers has a dependency on things. But describers has no run-time dependency on things, and adding this dependency would stop me from writing new code in things that depended on describers.

Is the main point here that you want test dependencies to permit cyclic dependencies, but you don't want the app? Is that what you mean by "would stop me from writing new code in things that depended on describers."

Besides that, I don't see why you couldn't just put engines/things in the main list of dependencies.

then I could use packwerk check and have it pass, because the application code only uses the common engine at runtime but it uses the things engine at test-time

It would also pass if you just put them all under dependencies -- are you saying the thing that wouldn't pass is bin/packwerk validate due to cyclic dependencies?

percysnoodle commented 1 year ago

Sorry, I shouldn't have mentioned the future code, that's confused things. I don't want to allow cyclic dependences - if you think about the case where we put the tests in test-only engines, nothing would depend on those engines, so their extra dependencies couldn't cause cycles.

I want to be able to specify the run-time dependencies of an engine, and the test-time dependencies of an engine, separately; and have them checked separately; so that I can use the tests written by our domain experts without having to add the models they use to the run-time dependencies.

Besides that, I don't see why you couldn't just put engines/things in the main list of dependencies.

Yes, in the noddy example, I could just do that. In our real app, that wouldn't work out so well. Apologies.

It would also pass if you just put them all under dependencies -- are you saying the thing that wouldn't pass is bin/packwerk validate due to cyclic dependencies?

Not necessarily cyclic; I want the checks on our app code to enforce the genuine dependencies of just that code, and I want to be able to write tests with looser (but still enforced) dependencies. I want to be able to use extra models in the tests without necessarily allowing those models to be used in the app code.

alexevanczuk commented 1 year ago

@percysnoodle Thanks for clarifying you're not trying to avoid cyclic dependencies. In that case, I'm still not sure what you want to accomplish that you can't accomplish by just adding comments in the package.yml that indicate which dependencies are for tests and which are for production.

It sounds like perhaps you want to be able to be able to do two separate things:

Have packwerk only scan package files and ensure that there are no dependencies in app code, and indicate new dependency violations with app code.
Do the same for above, but only for test code.

If this is the case, I'm not sure listing prod and test dependencies separately is needed here – you can simply run bin/packwerk check on the folders of code (e.g. app code) that you want to checked. Does listing them separate achieve something else? What would really help is if you can provide an example like the one above (perhaps as a separate repo or gist) with what your expected behavior is compared to the actual behavior (i.e. concretely what you expect the output to be, rather than a description of behavior).

Let me know if I'm still misunderstanding.

percysnoodle commented 1 year ago

I'm still not sure what you want to accomplish that you can't accomplish by just adding comments in the package.yml that indicate which dependencies are for tests and which are for production.

Packwerk won't do anything with those comments, will it? If I add a dependency under dependencies and comment it as for tests, but then I use a class from that dependency in myengine/app/models, packwerk won't flag that up.

It sounds like perhaps you want to be able to be able to do two separate things:

Have packwerk only scan package files and ensure that there are no dependencies in app code, and indicate new dependency violations with app code. Do the same for above, but only for test code.

If this is the case, I'm not sure listing prod and test dependencies separately is needed here – you can simply run bin/packwerk check on the folders of code (e.g. app code) that you want to checked.

We're currently invoking packwerk once in our CLI pipeline, as bin/packwerk check engines, and we have a package.yml file in each of the engine folders, which lists the engine folders. For what you're suggesting, I think we'd need to do something like bin/packwerk check engines/*/{app,config,lib}; bin/packwerk check engines/*/spec and then have individual package.yml files in each of app, config, lib and spec (plus any we add later). Plus I think we'd need to change those files to list the app and lib engine subfolders as separate dependencies. Is that correct?

Does listing them separate achieve something else?

If the above is correct, then at the very least it would save us a lot of configuration.

I'm again led to wonder whether the solution here is to support multiple configurations, so we could do something like bin/packwerk check engines --config-file=packwerk.yml; bin/packwerk check engines --config-file=packwerk-spec.yml and add something to packwerk-spec.yml so that we could use a different key than dependencies for the spec dependencies. Does that make sense?

What would really help is if you can provide an example like the one above (perhaps as a separate repo or gist) with what your expected behavior is compared to the actual behavior (i.e. concretely what you expect the output to be, rather than a description of behavior).

I'll see what I can come up with. It will still be somewhat descriptive, because I'd need to show different behaviour under different changes.

alexevanczuk commented 1 year ago

then have individual package.yml files in each of app, config, lib and spec (plus any we add later). Plus I think we'd need to change those files to list the app and lib engine subfolders as separate dependencies. Is that correct?

It depends. I think if you just check your prod code and spec code separately, you'll get separate output about violations in each location. You shouldn't need separate packages unless you want packwerk to consider test and prod dependencies as separate dependencies (i.e. listed separately in the dependencies list). In that case, it seems like what you really want is spec and prod code to be separate packages, which could be doable with separate top-level folders, or dropping package.yml in the separate folders as you suggested.

Hearing all this, it sounds like you basically want packwerk to have syntactic sugar for splitting up a package into two packages – one for prod code and one for spec code. I don't think allowing a separate input config would do this for us, because we really want packwerk to "automatically" split up packages into prod and test portions. As a stop gap, having separate packages (either with multiple package.yml files within a folder, or multiple top-level package.yml files) should do this for you without requiring changes to packwerk.

I'm not sure if there's a way to implement this sustainably without a pretty large refactor of the way packwerk parses packages, since today just looking for a package.yml file makes this simple. I'm not sure we'll be able to implement this unless we discover more uses cases for it and/or if we can think of a simple, low-hanging way to implement this.

percysnoodle commented 1 year ago

Hearing all this, it sounds like you basically want packwerk to have syntactic sugar for splitting up a package into two packages – one for prod code and one for spec code.

Yes, that's a really good way to put it.

As a stop gap, having separate packages (either with multiple package.yml files within a folder, or multiple top-level package.yml files) should do this for you without requiring changes to packwerk.

Great! How would I go about putting multiple package.yml files within a folder? Is there a way to configure packwerk to look for a different filename?

I'm not sure if there's a way to implement this sustainably without a pretty large refactor of the way packwerk parses packages, since today just looking for a package.yml file makes this simple. I'm not sure we'll be able to implement this unless we discover more uses cases for it and/or if we can think of a simple, low-hanging way to implement this.

Understood. Thanks for giving it consideration!

alexevanczuk commented 1 year ago

Great! How would I go about putting multiple package.yml files within a folder? Is there a way to configure packwerk to look for a different filename?

Packwerk will only look for a package.yml. You can put them anywhere though, simply with touch path/to/folder/package.yml, which should be enough for packwerk to consider something a package.

Understood. Thanks for giving it consideration!

For sure. I do like the idea of being able to think about test and prod dependencies differently, since it presents a lot of interesting opportunities to think about ensuring prod code never depends on test code, along with some other interesting possibilities. It's just a matter of making sure that we can support additional complexity and that the capabilities communicate clear opinions about how we expect folks to use the tools.

For what it's worth – at Gusto, even though we're also not reusing our packs/gems (i.e. they are unbuilt and only used within one application), we tend to believe that a pack of prod code and its tests should be the same set of dependencies (since we know tests know everything about the prod code, so have strictly more dependencies, and it's strange to think that the tests know about domains that the prod code does not). If you ever want to chat over slack (in the Ruby/Rails modularity slack server, www.tinyurl.com/rubyslack) or zoom to share approaches and strategies, I'd be happy to chat more.