MilesMcBain / milesmcbain.com

Seeing if I can make my website look any good with distill...

Before I Sleep: How to be assertive about not testing your data science pipeline #21

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Before I Sleep: How to be assertive about not testing your data science pipeline

https://milesmcbain.com/posts/assertive-programming-for-pipelines/

anthonynorth commented 1 year ago

Great post!

I've never considered using {testthat} for assertive programming. What I like most about this idea is that you're not using a new tool or package to write assertions. No new API to learn. Bonus: {testthat} is very well documented.

I'm a big fan of assertive programming, particularly in {targets} pipelines. I've found this to be a timesaver, not only in avoiding wasted compute, but also (and more importantly) in debugging.

I find it useful to validate inputs and sometimes outputs of targets. With input validation/assertions, you're making explicit what you assume about your inputs. Without them, re-running a pipeline with new (external) data can produce explicit errors or, worse, silently incorrect results or a failure further down the pipeline that is difficult to diagnose.
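As a rough illustration of input assertions, here is a minimal sketch using {testthat} expectations at the top of a function that a target might call. The function `summarise_trips()`, the columns `station_id` and `duration_min`, and the data are all hypothetical, made up for this example. It relies on a failed expectation raising an error when called outside `test_that()`, which is my understanding of how {testthat} behaves.

```r
library(testthat)

# Hypothetical target function: mean trip duration per station.
summarise_trips <- function(trips) {
  # Input assertions: state the assumptions about `trips` up front,
  # so bad data fails here rather than somewhere downstream.
  expect_s3_class(trips, "data.frame")
  expect_true(all(c("station_id", "duration_min") %in% names(trips)))
  expect_false(anyNA(trips$duration_min))
  expect_true(all(trips$duration_min >= 0))

  aggregate(duration_min ~ station_id, data = trips, FUN = mean)
}

# Well-formed (made-up) input passes the assertions.
good <- data.frame(
  station_id = c("a", "a", "b"),
  duration_min = c(10, 20, 5)
)
result <- summarise_trips(good)

# Input with a missing duration trips the assertion before any work is done.
bad <- transform(good, duration_min = c(10, NA, 5))
bad_outcome <- tryCatch(
  summarise_trips(bad),
  error = function(e) "assertion failed"
)
```

Without the `anyNA()` check, `aggregate()` would drop the incomplete row by default and return a plausible-looking answer, which is the silently incorrect case described above.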

Output assertions, at least how I've used them, are much more like a unit test: you're explicitly checking that your code does what you think it does, and you can attach specific failure messages.
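An output assertion might look something like the sketch below. Again, `dedupe_customers()`, the `customer_id` column, and the data are hypothetical, invented for this example; the `info` argument is used to attach a specific failure message.

```r
library(testthat)

# Hypothetical target function: keep the first record per customer_id.
dedupe_customers <- function(customers) {
  out <- customers[!duplicated(customers$customer_id), , drop = FALSE]

  # Output assertions: check the result has the properties we intend,
  # much like a unit test that runs on the real data.
  expect_false(
    anyDuplicated(out$customer_id) > 0,
    info = "customer_id still has duplicates after dedupe"
  )
  # No customer should be lost, and the columns should be unchanged.
  expect_setequal(out$customer_id, customers$customer_id)
  expect_named(out, names(customers))

  out
}

# Made-up data with one duplicated id.
customers <- data.frame(
  customer_id = c(1, 1, 2),
  name = c("Ann", "Ann", "Bo")
)
deduped <- dedupe_customers(customers)
```

Because the checks run every time the target is built, they exercise the real data rather than a fixture, at the cost of a little extra compute per run.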