ShopRunner / baleen

Kotlin DSL for validating data (JSON, XML, CSV, Avro)
BSD 3-Clause "New" or "Revised" License

Baleen

Baleen is a fluent Kotlin DSL for validating data (JSON, XML, CSV, Avro).

Features

Example Baleen Data Description

import com.shoprunner.baleen.Baleen.describeAs
import com.shoprunner.baleen.ValidationError
import com.shoprunner.baleen.dataTrace
import com.shoprunner.baleen.types.StringType

val departments = listOf("Mens", "Womens", "Boys", "Girls", "Kids", "Baby & Toddler")

val productDescription = "Product".describeAs {

    "sku".type(StringType(min = 1, max = 500),
          required = true)

    "brand_manufacturer".type(StringType(min = 1, max = 500),
          required = true)

    "department".type(StringType(min = 0, max = 100))
         .describeAs {
             test("department is correct value") { data ->
                 assertThat(data).hasAttribute("department") {
                     it.isOneOf(departments)
                 }
             }
         }
}

// Get your data
val data: Data = TODO("Load from a file, database, or another source")

// Get Validation Results
val validation: Validation = productDescription.validate(data)

// Without caching, each call to `isValid` and `results` iterates over the dataset again.
// Warning: caching a large dataset holds all of its results in memory.
val cachedValidation: CachedValidation = validation.cache()

// Check if any errors. True if no errors, false otherwise. 
// val isValid: Boolean = validation.isValid()
val isValid: Boolean = cachedValidation.isValid() 

// Iterate over results. Each iteration over results will execute entire flow again unless cached.
// validation.results.forEach { }
cachedValidation.results.forEach { }

// Use `watch` to print a summary to the console every 1,000 results
cachedValidation.results.watch().forEach { }

// Summarize into a Validation object holding a list of ValidationSummary entries, with example errors included
// val validationSummary: Validation = validation.createSummary()
val validationSummary: CachedValidation = cachedValidation.createSummary()
validationSummary.results.forEach { }

// Print the results to various formats including Console, Logger, CSV, HTML, or Text
// Look at baleen-printer-* sub-modules
File("validation.html").writer().use {
    HtmlPrinter(it).print(validationSummary.results)
}
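
The comments above note that results are re-evaluated on every iteration unless cached. As a rough analogy only, the plain-Kotlin sketch below shows the same trade-off with a lazy `Sequence`. It does not use Baleen, and `validateRecord` and `validatorRuns` are hypothetical names invented for the illustration.

```kotlin
// Plain-Kotlin analogy; none of these names come from Baleen.
var validatorRuns = 0

// Hypothetical validator that counts how many times it is invoked.
fun validateRecord(record: String): String {
    validatorRuns++
    return if (record.isBlank()) "error: blank record" else "ok"
}

fun main() {
    val records = listOf("sku-1", "", "sku-3")

    // A Sequence is lazy: the map step runs again on every traversal.
    val lazyResults = records.asSequence().map { validateRecord(it) }
    lazyResults.forEach { }
    lazyResults.forEach { }
    println(validatorRuns) // 6: three records, validated twice

    // Materializing once keeps every result in memory but avoids re-running.
    validatorRuns = 0
    val cachedResults = lazyResults.toList()
    cachedResults.forEach { }
    cachedResults.forEach { }
    println(validatorRuns) // 3: validated once, then replayed from the list
}
```

The cached list trades memory for speed, which is why caching a very large dataset deserves some care.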

Getting Help

Join the slack channel

Core Concepts

Warnings

Sometimes you will want an attribute or type to produce a warning instead of an error. The asWarnings() method transforms the output from ValidationError to ValidationWarning for all nested tests run underneath that attribute/type.

import com.shoprunner.baleen.Baleen.describeAs
import com.shoprunner.baleen.ValidationError
import com.shoprunner.baleen.dataTrace
import com.shoprunner.baleen.types.StringType
import com.shoprunner.baleen.types.asWarnings

val productDescription = "Product".describeAs {

    // The asWarnings() method is on StringType. Min/max are warnings, but required is still an error.
    "sku".type(StringType(min = 1, max = 500).asWarnings(), required = true) 

    // The asWarnings() method is on the attribute. Min/max and required are all warnings.
    "brand_manufacturer".type(StringType(min = 1, max = 500), required = true).asWarnings()

    // The asWarnings() method is on the attribute. The attribute's custom test will also be turned into a warning.
    "department".type(StringType(min = 0, max = 100)).describeAs {
        test("department is correct value") { data ->
            assertThat(data).hasAttribute("department") {
                it.isOneOf(departments)
            }
        }
    }.asWarnings()
}
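
To make the difference between the two placements concrete, here is a minimal plain-Kotlin sketch of the wrapping idea. It is not Baleen's implementation: `Outcome`, `ErrorOutcome`, `WarningOutcome`, `Check`, `asWarningsSketch`, and `requiredCheck` are all hypothetical names chosen for this illustration.

```kotlin
// Hypothetical stand-ins; Baleen's real result types are different classes.
sealed class Outcome
data class ErrorOutcome(val message: String) : Outcome()
data class WarningOutcome(val message: String) : Outcome()

typealias Check = (String?) -> List<Outcome>

// Wraps a check so that every error it reports is downgraded to a warning.
fun asWarningsSketch(check: Check): Check = { value ->
    check(value).map { outcome ->
        when (outcome) {
            is ErrorOutcome -> WarningOutcome(outcome.message)
            is WarningOutcome -> outcome
        }
    }
}

// Type-level check: the length must be between 1 and 500.
val lengthCheck: Check = { value ->
    if (value != null && value.length !in 1..500) {
        listOf(ErrorOutcome("length out of range"))
    } else {
        emptyList()
    }
}

// Attribute-level check: the value must be present, then the type check runs.
fun requiredCheck(inner: Check): Check = { value ->
    if (value == null) listOf(ErrorOutcome("missing required attribute")) else inner(value)
}

fun main() {
    // Wrapping the type only: length problems warn, a missing value still errors.
    val typeLevel = requiredCheck(asWarningsSketch(lengthCheck))
    // Wrapping the whole attribute: both problems warn.
    val attributeLevel = asWarningsSketch(requiredCheck(lengthCheck))

    println(typeLevel(null))      // [ErrorOutcome(message=missing required attribute)]
    println(typeLevel(""))        // [WarningOutcome(message=length out of range)]
    println(attributeLevel(null)) // [WarningOutcome(message=missing required attribute)]
}
```

In this sketch, where the wrapper sits decides which checks it covers, which mirrors the behavior described in the comments of the example above.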

Tagging

Baleen lets you add tags to tests so that you can more easily identify, annotate, and filter your results. Tagging is useful in a couple of cases. For example, you may have an identifier, like a sku, that you want attached to each test so that you can group failed tests by that identifier. Another use case is assigning priority levels to your tests so that you can highlight the most important errors.

val productDescription = "Product".describeAs {

    // The tag() method is on StringType and dynamic tag pulls the value.
    "sku".type(StringType().tag("priority", "critical").tag("sku", withValue()))

    // The tag() method is on the attribute and the dynamic tag pulls an attribute value from sku.
    "brand_manufacturer".type(StringType(), required = true)
        .tag("priority", "low")
        .tag("sku", withAttributeValue("sku"))

    // The tag() method is on the attribute, and a custom tag function is used that returns a String
    "department".type(StringType(min = 0, max = 100))
        .tag("priority", "high")
        .tag("sku", withAttributeValue("sku"))
        .tag("gender") { d ->
            when {
                d is Data && d.containsKey("gender") -> 
                    when(d["gender"]) {
                        "male" -> "male"
                        "mens" -> "male"
                        "female" -> "female"
                        "womens" -> "female"
                        else -> "other"
                    }
                else -> "none"
            }
        }
}
// The tag is on the data description, and the dynamic tag pulls the attribute value from the sku field of the data
.tag("sku", withAttributeValue("sku"))
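
Once results carry tags such as "sku" and "priority", grouping and filtering them is ordinary collection code. The sketch below assumes a hypothetical `TaggedResult` class with a plain `tags` map. It stands in for whatever result type you actually consume and is not Baleen's own class.

```kotlin
// Hypothetical result shape for illustration; not Baleen's result type.
data class TaggedResult(val message: String, val tags: Map<String, String>)

fun main() {
    val results = listOf(
        TaggedResult("sku is too long", mapOf("priority" to "critical", "sku" to "A-1")),
        TaggedResult("brand_manufacturer is missing", mapOf("priority" to "low", "sku" to "A-1")),
        TaggedResult("department is not a known value", mapOf("priority" to "high", "sku" to "B-2"))
    )

    // Group failures by the identifier tag.
    val bySku = results.groupBy { it.tags["sku"] ?: "untagged" }
    println(bySku.mapValues { (_, failures) -> failures.size }) // {A-1=2, B-2=1}

    // Keep only the most important failures.
    val urgent = results.filter {
        it.tags["priority"] == "critical" || it.tags["priority"] == "high"
    }
    println(urgent.map { it.message }) // [sku is too long, department is not a known value]
}
```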

Tagging is also done at the data evaluation level. When writing tests, additional tags can be passed in using the Tagger function.

    "department".type(StringType(min = 0, max = 100)).describeAs {
        test("department is correct value", "sku" to withAttributeValue("sku")) { data ->
            assertThat(data).hasAttribute("department") {
                it.isOneOf(departments)
            }
        }
    }
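
One way to think about a dynamic tag is as a function from the data under test to a string, resolved when the test runs. The following plain-Kotlin sketch illustrates that idea under stated assumptions: `Record`, `TaggerSketch`, `constantTag`, `attributeTag`, and `resolveTags` are hypothetical names, and the "missing" fallback is invented here. None of it is Baleen's actual definition.

```kotlin
// Hypothetical sketch; re-implemented for illustration, not Baleen's API.
typealias Record = Map<String, Any?>
typealias TaggerSketch = (Record) -> String

// Static tag: ignores the record and always returns the same value.
fun constantTag(value: String): TaggerSketch = { _ -> value }

// Dynamic tag: reads an attribute from the record being tested.
fun attributeTag(name: String): TaggerSketch = { record ->
    record[name]?.toString() ?: "missing"
}

// Resolves every tagger against the record that was being tested.
fun resolveTags(record: Record, taggers: Map<String, TaggerSketch>): Map<String, String> =
    taggers.mapValues { (_, tagger) -> tagger(record) }

fun main() {
    val record: Record = mapOf("sku" to "A-1", "department" to "Garden")
    val taggers = mapOf(
        "priority" to constantTag("high"),
        "sku" to attributeTag("sku")
    )
    println(resolveTags(record, taggers)) // {priority=high, sku=A-1}
}
```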

Some Baleen validation libraries, such as the XML or JSON validators, use tags to add line and column numbers as they parse the original raw data. This helps locate errors in the raw data much more quickly.

Gotchas

Similar Projects