The Platform SDK Starter is a way to configure your Flatfile Workbook in code.
Visit the Guides to learn more about using Flatfile and the Platform SDK Starter.
Visit the CLI Quickstart and follow the instructions to sign up for a Flatfile account and run this code locally.
Here is a fully functional Flatfile Workbook that we'll use to demonstrate importing employee data.
const Employees = new Sheet(
'Employees',
{
firstName: TextField({
label: 'First Name',
required: true,
description: 'Given Name'
}),
lastName: TextField(),
fullName: TextField(),
stillEmployed: BooleanField(),
department: OptionField({
label: 'Department',
options: {
engineering: { label: 'Engineering' },
hr: 'People Ops',
sales: 'Revenue',
},
matchStrategy: 'exact',
}),
fromHttp: TextField({ label: 'Set by batchRecordCompute' }),
salary: NumberField({
label: 'Salary',
description: 'Annual Salary in USD',
required: true,
validate: (salary: number) => {
const minSalary = 30_000
if (salary < minSalary) {
return [
new Message(
`${salary} is less than minimum wage ${minSalary}`,
'warn',
'validate'
),
]
}
},
}),
},
{
allowCustomFields: true,
recordCompute: (record) => {
const fullName = `${record.get('firstName')} ${record.get('lastName')}`
record.set('fullName', fullName)
return record
},
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch('https://api.us.flatfile.io/health', {
method: 'GET',
headers: {
Accept: 'application/json',
},
})
const result = await response.json()
payload.records.forEach((record: FlatfileRecord) => {
record.set('fromHttp', result.info.postgres.status)
})
},
}
)
const BaseSheet = new Sheet(
'BaseSheet',
{
firstName: TextField(),
middleName: TextField('Middle'),
lastName: TextField(),
email: TextField({
unique: true,
primary: true
})
},
{
previewFieldKey: 'email',
}
)
const LinkedSheet = new Sheet(
'LinkedSheet',
{
email: LinkedField({
unique: true,
label: 'Email',
primary: true,
sheet: BaseSheet,
upsert: false,
}),
firstName: TextField(),
middleName: TextField('Middle'),
lastName: TextField(),
},
)
...
The above code will generate a Workbook that looks like this:
Follow these steps to deploy a Workbook and view it in your Dashboard. Because your secret keys are being used, Flatfile will know to create your first Workspace in the correct place.
1. Run `npm install` to install all necessary packages.
2. Run `npm run deploy` to deploy your Workbook.
Now, let's take a closer look at the example Workbook we just deployed, starting with the fields. Flatfile ships six built-in field types, and this Workbook, built to receive employee data, uses nearly all of them:

- `TextField`: a string of text
- `NumberField`: a numerical value
- `DateField`: a date
- `OptionField`: a field with a set of pre-defined values
- `BooleanField`: a true / false field
- `LinkedField`: a field that links two sheets together

Let's take a closer look at some of the options we've set on these fields:
firstName: TextField({
required: true,
description: 'Given name',
})
Here we've indicated that the `firstName` field is required, and we've given it a human-readable description.
salary: NumberField({
label: 'Salary',
description: 'Annual Salary in USD',
required: true,
validate: (salary: number) => {
const minSalary = 30_000
if (salary < minSalary) {
return [
new Message(
`${salary} is less than minimum wage ${minSalary}`,
'warn',
'validate'
),
]
}
},
}),
Here we've indicated that the `salary` field is required, and we've given it a human-readable description. We also provide a `validate` function that defines what we consider to be a valid value for this field. In this case, we've decided that `salary` must be greater than or equal to $30,000, and we provide a human-readable message to be displayed when the validation criterion is not met.
department: OptionField({
label: 'Department',
options: {
engineering: { label: 'Engineering' },
hr: 'People Ops',
sales: 'Revenue',
},
matchStrategy: 'exact',
}),
Here we provide a pre-defined list of values that this field can have. We have also included the `matchStrategy` flag, which determines whether Flatfile should accept only exact matches when automatically matching your `OptionField` options, or whether it should also use historical and fuzzy matches. If `matchStrategy` is not set, the field will use historical and fuzzy matches to automatically match options.
If a `default` value is provided for an `OptionField`, we check that it is one of the values provided in `options`. If you want an option value of the empty string `''`, you must set it as the default like this:
emptyExample: OptionField({
options: { eng: 'Eng_Display_Label4', hr: 'HR', '': 'empty' },
default: '',
})
email: LinkedField({
label: 'Email',
sheet: BaseSheet,
upsert: false
}),
Here we define which field links to another template, along with the sheet this field should be linked to. We have also set `upsert: false`, which disables the default upsert behavior: an error will be displayed on this sheet if a value imported in the `LinkedField` does not exist on the parent sheet. For more information about our Relational Data feature, visit our Relational Data Guide.
const BaseSheet = new Sheet(
'BaseSheet',
{
firstName: TextField(),
middleName: TextField('Middle'),
lastName: TextField(),
email: TextField({
unique: true,
primary: true,
}),
},
{
previewFieldKey: 'email',
}
)
Here we define the sheet we are linking to, and on that sheet we set the `previewFieldKey` option, which controls the field displayed on the original template. Note: `LinkedField` can currently only be used in Workspaces, which is why this example doesn't have a Portal deployed for the `LinkedSheet` and `BaseSheet` sheets.
Next, let's look at the options that we've set on the sheet itself:
const Employees = new Sheet(
'Employees',
...
{
allowCustomFields: true,
recordCompute: (record) => {
const fullName = `${record.get('firstName')} ${record.get('lastName')}`
record.set('fullName', fullName)
return record
},
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch('https://api.us.flatfile.io/health', {
method: 'GET',
headers: {
Accept: 'application/json',
},
})
const result = await response.json()
payload.records.forEach((record: FlatfileRecord) => {
record.set('fromHttp', result.info.postgres.status)
})
},
}
)
...
Here we define our Data Hooks®: functions that operate on user-inputted data and transform it to meet your system's needs. There are two types of Data Hooks® in this example: a `recordCompute` hook and a `batchRecordsCompute` hook.
recordCompute: (record) => {
const fullName = `${record.get('firstName')} ${record.get('lastName')}`
record.set('fullName', fullName)
return record
}
`recordCompute` is a Data Hook which runs synchronously for each record. In this example, we use the values of the `firstName` and `lastName` fields to generate and set the derived `fullName` field.
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch('https://api.us.flatfile.io/health', {
method: 'GET',
headers: {
Accept: 'application/json',
},
})
const result = await response.json()
payload.records.forEach((record: FlatfileRecord) => {
record.set('fromHttp', result.info.postgres.status)
})
}
`batchRecordsCompute` is a Data Hook which runs after all `recordCompute` calls have finished and receives the full list of processed rows, at which point you can perform bulk operations on them. This is a good place to put slower operations that benefit from batching, such as external HTTP requests. In this example, we fetch a single value from an API and write it to each record.
`validate` should be used in most cases where you want to confirm that user-inputted data matches your specifications; for most fields, it is the hook you want. It receives the proper type for each field and lets you add messages to the cell, including errors, warnings, and rejections, so users can correct errors themselves. `validate` can't change data; it can only send messages. A compute hook must be used if you want to perform any type of transformation.

`recordCompute` and `batchRecordsCompute` should only be used where you must modify user-inputted data or generate new values that the user doesn't provide but your systems need. For simple row work that doesn't make HTTP calls, use `recordCompute`. If you need to call an external API, reach for `batchRecordsCompute` on the sheet, as this allows you to request info about multiple values at once for increased performance.
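As a quick illustration of that division of labor, here is a minimal sketch on a hypothetical `phone` field (not part of the Workbook above), using the same `Message` API as the `salary` example:

```ts
phone: TextField({
  // compute transforms data: strip everything but digits
  compute: (v: string) => v.replace(/\D/g, ''),
  // validate only annotates: it cannot change the value
  validate: (v: string) => {
    if (v.length < 10) {
      return [new Message('Phone number looks too short', 'warn', 'validate')]
    }
  },
}),
```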
We have written sensible default implementations of `cast` functions for `TextField`, `NumberField`, and `DateField`, along with extensive tests that document and verify their behavior. Refer to the CastFunction tests to see more.

When our default `cast` function can't parse an incoming value in a reliable way, it throws an error. The error message shows up in the UI, and the original value is stored in the table so users can edit it into a proper type.
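If you do supply a custom `cast`, it follows the same contract (string or undefined in; the field's type, null, or a thrown error out). A minimal sketch on a hypothetical `headcount` field that accepts "1,234"-style input:

```ts
headcount: NumberField({
  cast: (value) => {
    // Nothing to parse: return null so the default stage can fill it in
    if (value === null || value === undefined || value === '') return null
    // Accept '1,234'-style strings by stripping separators
    const parsed = Number(String(value).replace(/,/g, ''))
    // Unparseable input: throw, so the error surfaces in the UI and the
    // original value is preserved in the table for the user to fix
    if (Number.isNaN(parsed)) {
      throw new Error(`'${value}' could not be parsed as a number`)
    }
    return parsed
  },
}),
```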
`field.compute` should be idempotent: calling the same compute function on its own output should return that output unchanged. `compute: (v: string) => v.toLocaleLowerCase()` is a good function, since `compute('ASDF') === compute('asdf') === 'asdf'`.
We are big believers in Test Driven Development at Flatfile. Well-written tests help you reason about the behavior of complex systems, and we used tests extensively while developing this SDK. We encourage you to use our testing system to speed up your development.

Running tests on a Sheet or Workbook is much faster than deploying to Flatfile and manually uploading data to verify behavior. An added benefit is that your tests stay in this repo and help you make better decisions as you update your Sheets and Workbooks to reflect changing business requirements.
Stay tuned for future releases where we will add even more capabilities to our testing system.
The Flatfile Data Hook system has been designed to enable fine-grained functions to be used in combination to perform regular data validation and normalization tasks. Flatfile is building out a comprehensive standard library so that developers can plug in the proper functions without having to write them from scratch. This standard library is leveraged by HDDL to describe implementations concisely.
The Flatfile data pipeline orders data transformations so that functions at each point can be strictly typed with the most strictly prescribed functionality. This strict typing leads to more reliable functions that don't encounter surprise missing, undefined, or otherwise malformed values.
- `field:cast`: functions here take a string or undefined and return either the primitive type specified by the field, or null, or throw an error.
- `field:default`: if `cast` returned null (but didn't throw an error), a default value is filled in for the field.
- `field:compute`: functions receive a single fully present value and return a value of the same type.
- `recordCompute`: functions receive a row with all required fields fully present and optional fields typed `optional?: string`. Best used to compute derived values, but can also be used to update existing fields.
- `batchRecordsCompute`: asynchronous (HTTP/API) calls are made to fill in values from external services. This takes `records` so it is easier to make bulk calls.
- `field:validate`: functions receive a fully present value and return annotations for the corresponding cell.

If any of the above functions for a field throws an error, further processing is stopped for that field. The pipeline is sketched on a single field below.
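Here is that ordering sketched on a single hypothetical `nickname` field (the field and its rules are illustrative; the comment numbering mirrors the pipeline order above):

```ts
nickname: TextField({
  // 1. cast: the default StringCast runs first, coercing raw input
  //    to a string or null (no custom cast needed here)
  // 2. default: fills in when cast returned null
  default: 'unknown',
  // 3. compute: receives a fully present string, returns a string
  compute: (v: string) => v.trim(),
  // 4. validate: receives the computed value; returns messages, never mutates
  validate: (v: string) => {
    if (v.length > 20) {
      return [new Message('Nickname is too long', 'warn', 'validate')]
    }
  },
}),
```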
The most common custom-written hooks we expect to see are row `compute` and field `validate`. We expect users to write their own `cast` functions very rarely; these are some of the easiest and most important functions to add to FFL.

`recordCompute` is synchronous and only operates on one row at a time. In practice this isn't a big limitation, because synchronous functions generally run extremely quickly in the Node runtime.

`batchRecordsCompute` runs after all `recordCompute` calls have finished, and only operates on batches of records. We made this engineering decision to encourage bulk operations when making external HTTP calls, which tend to be slow.
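To make the bulk pattern concrete, here is a sketch of a `batchRecordsCompute` that issues a single request covering every record in the batch (the `https://example.com/validateEmails` endpoint and the `emailValid` field are hypothetical):

```ts
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
  // One request for the whole batch instead of one per row
  const emails = payload.records.map((record) => record.get('email'))
  const response = await fetch('https://example.com/validateEmails', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ emails }),
  })
  const { results } = await response.json()
  // Write the per-record results back in order
  payload.records.forEach((record, i) => {
    record.set('emailValid', results[i])
  })
},
```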
We are writing this SDK to enable skilled practitioners to quickly implement powerful transformations that take unstructured data from untrusted sources and shape that data into a clean, normalized format for input into many systems. This is the core of what Flatfile builds and we take it seriously. We also take our relationship with customers seriously, balancing putting tools in your hands quickly with supporting existing use cases. We are here to listen to your feedback and build tools for you.
This initial release of the SDK is purposely limited in scope to include only the pieces we are most sure about. We intend to rapidly release new functionality to the SDK and our platform as our understanding grows and we have time to put the best tools in front of you.
When releasing pieces to the SDK, our thought process is guided by the following principles and answers to common questions, with links to more in-depth documentation we have written about features:

- Normalizing a value, such as lowercasing an email, is a good use for field `compute`. This function will be idempotent (running it over and over on the same input produces the same output and state).
- Don't check for `null` or `undefined` in a `validate` function: validate functions are never called with a value that isn't of the field's type. The way to check for `null` is to set `required: true` on the field. This works for 95% of field use cases and will flag an error at the review stage for fields provided with a `null` value. Having this strict typing makes `validate` functions less error-prone and less repetitious. If we allowed `null` or `undefined` to propagate to `validate`, every user-provided function would have to start with checking for `null` or `undefined`; users who didn't do this would either see a typing error or, worse, suffer unreliable code that was deployed.
- The one case where the `required` behavior doesn't work is when a `recordCompute` or `batchRecordsCompute` function was expected to provide a value for a field, and failed. In that case, you are already writing a `recordCompute` or `batchRecordsCompute` function, so check for the `null` there.
- To capture metadata about an uploaded URL, add an `s3_url` field that will match to the URL provided in the upload, and a row compute that stores the size of the download to `s3_url_size`; `s3_url_size` should have a `validate` of less than 5MB. Eventually this could become a computed field that takes `s3_url` as an input and outputs `s3_url_size`, or a `SmartURLCapture` field that stores URLs to a synced S3 bucket and provides metadata about the download, including size; the `validate` would then run on the size property of the `SmartURLCapture` field.
- To enrich data via an external service (say, a field next to `email` called Spam Score that is updated with the results from an API on input of any email), use `compute` hooks to fake a computed field like the previous three examples, via `recordCompute` or `batchRecordsCompute`. We currently only allow mutating data and simultaneously adding a message in those hooks.
- Transformations don't belong in `validate`; use `compute` to add a country code to phone numbers, for instance.

A `Sheet` object describes the desired characteristics or "shape" of data that you expect for an individual CSV file or sheet in an Excel file. A Sheet can be thought of as roughly analogous to a database table.
A `Field` object represents a column of data in a `Sheet`. They are similar to columns in a database. Fields can be configured to clean, transform, and validate incoming data through options and hooks.
Data Hooks® are the Flatfile copyrighted term for code that runs on `Field`s and `Sheet`s to transform and validate data. Field hooks are Data Hooks® that run on individual fields.
A Workbook is a collection of Sheets. The Sheets of a Workbook can optionally be linked together via `ForeignKey`. A Workbook is similar to a database schema.
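This document defines Sheets but never assembles them into a Workbook; a minimal sketch might look like the following (assuming the `Workbook` constructor from `@flatfile/configure`; the `name` and `namespace` values are illustrative):

```ts
import { Workbook } from '@flatfile/configure'

// Sheets defined earlier in this document
export default new Workbook({
  name: 'Employee Onboarding',
  namespace: 'employee-onboarding',
  sheets: { Employees, BaseSheet, LinkedSheet },
})
```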
These definitions are taken from the SchemaIL interfaces. The base fields closely mirror primitive types and vary primarily by their default cast functions. Read the CastFunction tests for an explicit understanding of cast function behavior.
- `TextField`: persists strings. Defaults to the `StringCast` `cast` function.
- `NumberField`: persists numbers. Defaults to the `NumberCast` `cast` function.
- `BooleanField`: persists booleans. Defaults to the `BooleanCast` `cast` function.
- `DateField`: persists dates. Defaults to the `DateCast` `cast` function.
- `OptionField`: presents the user with discrete options. Accepts options as either plain labels or option objects (the object form leaves room for future per-option settings):
  - `options: {'dbValue': 'Label displayed to user'}`
  - `options: {'dbValue': {label: 'Label displayed to user'}}`
It is called like this:
department: OptionField({
label: 'Department',
options: {
engineering: 'Engineering',
hr: 'People Ops',
sales: 'Revenue',
},
}),
or
department: OptionField({
label: 'Department',
options: {
engineering: {label:'Engineering'},
hr: {label:'People Ops'},
sales: {label:'Revenue'}
},
}),
interface FieldOptions {
label: string
primary: boolean
required: boolean
unique: boolean
cast: (value: Dirty<T>) => Nullable<T>
default: Nullable<T>
compute: (value: T) => T
validate: (value: T) => void | Message[]
}
[Field Options Definitions](https://github.com/FlatFilers/platform-sdk-mono/blob/main/packages/configure/src/ddl/Field.ts#L43-L69). Every field can set at least the above properties.
`label`: controls the label displayed for the field in the UI.

`required`: whether a value for this field is required after the `default` stage of field hooks. If set to true and no value is present, an error will be thrown and registered on the cell for that value, and no further processing will take place for that field.

`primary`: whether this field is the primary key for a sheet, or part of a composite primary key (not currently supported). Primary implies unique too.

`unique`: whether this field is required to be unique across the whole sheet. We have chosen to treat a field with multiple `null`s as still unique; see the tests and comments for details.
`cast`: transforms input into the type specified by the field.

`default`: the default value for this field.

`compute`: takes the type specified by the field and returns the type specified by the field.

`validate`: takes the type specified by the field and returns validation messages. This is the most commonly used field hook. `validate` can't change data; it can only send messages. A compute hook must be used if you want to perform any type of transformation.

`description`: a long-form description that appears in the UI when hovering over the field name.
`annotations`: automatically filled-in messages that the Platform SDK provides when `default` or `compute` changes a value. The following independent options are set as an object (`{}`):

- `default`: if set to `true`, insertions of the `default` value will be annotated with an info message of `defaultMessage`.
- `defaultMessage`: the message to use when a `default` value is inserted. If none is specified, defaults to 'This field was automatically given a default value of'.
- `compute`: if set to `true`, instances where `compute` changes the value of a field will be annotated with an info message of `computeMessage`.
- `computeMessage`: the message to use when `compute` changes a value. If none is specified, defaults to 'This value was automatically reformatted - original data:'.
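A field that both defaults and reformats values might enable all four; a sketch, assuming the option names listed above (the `age` field is our own illustration):

```ts
age: NumberField({
  default: 0,
  compute: (v) => Math.round(v),
  annotations: {
    default: true,
    defaultMessage: 'Age was automatically set to:',
    compute: true,
    computeMessage: 'Age was rounded - original data:',
  },
}),
```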
`stageVisibility`: controls which parts of mapping/review/export a field occurs in.

- `mapping`: when set to `false`, this field will not appear for matching in the mapping stage.
- `review`: when set to `false`, this field will not appear in the review stage.
- `export`: when set to `false`, this field will not be exported.
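For example, an internal identifier that should be exported but kept out of the user-facing stages might be sketched like this (the `internalId` field is illustrative, assuming the option names above):

```ts
internalId: TextField({
  label: 'Internal ID',
  stageVisibility: {
    mapping: false,
    review: false,
    export: true,
  },
}),
```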
export interface SheetOptions<FC> {
allowCustomFields: boolean
recordCompute: (record: FlatfileRecord<any>, logger?: any) => void
batchRecordsCompute: (records: FlatfileRecords<any>) => Promise<void>
}
`allowCustomFields`: allows the end user to create additional fields from their upload when an incoming column does not match any existing field on the Sheet.

`recordCompute`: a function that receives a row with all required fields fully present and optional fields typed `optional?: string`. Best used to compute derived values; can also be used to update existing fields.

`batchRecordsCompute`: an asynchronous function that is best for HTTP/API calls. External calls can be made to fill in values from external services. This receives `records` so it is easier to make bulk calls.
This helper utility gives you tools to test real records and fields locally against your field hooks (`default`, `cast`, `compute`, and `validate`) and record-level compute functions (`recordCompute` and `batchRecordsCompute`). It runs the exact same logic that runs on Flatfile's production servers, so it will produce the same results locally for quick testing before deploying.
For example, with this TestSheet:
const TestSheet = new Sheet(
'TestSheet',
{
firstName: TextField({
required: true,
description: 'foo',
compute: (v) => v.toUpperCase(),
}),
age: NumberField(),
testBoolean: BooleanField({ default: false }),
},
{
recordCompute: (record, _session, _logger) => {
const age = record.get('age')
const newAge = typeof age === 'number' ? age * 2 : 0
record.set('age', newAge)
},
}
)
This takes a field name and value and returns the transformed value based on all sheet operations (in these snippets, `testSheet` is assumed to be the testing-helper instance created from the `TestSheet` defined above):
expect(await testSheet.transformField('age', '10')).toEqual(20)
This takes a full record and returns the transformed record based on all sheet operations:
const inputRow = { firstName: 'foo', age: '10', testBoolean: 'true' }
const expectedOutputRow = { age: 20, firstName: 'FOO', testBoolean: true }
const res = await testSheet.testRecord(inputRow)
expect(res).toMatchObject(expectedOutputRow)
This takes an array of full records and returns the transformed records based on all sheet operations:
const inputRows = [
{ firstName: 'foo', age: '10', testBoolean: 'true' },
{ firstName: 'bar', age: '8', testBoolean: 'true' },
]
const expectedOutputRows = [
{ age: 20, firstName: 'FOO', testBoolean: true },
{ age: 16, firstName: 'BAR', testBoolean: true },
]
const results = await testSheet.testRecords(inputRows)
expect(results).toMatchObject(expectedOutputRows)