gcanti / tcomb

Type checking and DDD for JavaScript
MIT License
1.89k stars · 120 forks

Matching of dict key with value question #274

Open ArmorDarks opened 7 years ago

ArmorDarks commented 7 years ago

Hello

I'm sure this question is caused by a lack of understanding on my part, but I'm somewhat stuck and don't see a way to handle this scenario. I'd be grateful for any help.

Here is data example:

'2222-2222':
  id: '2222-2222'
  someProp: 'value'

'3333-3333':
  id: '3333-3333'
  someProp: 'value'

I'm trying to validate with tcomb whether the dict key matches the struct's internal property named id, since both of them are representations of the same id. In other words, whether the key `'2222-2222'` is the same value as `id: '2222-2222'`.

Here's what I have so far:

const schema = t.dict(t.String, t.struct({
  id: t.String,
  someProp: t.maybe(t.String)
}, { name: 'Dataset', strict: true }), 'Datasets');

With the current schema I can be sure that the dict key is a string, just like id, but I also want to ensure that their values are equal.

I've tried to solve this with t.declare and t.define, t.refinement and t.irreducible, but without any success, because in most cases I can't get a reference to the whole current entry during validation.

Maybe t.refinement and t.irreducible should accept a second argument containing the whole current context?

Something like

const schema = t.dict(t.String, t.struct({
  id: t.refinement(t.String, (value, allData) => /* some function to match the current property against the whole data */, 'false'),
  someProp: t.maybe(t.String)
}, { name: 'Dataset', strict: true }), 'Datasets');

Though it's a crude idea, and I'm not sure whether it's even possible to get the current context.

So far it boils down to this: it's easier to loop over the dataset itself and check for each key whether it matches the slug, in a standalone function unrelated to tcomb. But that doesn't seem like an elegant solution.

Or maybe what I'm trying to do is beyond the responsibility of structure and type validation? To be honest, I'm quite new to this whole topic.
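For completeness, here's roughly what that standalone, tcomb-independent check looks like (just a sketch; the function name is mine):

```javascript
// Plain JS check: collect every dict key that does not equal its entry's `id`.
const findMismatchedKeys = data =>
  Object.keys(data).filter(key => data[key].id !== key)

const data = {
  '2222-2222': { id: '2222-2222', someProp: 'value' },
  '3333-3333': { id: '3333-0000', someProp: 'value' }
}

console.log(findMismatchedKeys(data)) // → ['3333-3333']
```

It works, but it lives entirely outside the schema, which is exactly what I'd like to avoid.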

gcanti commented 7 years ago

it's easier to loop over the dataset itself and check for each key whether it matches the slug

Seems like a refinement of the dictionary:

import t from 'tcomb'

const sameIds = obj => {
  for (let k in obj) {
    if (k !== obj[k].id) return false
  }
  return true
}

// note: you can use t.interface instead of t.struct
const Schema = t.refinement(t.dict(t.String, t.interface({
  id: t.String,
  someProp: t.maybe(t.String)
}, { name: 'Dataset', strict: true })), sameIds, 'Datasets')
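The predicate can be sanity-checked on its own, without tcomb (a quick sketch using the data shape from the question):

```javascript
// Same predicate as above, checked standalone: every key must equal its entry's id
const sameIds = obj => {
  for (let k in obj) {
    if (k !== obj[k].id) return false
  }
  return true
}

console.log(sameIds({ '2222-2222': { id: '2222-2222' } })) // → true
console.log(sameIds({ '2222-2222': { id: '9999-9999' } })) // → false
```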
ArmorDarks commented 7 years ago

Thanks for reply!

Aaah, so I was on the right track with t.refinement, but it didn't work in my case (I tested it with more complex data and a more complex schema). It turned out that if the dict's struct had validation errors of its own, those errors were reported and sameIds was never called, which is why I couldn't figure out why it didn't print any errors.

That was very unobvious. I had a feeling that validation with t.validate from tcomb-validation would validate everything and call all refinements even if the schema passed to the refinement had errors, and only then stop. Should it be like that?

Anyway, at least it works, and the declaration seems logical.

But, unfortunately, another issue arose. The error message becomes very obscure, and it's impossible to tell from it what exactly causes the error:

[screenshot of the validation error output: powershell_2017-02-25_13-06-03]

I think this happens because we're checking in the wrong direction. We're saying that the dict should have a structure whose keys are the same as the ids (and thus we're checking the whole dict), but in fact we want to say that the ids should be the same as the struct's keys. In other words, it looks like it should be a refinement of the id property itself, which would also give us a clear error message saying that exactly the id property is wrong, not the whole dict. But it's impossible to access the context from within the id refinement, as I've already mentioned.

gcanti commented 7 years ago

That was very unobvious. I had a feeling that validation with t.validate from tcomb-validation would validate everything and call all refinements even if the schema passed to the refinement had errors, and only then stop

The predicate of a refinement should be called only if the value passed in is an instance of the base type, otherwise you could get runtime errors inside the predicate body.

Showing some types:

function refinement<T>(baseType: Type<T>, predicate: (value: T) => boolean, name?: string): Refinement<T>

it's impossible to tell from it what exactly causes the error

Yep, refinements work on the entire object

Another option is to create a more tractable data structure, for example

const Dataset = t.interface({
  id: t.String,
  someProp: t.maybe(t.String)
}, { name: 'Dataset', strict: true })

const foo = {
  '2222-2222': {
    id: '2222-222',
    someProp: 'value1'
  },
  '3333-3333': {
    id: '3333-3333',
    someProp: 'value2'
  }
}

const toArray = obj => {
  const a = []
  for (let k in obj) {
    a.push([k, obj[k]])
  }
  return a
}

const predicate = ([k, v]) => k === v.id

const Schema = t.list(t.refinement(t.tuple([t.String, Dataset]), predicate))

Schema(toArray(foo))

Uncaught TypeError: [tcomb] Invalid value [
  "2222-2222",
  {
    "id": "2222-222",
    "someProp": "value1"
  }
] supplied to Array<{[String, Dataset] | predicate}>/0: {[String, Dataset] | predicate}
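Stripped of tcomb, the reshape-then-check step boils down to this (sketch only, with the helper names from above):

```javascript
// Reshape the dict into [key, value] pairs, then check each pair in isolation,
// so an error can point at a single entry instead of the whole dict.
const toArray = obj => Object.keys(obj).map(k => [k, obj[k]])

const predicate = ([k, v]) => k === v.id

const foo = {
  '2222-2222': { id: '2222-222', someProp: 'value1' },
  '3333-3333': { id: '3333-3333', someProp: 'value2' }
}

const failures = toArray(foo).filter(pair => !predicate(pair))
console.log(failures)
// → [['2222-2222', { id: '2222-222', someProp: 'value1' }]]
```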

And finally, another option is to define your own combinator instead of using t.dict, where you can express the dependency between keys and values (seems like a lot of work though)
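The core check of such a combinator could look something like this, written as plain JS rather than a real tcomb combinator (the function name and error format are made up):

```javascript
// Hypothetical core of a key/value-dependency check: returns one error
// message per offending entry, so the report stays local to the bad id.
const checkIdKeys = obj => {
  const errors = []
  for (const k of Object.keys(obj)) {
    if (obj[k].id !== k) {
      errors.push(`key '${k}' does not match id '${obj[k].id}'`)
    }
  }
  return errors
}

console.log(checkIdKeys({ '2222-2222': { id: '2222-222' } }))
// → ["key '2222-2222' does not match id '2222-222'"]
```

The remaining work would be wrapping this into a proper type with tcomb's error reporting, which is the non-trivial part.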

ArmorDarks commented 7 years ago

The predicate of a refinement should be called only if the value passed in is an instance of the base type, otherwise you could get runtime errors inside the predicate body.

I see. But why not print those runtime errors from the predicate body then? Or at least try to run it inside a try/catch?

Another option is to create a more tractable data structure, for example

Oh my... This code reads like a whole massive hack :) But I'm really not sure it should be done in production. To me it looks like a potential source of bugs and confusion about the real dataset schema.

Besides, even that won't work with large data. For example, in our case we usually deal with objects with over 200 properties. You can imagine how hard it is to read through errors that contain 200 props each.

As I've already mentioned, it still sounds like the wrong direction to me, since we don't want to ensure that the whole interface is correct. We want to ensure that the id property is correct and equal to the dict key.

And finally another option is to define your own combinator, instead of using t.dict, where you can express your dependency between keys and values (seems a lot of work though)

Yeap.


Well, it isn't a super critical issue for us, and you've already provided solutions that will probably satisfy some developers. So, if you think it's fine, you can close this issue. Otherwise, you can leave it open as a reminder about the missing functionality.

Unfortunately, I'm not much use in solving this puzzle right now, because this whole runtime-validation topic is quite new to me, not to mention the tricky recursion cases. Providing the current context of a refinement as a second argument is the only thing I can think of right now, since it should allow checking recursive structures. The other way is to pass references somehow, like in JSON Schema...

Anyway, we will continue to use tcomb, since I'm quite happy with how elegantly it works. Maybe with time I'll figure out a better solution.

Thank you for your time and amazing support 👍

gcanti commented 7 years ago

Oh my... This code reads like a whole massive hack :) But I'm really not sure it should be done in production

I agree, it's a hack. Though tcomb's asserts are not meant to run in production; they are a development tool. Actually, they are stripped out when NODE_ENV='production'. I use tcomb when I trust the source (i.e. all errors are bugs). If you don't trust the source, you should use tcomb-validation and, if an error is returned, apply some recovery strategy

In short

ArmorDarks commented 7 years ago

Though tcomb's asserts are not meant to be run in production,

Oh, sorry, I used the wrong word. I meant pushing this code into the repository, so that other developers could see and use it.

Reshaping the input into another form just to validate it already sounds to me like a form of vandalism. It effectively defeats the point of validation, because we then have to ensure that the object was correctly reshaped into an array and nothing broke, by writing yet another test. Not to mention that this introduces complex, confusing and undesired logic into the system.

I can't even imagine what another developer will think if they stumble upon this validation. It's supposed to give them an understanding of the enforced shape of the structure, not confuse them.

I agree that reshaping might be applicable in some cases, but I think it should be done only if we intend to use the input in the reshaped form further on. In that case we definitely should validate the reshaped object. But not in this case.

I use tcomb when I trust the source (i.e. all errors are bugs). If you don't trust the source, you should use tcomb-validation and if an error is returned, apply some recovery strategy

I have nothing against that. We're already using tcomb-validation to validate our datasets whenever needed. I posted this issue on tcomb because it seems more related to assertions and refinements than to validation itself.