Effect-TS / effect

An ecosystem of tools to build robust applications in TypeScript
https://effect.website
MIT License

`@effect/platform`'s `Multipart` and `UrlParams` modules do not support `string[]` but should #2873

Open datner opened 4 months ago

datner commented 4 months ago

What version of Effect is running?

latest

What steps can reproduce the bug?

Try to work with multi-value fields on a FormData

const fd = new FormData()
fd.append("key", "foo")
fd.append("key", "bar")
fd.append("key", "baz")

// later

HttpServer.request.schemaBodyForm(...) // whoops
HttpServer.request.schemaBodyUrlParam(...) // also whoops
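For context, the multi-value behavior at issue can be observed with the standard FormData API alone (a global in Node 18+), independent of Effect: `get` returns only the first entry for a repeated key, while `getAll` returns every entry.

```typescript
// Plain web-standard FormData (global in Node 18+), no Effect involved:
const fd2 = new FormData()
fd2.append("key", "foo")
fd2.append("key", "bar")
fd2.append("key", "baz")

const firstValue = fd2.get("key")   // "foo" — only the first entry for the key
const allValues = fd2.getAll("key") // ["foo", "bar", "baz"] — every entry
```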

What is the expected behavior?

export interface Persisted {
  readonly [key: string]: ReadonlyArray<PersistedFile> | string
}
export const schemaBodyUrlParams: <R, I extends Readonly<Record<string, string | string[]>>, A>(
  schema: Schema.Schema<A, I, R>,
  options?: ParseOptions | undefined
) => Effect.Effect<A, ParseResult.ParseError | Error.RequestError, R | ServerRequest> = internal.schemaBodyUrlParams

(and its repercussions)

What do you see instead?

The Persisted type of the Multipart module, and the UrlParams signatures in general:

export interface Persisted {
  readonly [key: string]: ReadonlyArray<PersistedFile> | string
}
export const schemaBodyUrlParams: <R, I extends Readonly<Record<string, string>>, A>(
  schema: Schema.Schema<A, I, R>,
  options?: ParseOptions | undefined
) => Effect.Effect<A, ParseResult.ParseError | Error.RequestError, R | ServerRequest> = internal.schemaBodyUrlParams

should also support `string[]`, as supported by the `.append` and, subsequently, the `.getAll` methods of FormData

Additional information

beep boop

arekmaz commented 3 months ago

From my point of view, these structures should only have fields parsed as string[], never as a single value,

as at parse time you can never know whether the user will want to read the field as a string or as an array with one element.

In my opinion, parsing a field as just string (when there's only one entry) would be equivalent to a database driver returning, from e.g. select * from user:

[] if there are no users, User if there is exactly one user, and User[] if there are more users,

which looks like a convenience but actually makes results harder to handle and a bit unpredictable.

Recently I made a small utility just like this and concluded it only makes sense to parse fields as string[],

with a convenience schema if the user wants just a single (the first) value:

const RequiredHead = <A, B, R>(s: S.Schema<A, B, R>) =>
  S.NonEmptyArray(S.Any).pipe(
    S.transform(s, {
      decode: (a) => a[0],
      encode: (s) => [s] as const,
      strict: true,
    }),
  );

the user can then always, predictably, expect an array, and doesn't need to handle string | string[] unions

datner commented 3 months ago

@arekmaz this would be right if we were validating, but we're parsing. And more importantly, multipart forms are designed such that you can take a single value or get all of them. The schema should accept schemas that take string, schemas that take string[], or schemas that take string | string[]. There's no point forcing it to only accept string[]-taking schemas in that constellation.

example:

const MyForm = S.Struct({
  thingId: S.String,
  subthingIds: S.propertySignature(S.Array(S.String)).pipe(S.fromKey("subthingId[]"))
})

Also, the RequiredHead is not needed

S.Array(mySchema).pipe(
   S.headOrElse(),
)

would take the head of the array, or fail the parse if the array is empty. It optionally takes a thunk for a default value, and there's also the .head variant that takes the head and puts it in an Option instead.

arekmaz commented 3 months ago

Validation with your schema `MyForm` will fail if there is only one element in `subthingId`, which would be a perfectly valid one-element array, unless you plan to include an array version alongside every field. Thanks for the pointer on the head schema, but unless you plan to parse each field in two ways, single and array (e.g. `subthingIds[]`), the natural way for me would be an array.

You could say each field represents one or more values and should be typed as string | [string, string, ...string[]], which is to say "if I'm validating an array, I'm expecting at least two elements, because I will never get a one-element array",

or you could just say "I'm expecting every field to be an array of one or more values", which is in my opinion a better and simpler mental model.

datner commented 3 months ago

@arekmaz I have no idea what you are basing this on... intuition? rationalization? The spec for form-urlencoded doesn't say this, the api for form-urlencoded doesn't say this, and neither the api for FormData nor the spec for the actual entry list defines this behavior. If anything, the usage of tuples and lists suggests that the canonical shape would always be string[].

You can check it out in the console if you wish: the difference is between getAll and get; the value within has no bearing. [screenshot] The basis for this issue is expectations and intuitions derived from the spec and the common api. We must have a very good reason if we want to introduce opinionation in this extremely well-defined corner, and I don't consider my personal feelings about this api a good reason.

arekmaz commented 3 months ago

that's exactly what I mean, the canonical shape should be string[], every field is a list of one or more values

datner commented 3 months ago

@arekmaz

the canonical shape should be string[]

But we're not whatwg, we don't get to decide, and they decided it's a sum type string | string[].

every field is a list of one or more values

every entry is a list of 0 or more values, with getting a single entry for a key resulting in null | string and getting all entries for a key resulting in string[] (again with 0..n elements). We agree on and understand this very well. I can't understand how my conclusion from "the two results are either string (or null if nothing is there) or a string[]" is "we should allow either a schema that takes a string (or null, if it allows it), or a schema that takes a string[]", while your conclusion is "we should only allow string[] schemas". Why are we adding opinions here? Why are we changing the well-known shape of the data structure? I am not looking for a purpose for doing it, I am looking for a justification for doing it. It would be very surprising and unwelcome behavior on the part of /platform to do that.

side note: I am completely ignoring file lists in this discussion because no changes should be made there
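The get/getAll contract described in this comment can be checked directly against URLSearchParams, which follows the same entry-list model as FormData (a runnable sketch, no Effect needed):

```typescript
// URLSearchParams uses the same "entry list" model the whatwg spec defines:
const params = new URLSearchParams([["k", "a"], ["k", "b"]])

const single = params.get("k")        // "a" — the first entry for the key
const all = params.getAll("k")        // ["a", "b"] — every entry
const absent = params.get("missing")  // null — single read of an absent key
const none = params.getAll("missing") // [] — all entries of an absent key
```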

arekmaz commented 3 months ago

so how would you see parsing a field?

let's see:

single value

const sp1 = new URLSearchParams([['a', 'a'], ['a', 'b']])
const s1 = Schema.Struct({
    a: Schema.String
})

parseSearchParams(s1)(sp1) // fails

multi value:

const sp2 = new URLSearchParams([['a', 'a']])
const s2 = Schema.Struct({
    a: Schema.Array(Schema.String)
})

parseSearchParams(s2)(sp2) // fails

every field, no matter what you expect, would have to be parsed with something like:

const StringOrArray = Schema.Union(Schema.String, Schema.Array(Schema.String))

const sp3 = new URLSearchParams([['a', 'a']])
const s3 = Schema.Struct({
    a: StringOrArray
})

const r = parseSearchParams(s3)(sp3) // succeeds with { a: string | string[] }

// if I want a single value:
const a = Array.isArray(r.a) ? r.a[0] : r.a

// if I wanted an array:
const b = Array.isArray(r.a) ? r.a : [r.a]

for me, parsing everything as just an array is a way simpler mental model; in my opinion we don't have to follow the standards everywhere
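The normalization described in this comment can be sketched without any library. `toRecord` is a hypothetical helper (not an @effect/platform API) that produces the `string | string[]` intermediary shape, after which each use site collapses or widens as shown:

```typescript
// Hypothetical helper: collapse URLSearchParams into the string | string[]
// shape, then normalize per use site (a sketch, not @effect/platform code).
const toRecord = (sp: URLSearchParams): Record<string, string | string[]> => {
  const out: Record<string, string | string[]> = {}
  for (const key of new Set(sp.keys())) {
    const values = sp.getAll(key)
    out[key] = values.length === 1 ? values[0] : values
  }
  return out
}

const rec = toRecord(new URLSearchParams([["a", "a"], ["a", "b"], ["b", "x"]]))
// rec.a is ["a", "b"]; rec.b is "x"

// if I want a single value:
const firstA = Array.isArray(rec.a) ? rec.a[0] : rec.a // "a"
// if I want an array:
const arrB = Array.isArray(rec.b) ? rec.b : [rec.b]    // ["x"]
```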

datner commented 3 months ago

I don't understand, this is not an issue. We agree that this is not an issue. I showed exactly how this behaves.

single value

const sp1 = new URLSearchParams([['a', 'a'], ['a', 'b']])
const s1 = Schema.Struct({
    a: Schema.String
})

parseSearchParams(s1)(sp1) // succeeds with { a: "a" }

multi value

const sp2 = new URLSearchParams([['a', 'a']])
const s2 = Schema.Struct({
    a: Schema.Array(Schema.String)
})

parseSearchParams(s2)(sp2) // succeeds with { a: ["a"] }

it is actually your case that is problematic

const StringOrArray = Schema.Union(Schema.String, Schema.Array(Schema.String))

const sp3 = new URLSearchParams([['a', 'a']])
const s3 = Schema.Struct({
    a: StringOrArray
})

const r = parseSearchParams(s3)(sp3) // fails. How can we decide if to `get` or `getAll`???
Here is a quick mapping of value, schema, operation, and result:

| Value | Schema | Operation | Result |
| --- | --- | --- | --- |
| ["a"] | String | get | "a" |
| [] | String | get | ParseError.Missing |
| ["a","b"] | String | get | "a" |
| [] | NullOr(String) | get | ParseError.Missing* |
| [] | optional(String) | get | undefined |
| ["a"] | Array(String) | getAll | ["a"] |
| [] | Array(String) | getAll | [] |
| ["a","b"] | Array(String) | getAll | ["a","b"] |
| [] | NullOr(Array(String)) | getAll | [] |
| [] | optional(Array(String)) | getAll | [] |
| * | Union(Array(String), String) | N/A | ParseError.Forbidden |

note that for NullOr(String) I think it's better to not allow null, despite the api returning null in that case, because it makes everything more complicated and the spec is non-specific about it. The alternative is the inverse: failing with ParseError.Types complaining about expecting a string and seeing an actual null.

So, why not always string[]? We could just use getAll on everything and the world would be great and good! Just because whatwg said so? No. Because it doesn't make sense within the expected usage of the api. This represents a form. Given a form with 20 fields, how many of them do you reckon are multi-value? Probably somewhere between 0 and 3. So what is the dx gained by making devs call .head or [0] 17 times over? This is not debate or conjecture: I am the only user to have raised this limitation, and even then only after a year of day-to-day work with effect. The multi-value FormData entry is just a rare use case.
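The table above boils down to a simple dispatch: read with get when the schema expects a single value, and with getAll when it expects an array. A dependency-free sketch of that idea (the `Expectation` flag stands in for the Schema AST inspection @effect/platform would actually do; all names here are illustrative):

```typescript
// Illustrative dispatch between get and getAll, driven by what the schema
// expects (a sketch only; real code would inspect the Schema AST instead).
type Expectation = "single" | "array"

const readField = (
  sp: URLSearchParams,
  key: string,
  expect: Expectation
): string | ReadonlyArray<string> => {
  if (expect === "array") return sp.getAll(key) // 0..n entries, never fails
  const value = sp.get(key)                     // first entry, or null
  if (value === null) throw new Error("ParseError.Missing")
  return value
}

const spDemo = new URLSearchParams([["a", "a"], ["a", "b"]])
const one = readField(spDemo, "a", "single") // "a"
const many = readField(spDemo, "a", "array") // ["a", "b"]
```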

arekmaz commented 3 months ago

Ah, ok, I see. The issue is, I'm assuming that the search params are parsed before they are decoded:

as you saw, the schema has to take a Record<string, string | string[]>, which you have to have before validation,

and parsing the URLSearchParams has no information about that schema; you can't parse a field as both a string and an array.

what you are trying to suggest is to take this schema:

export const schemaBodyUrlParams: <R, I extends Readonly<Record<string, string>>, A>(
  schema: Schema.Schema<A, I, R>,
  options?: ParseOptions | undefined
) => Effect.Effect<A, ParseResult.ParseError | Error.RequestError, R | ServerRequest> = internal.schemaBodyUrlParams

somehow extract the "input" schema, and use that to decode the fields into this intermediary object form, before decoding it to the "output" A. If that's possible, maybe that would work, and even be more convenient, but it would also mean parsing the search params for every "decoding".

The thing is, this input object form is supplied via a service, HttpServerRequest.ParsedSearchParams; the search params are currently "parsed" only once, before there's a notion of any schema. You cannot use the schema information to go from URLSearchParams -> I, at least for search params; you can only go from I -> A via decoding.

What I'm proposing is that the "input" intermediary object I be parsed as Record<string, string[]>, if the behavior stays the same (the search params are transformed to an object before any schema is supplied).
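A minimal sketch of that proposal, assuming a hypothetical `parseAsArrays` step (not an existing @effect/platform function) that runs once, before any schema is known:

```typescript
// Hypothetical one-time parse step producing the proposed uniform shape:
// every field becomes a string[], regardless of how many entries it has.
const parseAsArrays = (sp: URLSearchParams): Record<string, string[]> => {
  const out: Record<string, string[]> = {}
  for (const [k, v] of sp) {
    const list = out[k] ?? []
    list.push(v)
    out[k] = list
  }
  return out
}

const parsed = parseAsArrays(
  new URLSearchParams([["a", "1"], ["a", "2"], ["b", "x"]])
)
// parsed is { a: ["1", "2"], b: ["x"] } — uniformly arrays
```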

Maybe that was the misunderstanding.