Closed mdekstrand closed 2 months ago
What yaml options do you use?
I'm using {schema: "json"}.
The default schema that extractYaml() uses extends the json schema, so I don't yet understand how such issues are arising. Can you please provide a minimal reproducible code snippet, with instructions, that worked before these changes but didn't after the upgrades?
The extensions are exactly the problem: the default schema detects and parses ISO dates as JavaScript Date objects, whereas the JSON schema leaves them as strings for the application to deal with later. The following code shows that the parsed value has type object and has had its time information filled in as midnight UTC, while I need to be able to handle it as a string (and that passing schema: "json" to yaml.parse does so):
import { extractYaml } from "@std/front-matter";
import * as yaml from "@std/yaml";
const meta = "date: 2024-10-24";
const text = `---
${meta}
---
text`;
let parsed = extractYaml(text);
console.log("front-matter: parsed %s: %o", typeof parsed.attrs.date, parsed.attrs.date);
let yparse = yaml.parse(meta);
console.log("yaml defaults: parsed %s: %o", typeof yparse.date, yparse.date);
yparse = yaml.parse(meta, { schema: "json" });
console.log("yaml json schema: parsed %s: %o", typeof yparse.date, yparse.date);
produces the output:
front-matter: parsed object: 2024-10-24T00:00:00.000Z
yaml defaults: parsed object: 2024-10-24T00:00:00.000Z
yaml json schema: parsed string: "2024-10-24"
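To illustrate why a midnight-UTC Date is awkward for date-only values, here is a small sketch in plain TypeScript, independent of @std/yaml. It relies only on standard Date behavior (a date-only ISO string is interpreted as UTC). The helper name localCalendarDay and the -5 hour offset are assumptions chosen for illustration.

```typescript
// A date-only ISO string is interpreted as midnight UTC by the Date constructor.
const asDate = new Date("2024-10-24");
console.log(asDate.toISOString()); // 2024-10-24T00:00:00.000Z

// Hypothetical helper: the calendar day seen by a reader at a fixed UTC offset.
function localCalendarDay(date: Date, offsetHours: number): string {
  const shifted = new Date(date.getTime() + offsetHours * 60 * 60 * 1000);
  return shifted.toISOString().slice(0, 10);
}

// Five hours behind UTC, the same instant falls on the previous day.
console.log(localCalendarDay(asDate, -5)); // 2024-10-23

// Kept as a string, the value carries no time zone and cannot shift.
const asString = "2024-10-24";
console.log(asString);
```

A string value leaves the application free to decide which time zone, if any, the date belongs to.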
Ah, I see. Yep, let's add an option for configurability. We're open to PRs that add ParseOptions from @std/yaml to extractYaml().
Instead of adding an option, how about we split extract and parse into two separate public functions? We can always provide a function that does both, as we have now, but it would make the whole module more flexible for custom use cases.
Accepting parser as the 2nd argument might be another option:
export function extract<T>(
  text: string,
  parse_: Parser = parse as Parser,
): Extract<T> {
  return extractAndParse(text, EXTRACT_YAML_REGEXP, parse_);
}
You can specify your own parser (including a third-party one):
import { parse } from "@std/yaml";
extract(markdown, (text) => parse(text, { schema: "json" }));
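As a self-contained illustration of the parser-as-argument idea, the following sketch uses hypothetical names (extractWith, Extracted, keepStrings) and a simplified delimiter regex. It is not the actual @std/front-matter implementation, and the toy parser only stands in for a call such as parse(text, { schema: "json" }).

```typescript
// Hypothetical types and names; not the actual @std/front-matter internals.
type Parser = (text: string) => unknown;

interface Extracted<T> {
  frontMatter: string;
  body: string;
  attrs: T;
}

// Simplified pattern: an opening '---' line, the raw front matter, a closing '---' line.
const FRONT_MATTER = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/;

function extractWith<T>(text: string, parse: Parser): Extracted<T> {
  const match = FRONT_MATTER.exec(text);
  if (match === null) throw new TypeError("No front matter found");
  const frontMatter = match[1];
  return {
    frontMatter,
    body: text.slice(match[0].length),
    attrs: parse(frontMatter) as T, // the caller decides how the text is parsed
  };
}

// Toy stand-in parser: splits "key: value" lines and keeps every value as a string.
const keepStrings: Parser = (text) =>
  Object.fromEntries(
    text.split(/\r?\n/).filter((line) => line.includes(":")).map((line) => {
      const i = line.indexOf(":");
      return [line.slice(0, i).trim(), line.slice(i + 1).trim()];
    }),
  );

const result = extractWith<Record<string, string>>(
  "---\ndate: 2024-10-24\n---\ntext",
  keepStrings,
);
console.log(typeof result.attrs.date, result.attrs.date); // string 2024-10-24
```

Because the parser is injected, the extraction code never needs to know about YAML schemas at all.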
@timreichen @kt3k that's exactly what the old version of front-matter did, and I'd be happy for that solution as well. The current version of front-matter has things more entangled, though, in ways I haven't taken time to fully understand, so it may no longer be practical (specifically, the underlying extract functions require a regex in addition to the parser).
@iuioiua I submitted #5748 to add this.
I would strongly advocate for splitting the functionality instead of passing data through. It disentangles the two processes and allows for an additional use case: extracting only the raw front matter text, for example to forward it to another API.
const string = "...";
const { frontMatter } = extract(string);
api.handleFrontMatter(frontMatter);
Custom parsing would also look straightforward:
const string = "...";
const { frontMatter } = extract(string);
const attrs = customParse(frontMatter);
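A minimal sketch of what an extraction-only function could look like, assuming '---' delimiter lines. The names extractRaw and RawExtract are hypothetical, and JSON.parse stands in for customParse here only because the sample front matter happens to be JSON.

```typescript
// Hypothetical shape for an extraction-only result; names are illustrative.
interface RawExtract {
  frontMatter: string;
  body: string;
}

// Splits a document into raw front matter text and body without parsing anything.
function extractRaw(text: string): RawExtract {
  const lines = text.split(/\r?\n/);
  if (lines[0] !== "---") throw new TypeError("Missing opening '---' delimiter");
  const end = lines.indexOf("---", 1);
  if (end === -1) throw new TypeError("Missing closing '---' delimiter");
  return {
    frontMatter: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n"),
  };
}

const doc = '---\n{"date": "2024-10-24"}\n---\nbody text';
const { frontMatter, body } = extractRaw(doc);

// Any parser can be applied afterwards; the extraction step is unaware of it.
const attrs = JSON.parse(frontMatter) as { date: string };
console.log(typeof attrs.date, attrs.date); // string 2024-10-24
console.log(body); // body text
```

With this split, forwarding the raw text to another API and parsing it with custom options are both ordinary function calls on the extracted string.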
Pre-1.0, front-matter exposed a createExtractor function to create extractors with custom parsers. I used this to use a YAML parser with a different schema. Since front-matter 1.0 only exposes the extractor functions and they have no configuration knobs, it is impossible to change the parser options (or use a different parser) with the public API.
Given that the extractAndParse function requires a format-specific regex, it looks like the best solution for my particular use case (configuring the YAML parser) would be to add an options: ParseOptions parameter to extractYaml. Re-exposing generic front-matter extraction that returns plain text for the client to parse would also be useful for other extensions, but looks more difficult from my understanding of the current code and is not necessary for my immediate problem.
The alternatives I have considered are sticking with the last 0.2XX release of front-matter or importing extractAndParse directly from _shared.ts, but the latter looks impractical. I have tried seeing if I can get away with the standard schema, but due to legacy content the standard YAML schema parses dates in incorrect time zones in my data.