Support plaintext contentType

HoverBaum commented 2 years ago

I am trying to use .tsx files as input for a documentType. My use case is to turn them into JSON objects representing the associated components' properties via react-docgen-typescript.

The plan would be to have a computedField that uses react-docgen-typescript to parse the raw file content into a JSON object representing the properties.

But Contentlayer doesn't support .tsx files yet.

I tried this with the following config, assuming contentType: 'data' would just load the content of the files, but learned on Discord that it is only for frontmatter.

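import { defineDocumentType } from 'contentlayer/source-files'

// Attempt: pick up every .tsx file under lib-src as a "data" document.
export const ComponentProp = defineDocumentType(() => ({
  name: 'ComponentProp',
  filePathPattern: `lib-src/**/*.tsx`,
  contentType: 'data',
}));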

Proposed solution

Add a contentType: 'plain' that just loads the entire file and leaves processing up to the user.
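To make the proposal concrete, here is a minimal sketch of what loading a file as 'plain' could amount to: read the whole file as text, skip all parsing, and hand the raw string to user code. PlainDocument, loadPlainDocument and countLines are hypothetical names for illustration only; none of them exist in Contentlayer, and countLines merely stands in for a user-defined computed field such as one built on react-docgen-typescript.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical shape of a document loaded with contentType: 'plain'.
// This type is an assumption, not part of Contentlayer.
type PlainDocument = {
  sourceFilePath: string
  body: { raw: string }
}

// Read the entire file as UTF-8 and do no parsing at all.
function loadPlainDocument(filePath: string): PlainDocument {
  return { sourceFilePath: filePath, body: { raw: readFileSync(filePath, 'utf8') } }
}

// Stand-in for a user-defined computed field that processes the raw content.
function countLines(doc: PlainDocument): number {
  return doc.body.raw.split('\n').length
}

// Usage: write a throwaway .tsx file to a temp directory and load it untouched.
const dir = mkdtempSync(join(tmpdir(), 'plain-'))
const file = join(dir, 'Button.tsx')
writeFileSync(file, 'export const Button = () => null\n')
const doc = loadPlainDocument(file)
console.log(doc.sourceFilePath, countLines(doc))
```

The point of the sketch is that all processing stays in user land; the loader itself makes no assumption about what the text means.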

This contentType would solve my use case and, even better, let users of Contentlayer cover use cases that we are not yet aware of or that only a small minority needs.

Implications

Default contentType: Currently, Contentlayer treats files as markdown by default. Assuming we add a 'plain' type I would suggest changing this behavior to read files as plaintext by default. That, however, could be a breaking change!

I am not 100% sure how Contentlayer currently treats files, but the change would break the setups of people who currently parse non-*.md files as markdown.

Plain text defaults: Going with the above, Contentlayer would start to treat all file extensions it doesn't know as plaintext. This implies that Contentlayer assumes it is only used on text files. It would lead to errors when handling binary file types, such as jpg or mp3.

There should be documentation and logging around this fact. The question here is whether Contentlayer could confirm that a file it is processing is a text file or not?
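One possible answer is a heuristic rather than a guarantee. The sketch below treats a buffer as text if a sampled prefix contains no NUL byte and decodes as valid UTF-8. The function name looksLikeText and the 8000-byte sample size are assumptions made for illustration, not anything Contentlayer provides. Note that UTF-16 encoded text would be classified as binary by this check.

```typescript
// Heuristic text detection: inspect only a prefix of the file.
// looksLikeText is a hypothetical helper, not a Contentlayer API.
function looksLikeText(bytes: Uint8Array, sampleSize = 8000): boolean {
  const sample = bytes.subarray(0, sampleSize)

  // Binary formats such as jpg or mp3 almost always contain NUL bytes early on.
  if (sample.includes(0)) return false

  try {
    // fatal: true makes invalid UTF-8 throw instead of inserting U+FFFD.
    // stream: true tolerates a multi-byte character cut off at the sample boundary.
    new TextDecoder('utf-8', { fatal: true }).decode(sample, { stream: true })
    return true
  } catch {
    return false
  }
}

// Usage: a TypeScript snippet versus the first bytes of a JPEG header.
const source = new TextEncoder().encode('export const answer = 42')
const jpegHeader = new Uint8Array([0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10])
console.log(looksLikeText(source), looksLikeText(jpegHeader))
```

A check like this could back the logging mentioned above: files that fail it would be skipped with a warning instead of being loaded as garbage strings.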

Alternative implementation

A backward-compatible way of introducing plaintext file types would be to add a contentType: 'plain' as an additional, optional feature to use.

However, I could see people getting confused by having a plaintext interpretation available but this not being the treatment that Contentlayer defaults to.

On the other hand, Contentlayer is a system for handling content and might find its audience largely in people processing markdown files, which would justify leaving the default interpretation as markdown.

Open questions

[ ] Is defaulting to plain text a good idea?
[ ] Can we determine a file to be a text file?
[ ] Should we potentially switch to requiring contentType to be set explicitly?
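The three options discussed here (keep markdown as the default, default to plain text, or require an explicit contentType) can be compared side by side in a small sketch. The 'plain' value, the DefaultPolicy type and resolveContentType are all hypothetical and only illustrate the trade-off; this is not how Contentlayer resolves content types internally.

```typescript
// 'markdown', 'mdx' and 'data' are existing contentType values; 'plain' is the proposed one.
type ContentType = 'markdown' | 'mdx' | 'data' | 'plain'

// Hypothetical policies for documents that do not set contentType.
type DefaultPolicy = 'markdown' | 'plain' | 'require-explicit'

function resolveContentType(
  explicit: ContentType | undefined,
  policy: DefaultPolicy,
): ContentType {
  // An explicitly configured contentType always wins.
  if (explicit !== undefined) return explicit

  // Strict option: refuse to guess.
  if (policy === 'require-explicit') {
    throw new Error('contentType must be set explicitly')
  }

  // Otherwise the policy name doubles as the default content type.
  return policy
}

// Usage: the backward-compatible policy keeps existing configs working.
console.log(resolveContentType(undefined, 'markdown'))
console.log(resolveContentType('plain', 'markdown'))
```

Under the 'markdown' policy existing projects behave as before and 'plain' is opt-in; the other two policies would both be breaking changes for configs that omit contentType.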

schickling commented 2 years ago

Thanks for suggesting this feature @HoverBaum - I'm excited for the use cases this feature will potentially unlock.

I was briefly looking into what it would take to implement this feature and it would require quite a few internal changes as the concept of "fields" is a pretty fundamental abstraction. So as things stand right now (and given my limited capacity) I'll wait until there's more demand for this feature before prioritizing it.

Liam-Scott-Russell commented 8 months ago

I would be very interested in this feature. For context, I am using remark-code-import to import code examples into my MDX posts, but when I edit these code examples, they don't update until I restart the process, and I get this error:

File updated: blog/imported-code/example1.ts
Warning: Found 1 problems in 1 documents.

 └── Found unsupported file extensions for 1 documents. (Skipping documents)

     • "blog/imported-code/example1.ts" uses "ts"

Having a plaintext or a code contentType would give me hot reload for these files, and would also allow embedding examples into their own pages (e.g. like the TypeScript playground).

I'm not sure how the "fields" concept applies; however, perhaps a typescript contentType could use the TypeScript compiler to validate the contents of the file (instead of frontmatter validation)?
