gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

gatsby-transformer-remark not properly transforming markdown strings. #9056

Closed brendanmc6 closed 6 years ago

brendanmc6 commented 6 years ago

Description

When I pass in a markdown string, the resulting HTML is broken. All newlines and various other elements are escaped, and the whole string is wrapped in quotation marks; only __strong__ and _italic_ are converted correctly. Below is what I get when I query for both the raw string and the converted HTML:

// GraphiQL result
{
  "Body":  {
    "raw":  "# Title\n\nThis is _mark down_.",
    "childMarkdownRemark":  {
      "html":  "<p>\"# Title\\n\\nThis is <em>mark down</em>.\"</p>"
    }
  }
}

I am using Airtable to serve a string (the "long text" field type). I use gatsby-source-airtable to bring the string in, map it as "text/markdown", and gatsby-transformer-remark is indeed successful in identifying the nodes, as shown above. This is why I believe gatsby-transformer-remark is the source of the issue.
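For readers trying to reproduce this, the setup described above might look roughly like the following gatsby-config.js. This is only a sketch: the actual config is not shown in this thread, the column name Body is taken from the query result above, and the API key variable, base ID, and table name are placeholders.

```javascript
// gatsby-config.js (sketch). The env var name, base ID, and table name
// below are placeholders, not values from this issue.
const config = {
  plugins: [
    {
      resolve: 'gatsby-source-airtable',
      options: {
        apiKey: process.env.AIRTABLE_API_KEY, // placeholder env var name
        tables: [
          {
            baseId: 'YOUR_BASE_ID', // placeholder
            tableName: 'YOUR_TABLE_NAME', // placeholder
            // Declare the "Body" long text column as markdown so that
            // gatsby-transformer-remark picks it up as a markdown node.
            mapping: { Body: 'text/markdown' },
          },
        ],
      },
    },
    'gatsby-transformer-remark',
  ],
}

module.exports = config
```

With a mapping like this, Body becomes a node of its own rather than a plain string field, which is why the query above reads Body { raw childMarkdownRemark { html } }.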

Is there some sort of configuration option that can be passed to change the way that markdown is parsed?

Environment

System:
  OS: Windows 10
  CPU: x64 Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
Binaries:
  npm: 5.6.0 - C:\Program Files\nodejs\npm.CMD
Browsers:
  Edge: 42.17134.1.0
npmPackages:
  gatsby: ^2.0.19 => 2.0.19
  gatsby-plugin-manifest: ^2.0.5 => 2.0.5
  gatsby-plugin-offline: ^2.0.5 => 2.0.5
  gatsby-plugin-react-helmet: ^3.0.0 => 3.0.0
  gatsby-plugin-styled-components: ^3.0.0 => 3.0.0
  gatsby-source-airtable: ^2.0.1 => 2.0.1
  gatsby-source-filesystem: ^2.0.3 => 2.0.3
  gatsby-transformer-remark: ^2.1.7 => 2.1.7

stefanprobst commented 6 years ago

What do you get for childMarkdownRemark { rawMarkdownBody }?

brendanmc6 commented 6 years ago

@stefanprobst That is coming out as:

"\"# Title\\n\\nThis is _mark down_.\""

It might be worth noting that when I use unified.js with remark-html and pass in the raw string, the HTML comes out perfectly (but then I can't benefit from the great transformer-remark plugins).

stefanprobst commented 6 years ago

I think this indicates an error with the source plugin (i.e. the airtable plugin), not the markdown transformer.

jbolda commented 6 years ago

Airtable is definitely inserting those newline escapes. How do other plugins that support markdown sources deal with this? We may have to "purify" long text fields in gatsby-source-airtable if all of the other libraries deal with it internally as well.

DSchau commented 6 years ago

@jbolda it may be worth checking out gatsby-source-contentful! I'm not sure whether the data we get from the API is already escaped or whether we do anything special, but I know it's a plugin I've used that was able to handle childMarkdownRemark functionality flawlessly with Markdown/text content.

I don't think it'd be a terrible approach to just use a regular expression though, e.g.

const normalize = text => text.replace(/\\n/g, '\n')
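
One caveat with the regex alone: if the content is wrapped in literal quotes, as the rawMarkdownBody output earlier in this thread suggests, those quotes survive the replacement. A minimal sketch of an alternative, assuming the broken value is a JSON-encoded string, is below. unescapeMarkdown is a hypothetical helper name, not part of any Gatsby plugin.

```javascript
// Hypothetical helper, not part of gatsby-source-airtable or
// gatsby-transformer-remark.
// Assumption: the broken content is a JSON-encoded string, i.e. it is
// wrapped in literal double quotes and newlines appear as backslash + "n".
const unescapeMarkdown = text => {
  try {
    const parsed = JSON.parse(text)
    // Only accept the parsed value if it really was an encoded string.
    return typeof parsed === 'string' ? parsed : text
  } catch (err) {
    // Not valid JSON (e.g. already plain markdown): return it unchanged.
    return text
  }
}

// The regex from the comment above, for comparison.
const normalize = text => text.replace(/\\n/g, '\n')

const broken = '"# Title\\n\\nThis is _mark down_."'
console.log(unescapeMarkdown(broken)) // quotes removed, real newlines
console.log(normalize(broken)) // real newlines, but still wrapped in quotes
```

The trade-off is that a field whose entire content happens to be a valid JSON string literal would also be unwrapped, so something like this probably belongs in the source plugin, where the encoding is known, rather than in user code.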

brendanmc6 commented 6 years ago

> Airtable is definitely inserting those newline escapes. How do other plugins that support markdown sources deal with this? We may have to "purify" long text fields in gatsby-source-airtable if all of the other libraries deal with it internally as well.

@jbolda The escapes don't appear when I remove the "text/markdown" mapping and access the string directly:

"node": {
  "data": {
    "Body": "# Title\n\nThis is _mark down_.\n\nThis is a ![photo](./ToC.png)"
  }
}

brendanmc6 commented 6 years ago

@jbolda @stefanprobst Indeed, I believe I've narrowed it down to the gatsby-source-airtable plugin, likely its use of JSON.stringify (I'm not too familiar with how stringify works). The query below was run without gatsby-transformer-remark installed. Sorry for not doing this sooner! Let's move the issue over there.

// this query still returns the problematic escaped markdown.
{
  allAirtable {
    edges {
      node {
        data {
          Body {
            internal {
              content
            }
          }
        }
      }
    }
  }
}
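
The JSON.stringify suspicion can be checked in isolation, without Gatsby or Airtable. The sketch below only demonstrates what JSON.stringify does to a string; whether the source plugin actually calls it on this field is an assumption that this thread does not confirm.

```javascript
// The raw markdown, as returned when the field is not mapped.
const raw = '# Title\n\nThis is _mark down_.'

// JSON.stringify on a string wraps it in double quotes and turns each
// newline into the two characters backslash + "n".
const stringified = JSON.stringify(raw)
console.log(stringified)

// GraphiQL shows its results as JSON, so a value that is already
// stringified gets encoded a second time when displayed.
const displayed = JSON.stringify(stringified)
console.log(displayed)

// JSON.parse reverses one level of encoding and recovers the original.
console.log(JSON.parse(stringified) === raw)
```

The second log line has the same shape as the rawMarkdownBody value reported earlier in this thread, which is consistent with the content being stringified once before it reaches the markdown transformer.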