Pelican-Elegant / elegant

Best theme for Pelican Static Blog Generator
https://elegant.oncrashreboot.com/
MIT License
293 stars 188 forks

Introduce more semantic HTML5 tags #237

Open silverhook opened 5 years ago

silverhook commented 5 years ago

Semantic HTML makes websites a lot more useful, even if it’s not apparent at first sight.

If no-one beats me to it, I’m very willing to take this one.

More info:

AWegnerGitHub commented 5 years ago

What are you thinking of adding for the microdata? There are a lot of microdata types and subtypes and sub-subtypes. Each of those items takes a different set of properties. Some are required, some are optional, some are marked as optional but are required if you want to use Google's Rich Cards, and some have specific requirements about what you have to show if you use that type and want Google to display it as well (example: if you want to show star reviews, you have to show the rating some place).

Basically, this is not a small thing. It took me a long time just to get some of my posts to show review stars.

Google is very picky about microdata. It also isn't always consistent. I have several articles that I have to go in and manually tell Google certain metadata points because it doesn't detect them - but on another article it will. I haven't found a pattern, and I'd find it really annoying if I was a high volume poster. :)

A Bad Detection

It is a great idea to do this. I'd love to have more microdata on my site and eventually get to real rich cards. But, I wanted to set expectations that this isn't a quick thing. I think it's going to take multiple iterations and is going to involve adding optional elements to posts (i.e., for a rich card you have to have an image). It might be a good idea to start by figuring out which of the items/subitems/sub-subitems we want to support and what that will entail for our template(s).

For my reviews, I naively just created a new template type that had the microdata for a review vs. the microdata for a BlogPost or an Article. Adding a template for each type isn't the most sustainable way to handle this in the long run, but it is an option.

silverhook commented 5 years ago

I very much agree. If we tackle this, it might make sense to set it up as a longer-term side-project, perhaps even with a separate branch.

As for the scope, I think we can tackle it the following way:

  1. first introduce all obligatory metadata (I think we’re nearly, if not completely, there already)
  2. then add optional metadata that is already collected by default, but isn’t presented (optimally) yet
  3. and finally decide which additional metadata it would make sense to introduce as well

@AWegnerGitHub, I see you’re at least as interested in this as I am and seem to be more up-to-date than I. If we tackle this together, we can probably push it forward at a comfortable pace.

I think this would be a great addition to the theme, as it only adds features without removing or breaking what is already there.

iranzo commented 5 years ago

@AWegnerGitHub will you create a PR for this? It would be great to have it implemented in Pelican-Elegant!

silverhook commented 5 years ago

This could also be of help: https://github.com/drivet/pelican-indieweb-kit

AWegnerGitHub commented 5 years ago

I have a little bit of free time! I have come back to this and started working on it. However, I have a few things to raise for discussion.

Microdata vs. JSON-LD

In my examples above, I have used microdata to mark up my post. The biggest issue I had with this is that it's all in-line. That makes adding data within a post really difficult. As I've investigated this a bit, I found that Google prefers JSON-LD format. Essentially, this adds a <script> tag (or several if you have multiple things to mark up) to your file. I think we should use JSON-LD instead of the microdata due to its support and since we don't have to worry about inline data tagging this way.

An example that I got to validate looks like this:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "Articles Count With Every Tag and Category",
  "url": "/articles-count-with-every-tag-and-category",
  "description": "Readers of an article on your site usually look for other articles on the same topic. Categories and tags are a way of showing them related articles. Elegant displays the count of articles that you have written in a category or tag in a non-intrusive manner. Every category and tag …",
  "image": "/images/avatars/talha131.png",
  "mainEntityOfPage": "True",
  "author": {
    "@type": "Person",
    "name": "Talha Mansoor",
    "url": "/images/avatars/talha131.png"
  },
  "publisher":{
    "@type":"Organization",
    "name":"Elegant",
    "url":"",
    "logo":{
        "@type":"ImageObject",
        "url":"https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png",
        "width":"357",
        "height":"60"
    }
  },
  "datePublished": "2019-06-30 13:26:18+05:00",
  "dateModified": "2019-07-31 15:45:51+05:00"
}
</script>

This is built using a template that looks like this:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "{{ article.title|striptags }}",
  "url": "{{ SITEURL }}/{{ article.url }}",
  "description": "{{ article.summary|striptags }}",
  "image": "{{ SITEURL }}/images/avatars/talha131.png",
  "mainEntityOfPage": "True",
  "author": {
    "@type": "Person",
    "name": "{{ article.author.name }}",
    "url": "{{ SITEURL }}/images/avatars/talha131.png"
  },
  "publisher":{
    "@type":"Organization",
    "name":"{{ SITENAME|striptags|e }}",
    "url":"{{ SITEURL }}",
    "logo":{
        "@type":"ImageObject",
        "url":"https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png",
        "width":"357",
        "height":"60"
    }
  },
  "datePublished": "{{ article.date }}",
  "dateModified": "{{ article.modified }}"
}
</script>
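
One thing to watch with a string-built template like the one above is escaping: a title or summary containing a double quote would produce invalid JSON. A minimal sketch of an alternative, assuming the values are available as plain Python strings and datetimes, is to build a dict and let `json.dumps` do the escaping. The names `article_jsonld` and `to_script_tag` and their parameters are hypothetical, not part of Pelican or Elegant.

```python
import json
from datetime import datetime, timedelta, timezone


def article_jsonld(title, url, summary, author, published, modified=None):
    """Build a schema.org Article as a dict (hypothetical helper)."""
    data = {
        "@context": "http://schema.org",
        "@type": "Article",
        "headline": title,
        "url": url,
        "description": summary,
        "author": {"@type": "Person", "name": author},
        # isoformat() produces an ISO 8601 timestamp with a "T" separator.
        "datePublished": published.isoformat(),
    }
    if modified is not None:
        data["dateModified"] = modified.isoformat()
    return data


def to_script_tag(data):
    """Serialize the dict into a JSON-LD <script> element."""
    # Escape "</" so article text cannot close the <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


tz = timezone(timedelta(hours=5))
doc = article_jsonld(
    title='Articles Count With "Every" Tag',
    url="/articles-count-with-every-tag-and-category",
    summary="Readers of an article usually look for related articles.",
    author="Talha Mansoor",
    published=datetime(2019, 6, 30, 13, 26, 18, tzinfo=tz),
)
print(to_script_tag(doc))
```

Whether this would live in a plugin or a Jinja filter is left open here; the point is only that serializing a dict sidesteps hand-written quoting.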

Additional Fields

The problem with getting structured data to validate with Google (and I'm assuming we want Google to validate it versus being "technically correct, but not validated") is that Google requires some additional pieces of information.

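A couple things from the example above:

  • publisher - This should probably be a setting that users can change. We can default it to the site name though for sanity. Part of a publisher, though, is the requirement of having a logo. Do we have a good default image URL we can use for that?
  • author - I think I can get everything for the author section to match the blurbs that were implemented last year. In the example above I just have it pointing at the avatar image, but the link to the author's selected URL is probably more appropriate.
  • A review requires an image.

All of these need to be specified somehow, and I'm thinking they should go in the article metadata.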

Templates

I am planning on creating these JSON-LD templates for articles and product reviews, because those are my use cases. However, if someone wants additional ones let me know and I'll get those in the initial release too.

The problem with this is that we'll need additional fields in the write-up. An example from my existing blog looks like this:

Date: 2016-12-09 22:54
Tags: review, technical, learning
Category: Review
Slug: data-analysis-with-pandas-review
Summary: My review of "Data Analysis with Pandas" on Udemy.  
Status: published
Series: Course Reviews
template: review
revieweditem: Data Analysis with Pandas
score: 9.5

The last three items - template, revieweditem and score - are useful for different things. In an item review, you have to explicitly provide the name of what you are reviewing (so I couldn't just use the article title). I also wanted to provide a rating (out of 10), so that I could properly display stars. The template variable is used so that I can determine what to load. It uses the built-in Pelican functionality to do this. Then, each template will point to the correct JSON-LD data.
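
As a rough sketch of how the `revieweditem` and `score` fields could map onto a schema.org `Review`, assuming the custom metadata arrives as strings in a plain dict: the helper name `review_jsonld` is hypothetical, and Google's validator may want a more specific `itemReviewed` type than `Thing`, which is not checked here.

```python
import json


def review_jsonld(metadata, author, best_rating=10):
    """Build a schema.org Review from article metadata (hypothetical helper)."""
    # The reviewed item must be named explicitly; the article title is not enough.
    if "revieweditem" not in metadata or "score" not in metadata:
        raise ValueError("a review needs 'revieweditem' and 'score' metadata")
    return {
        "@context": "http://schema.org",
        "@type": "Review",
        "itemReviewed": {"@type": "Thing", "name": metadata["revieweditem"]},
        "reviewRating": {
            "@type": "Rating",
            # Metadata values are assumed to be strings, so convert the score.
            "ratingValue": float(metadata["score"]),
            # The score is out of 10 in the example above.
            "bestRating": best_rating,
        },
        "author": {"@type": "Person", "name": author},
    }


meta = {
    "template": "review",
    "revieweditem": "Data Analysis with Pandas",
    "score": "9.5",
}
print(json.dumps(review_jsonld(meta, author="Example Author"), indent=2))
```

Failing loudly on a missing field is one possible design choice; silently skipping the JSON-LD block for that article would be another.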


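TL;DR:

  • What types of posts are you writing? I am planning on making structured data templates for an Article (current post type) and a Review (rate a product).
  • Certain structured data requires different fields. What is the best way to indicate that? Is simply having a documentation page enough with a "We support these types. Each of these requires these fields"?
  • There are some global defaults that are required to properly validate structured data. These include a site logo, a publisher (do we want this global or per post?). Some types require additional fields, such as an image.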

iranzo commented 5 years ago

My answers

JSON is OK and seems easier.

About logos: default to the Pelican-Elegant one unless the user has specified a site one.

About the URL: default to the site URL unless there's author metadata, like the one used for the article undersign.

I usually write posts and sometimes a gallery or presentation. Listing what might be required would be OK.

Thanks!


On Thu, Aug 8, 2019, 0:52, A Wegner notifications@github.com wrote:

I have a little bit of free time! I have come back to this and started working on it. However, I have a few things to raise for discussion.

Microdata vs. JSON-LD

In my examples above, I have used microdata to markup my post. The biggest issue I had with this is that it's all in-line. That makes adding data within a post really difficult. As I've investigated this a bit, I found that Google prefers JSON-LD https://developers.google.com/search/docs/guides/intro-structured-data#markup-formats-and-placement format. Essentially, this adds a


Additional Fields

The problem with getting structured data to validate with Google (and I'm assuming we want Google to validate it versus being "technically correct, but not validated"), is that they require some additional pieces of information.

A couple things from the example above:

  • publisher - This should probably be a setting that users can change. We can default it to the site name though for sanity. Part of a publisher, though, is the requirement of having a logo. Do we have a good default image URL we can use for that?
  • author - I think I can get everything for the author section to match the blurbs that were implemented last year. In the example above I just have it pointing at the avatar image, but the link to the author's selected URL is probably more appropriate.
  • A review requires an image.

All of these need to be specified somehow, and I'm thinking they should go in the article metadata.

Templates

I am planning on creating these JSON-LD templates for articles and product reviews, because those are my use cases. However, if someone wants additional ones let me know and I'll get those in the initial release too.

The problem with this is that we'll need additional fields in the write-up. An example from my existing blog looks like this:

Date: 2016-12-09 22:54
Tags: review, technical, learning
Category: Review
Slug: data-analysis-with-pandas-review
Summary: My review of "Data Analysis with Pandas" on Udemy.
Status: published
Series: Course Reviews
template: review
revieweditem: Data Analysis with Pandas
score: 9.5

The last three items - template, revieweditem and score - are useful for different things. In an item review, you have to explicitly provide the name of what you are reviewing (so I couldn't just use the article title). I also wanted to provide a rating (out of 10), so that I could properly display stars. The template variable is used so that I can determine what to load. It uses the built-in Pelican functionality http://docs.getpelican.com/en/3.6.3/faq.html#how-do-i-assign-custom-templates-on-a-per-page-basis to do this. Then, each template will point to the correct JSON-LD data.

TL;DR:

  • What types of posts are you writing? I am planning on making structured data templates for an Article (current post type) and a Review (rate a product).
  • Certain structured data requires different fields. What is the best way to indicate that? Is simply having a documentation page enough with a "We support these types. Each of these requires these fields"?
  • There are some global defaults that are required to properly validate structured data. These include a site logo, a publisher (do we want this global or per post). Some types require additional fields, such as an image.


AWegnerGitHub commented 5 years ago

Is the logo included in the existing static files anywhere? I'd prefer to point to a local instance of the logo (that way it exists on the same domain as the site) versus pointing at an image on GitHub. I don't know how that impacts SEO, but personally, that's what I'd rather have on my site.

AWegnerGitHub commented 5 years ago

Update:

I've gotten a skeleton built for both a standard post and a review. Both are validating too:

Review JSON-LD

This is how Google sees a product review.

BlogPosting

This is how Google sees a standard article. I currently have it classified as a BlogPosting. This is how I've had my microdata classifying posts for about 18 months now. I based that on this webmasters post when trying to determine Article vs. BlogPosting.

I still have plenty to do

In short, I still have a way to go.

talha131 commented 5 years ago

@AWegnerGitHub great work and thank you for taking up the mantle.

A couple of things

  1. We should keep microdata optional. There are people who are against adding it to their site. An example.
  2. Judging from your comments, microdata seems very similar to the open graph. You may want to see this plugin. Perhaps microdata too can be outsourced to a plugin.

Properly define publisher (probably a config dict). Right now, it's pulling the site name. I'd like this to be configurable

Why can't you let the user define in pelicanconf and then pull it from there into your template?
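
A minimal sketch of what that could look like, assuming a dict in `pelicanconf.py` merged with site-wide defaults. The setting name `PUBLISHER`, the helper `publisher_jsonld`, and the default logo path are all hypothetical placeholders, not existing Elegant settings or files; the fallbacks follow the defaults suggested earlier in this thread (site name, site URL, a theme logo unless the user sets one).

```python
# pelicanconf.py (sketch; PUBLISHER is a hypothetical setting name)
SITENAME = "Elegant"
SITEURL = "https://example.com"

PUBLISHER = {
    "name": "Example Publisher",
    # "url" and "logo" are left out on purpose to show the fallbacks below.
}


def publisher_jsonld(settings, default_logo="/theme/images/logo.png"):
    """Merge the user's PUBLISHER dict with site-wide defaults."""
    # default_logo is a placeholder path, not a file known to ship with Elegant.
    user = settings.get("PUBLISHER", {})
    return {
        "@type": "Organization",
        "name": user.get("name", settings["SITENAME"]),
        "url": user.get("url", settings["SITEURL"]),
        "logo": {
            "@type": "ImageObject",
            "url": user.get("logo", settings["SITEURL"] + default_logo),
        },
    }


settings = {"SITENAME": SITENAME, "SITEURL": SITEURL, "PUBLISHER": PUBLISHER}
print(publisher_jsonld(settings))
```

Since Pelican exposes upper-case settings to templates, the same fallback logic could also be written directly in Jinja; the Python form is used here only to keep the example testable.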

Determine how to include featured images.

Does this help?

https://github.com/Pelican-Elegant/elegant/blob/master/templates/_includes/smo_metadata.html#L16-L17

AWegnerGitHub commented 5 years ago

Microdata is inline in the HTML code. JSON-LD is separate from the HTML, like the plugin you linked. Inlining HTML tags is a pain - and really not something we can do if we want to support a variety of structured data types. JSON-LD will end up being like the smo_metadata template you linked. I was already thinking of adding a metadata flag to enable/disable it. Your article seems to confirm that is a wanted feature.
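
The enable/disable flag could be as small as the sketch below, assuming a per-article metadata key that overrides a site-wide default. Both names, `structured_data` and `STRUCTURED_DATA`, are hypothetical and not existing Pelican or Elegant options.

```python
_FALSE_VALUES = {"false", "no", "off", "0"}


def jsonld_enabled(article_metadata, settings):
    """Per-article flag wins; otherwise fall back to the site-wide setting."""
    value = article_metadata.get("structured_data")
    if value is None:
        # No per-article override: use the site default (on unless disabled).
        return bool(settings.get("STRUCTURED_DATA", True))
    # Metadata values are assumed to arrive as strings such as "false".
    return str(value).strip().lower() not in _FALSE_VALUES


print(jsonld_enabled({}, {}))
print(jsonld_enabled({"structured_data": "false"}, {}))
print(jsonld_enabled({}, {"STRUCTURED_DATA": False}))
```

Defaulting to enabled is an assumption; an opt-in default would suit users who do not want structured data on their site.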

The smo_metadata template you linked is looking for a featured_image metadata element. Do we have documentation anywhere of all of the standard and non-standard article metadata elements we support? That'd be helpful in building this and as a place to document what structured data will require as well.

talha131 commented 5 years ago

Do we have documentation anywhere of all of the standard and non-standard article metadata elements we support?

https://elegant.oncrashreboot.com/metadata

Inlining HTML tags is a pain - and really not something we can do if we want to support a variety of structured data types.

No doubt. For that, we would need a plugin and set some custom markdown syntax.

JSON-LD is the easiest and most straightforward solution.