Open silverhook opened 5 years ago
What are you thinking of adding for the microdata? There are a lot of microdata types, subtypes and sub-subtypes. Each of those items takes a different set of properties. Some are required, some are optional, and some are marked as optional but are required if you want to use Google's Rich Cards. Some have specific requirements about what you have to show if you use that type and want Google to display it as well (example: if you want to show star reviews, you have to show the rating somewhere on the page).
Basically, this is not a small thing. It took me a long time just to get some of my posts to show review stars:
Google is very picky about microdata. It also isn't always consistent. I have several articles that I have to go in and manually tell Google certain metadata points because it doesn't detect them - but on another article it will. I haven't found a pattern, and I'd find it really annoying if I was a high volume poster. :)
A Bad Detection
It is a great idea to do this. I'd love to have more microdata on my site and eventually get to real rich cards. But I wanted to set expectations that this isn't a quick thing. I think it's going to take multiple iterations and is going to involve adding optional elements to posts (i.e. for a rich card you have to have an image). It might be a good idea to start by figuring out which of the items/subitems/sub-subitems we want to support and what that will entail for our template(s).
For my reviews, I naively just created a new template type that had the microdata for a review vs. the microdata for a BlogPost or an Article. Adding a template for each type isn't the most sustainable way to handle this in the long run, but it is an option.
I very much agree. If we tackle this, it might make sense to set it up as a longer-term side-project, perhaps even with a separate branch.
As for the scope, I think we can tackle it the following way:
@AWegnerGitHub, I see you’re at least as interested in this as I am and seem to be more up-to-date than I. If we tackle this together, we can probably push it forward at a comfortable pace.
I think this would be a great addition to the theme, as it only adds features without removing or breaking what is already there.
@AWegnerGitHub will you create a PR for this? It would be great to have it implemented in Pelican-Elegant!
This could also be of help: https://github.com/drivet/pelican-indieweb-kit
I have a little bit of free time! I have come back to this and started working on it. However, I have a few things to raise for discussion.
In my examples above, I have used microdata to mark up my post. The biggest issue I had with this is that it's all in-line. That makes adding data within a post really difficult. As I've investigated this a bit, I found that Google prefers the JSON-LD format. Essentially, this adds a <script> tag (or several if you have multiple things to mark up) to your file. I think we should use JSON-LD instead of microdata due to its support and since we don't have to worry about inline data tagging this way.
An example that I got to validate looks like this:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "Articles Count With Every Tag and Category",
  "url": "/articles-count-with-every-tag-and-category",
  "description": "Readers of an article on your site usually look for other articles on the same topic. Categories and tags are a way of showing them related articles. Elegant displays the count of articles that you have written in a category or tag in a non-intrusive manner. Every category and tag …",
  "image": "/images/avatars/talha131.png",
  "mainEntityOfPage": "True",
  "author": {
    "@type": "Person",
    "name": "Talha Mansoor",
    "url": "/images/avatars/talha131.png"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Elegant",
    "url": "",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png",
      "width": "357",
      "height": "60"
    }
  },
  "datePublished": "2019-06-30 13:26:18+05:00",
  "dateModified": "2019-07-31 15:45:51+05:00"
}
</script>
This is built using a template that looks like this:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "{{ article.title|striptags }}",
  "url": "{{ SITEURL }}/{{ article.url }}",
  "description": "{{ article.summary|striptags }}",
  "image": "{{ SITEURL }}/images/avatars/talha131.png",
  "mainEntityOfPage": "True",
  "author": {
    "@type": "Person",
    "name": "{{ article.author.name }}",
    "url": "{{ SITEURL }}/images/avatars/talha131.png"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{ SITENAME|striptags|e }}",
    "url": "{{ SITEURL }}",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png",
      "width": "357",
      "height": "60"
    }
  },
  "datePublished": "{{ article.date }}",
  "dateModified": "{{ article.modified }}"
}
</script>
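One caveat with the template above: values are interpolated as raw strings, so a title or summary containing a double quote would produce invalid JSON. As a rough sketch only (not what the theme does), the same structure could be built as a Python dict and serialized with json.dumps, which handles the escaping. The function name and its parameters are hypothetical, and it assumes the dates are datetime objects.

```python
import json
from datetime import datetime, timezone


def article_jsonld(title, url, summary, image, author, site_name, site_url,
                   logo_url, published, modified=None):
    """Return a JSON-LD string for a schema.org Article (hypothetical helper)."""
    data = {
        "@context": "http://schema.org",
        "@type": "Article",
        "headline": title,
        "url": url,
        "description": summary,
        "image": image,
        "author": {"@type": "Person", "name": author},
        "publisher": {
            "@type": "Organization",
            "name": site_name,
            "url": site_url,
            "logo": {"@type": "ImageObject", "url": logo_url},
        },
        # isoformat() gives an ISO 8601 timestamp with a "T" separator.
        "datePublished": published.isoformat(),
    }
    # Only emit dateModified when a modification date is available.
    if modified is not None:
        data["dateModified"] = modified.isoformat()
    # json.dumps escapes quotes and backslashes. Replacing "</" keeps a stray
    # "</script>" in a title from closing the surrounding <script> tag;
    # "\/" is a valid JSON escape for "/".
    return json.dumps(data, indent=2).replace("</", "<\\/")
```

The returned string would then be placed inside the <script type="application/ld+json"> tag in place of the hand-written JSON.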
The problem with getting structured data to validate with Google (and I'm assuming we want Google to validate it versus being "technically correct, but not validated") is that they require some additional pieces of information.
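A couple things from the example above:
- publisher - This should probably be a setting that users can change. We can default it to the site name though for sanity. Part of a publisher, though, is the requirement of having a logo. Do we have a good default image URL we can use for that?
- author - I think I can get everything for the author section to match the blurbs that were implemented last year. In the example above I just have it pointing at the avatar image, but the link to the author's selected URL is probably more appropriate.
- A review requires an image.
All of these need to be specified somehow, and I'm thinking they should go in the article metadata.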
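To make the publisher question concrete, here is a minimal sketch of how a configurable publisher with site-level fallbacks might be resolved. SITENAME and SITEURL are standard Pelican settings; the PUBLISHER setting and the default logo path are assumptions made up for this example, not existing options.

```python
def resolve_publisher(settings):
    """Build a schema.org Organization for the publisher from site settings."""
    # PUBLISHER is a hypothetical user-supplied dict; it does not exist today.
    custom = settings.get("PUBLISHER", {})
    site_url = settings.get("SITEURL", "")
    # Hypothetical default logo location inside the theme's static files.
    default_logo = site_url + "/theme/images/logo.png"
    return {
        "@type": "Organization",
        # Fall back to the site name and URL when nothing is configured.
        "name": custom.get("name") or settings.get("SITENAME", ""),
        "url": custom.get("url") or site_url,
        "logo": {
            "@type": "ImageObject",
            "url": custom.get("logo") or default_logo,
        },
    }
```

With only SITENAME and SITEURL set, this yields the site name as the publisher, which matches the "default to the site name" idea above.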
I am planning on creating these JSON-LD templates for articles and product reviews, because those are my use cases. However, if someone wants additional ones let me know and I'll get those in the initial release too.
The problem with this is that we'll need additional fields in the write-up. An example from my existing blog looks like this:
Date: 2016-12-09 22:54
Tags: review, technical, learning
Category: Review
Slug: data-analysis-with-pandas-review
Summary: My review of "Data Analysis with Pandas" on Udemy.
Status: published
Series: Course Reviews
template: review
revieweditem: Data Analysis with Pandas
score: 9.5
The last three items - template, revieweditem and score - are useful for different things. In an item review, you have to explicitly provide the name of what you are reviewing (so I couldn't just use the article title). I also wanted to provide a rating (out of 10), so that I could properly display stars. The template variable is used so that I can determine what to load. It uses the built-in Pelican functionality to do this. Then, each template will point to the correct JSON-LD data.
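As an illustration of how the revieweditem and score fields could feed a Review, here is a small sketch. The function name and parameters are hypothetical, and the generic "Thing" item type is only a placeholder: Google may require a more specific type for the reviewed item before it shows stars.

```python
import json


def review_jsonld(metadata, author_name, item_type="Thing", best_rating=10):
    """Return a JSON-LD string for a schema.org Review built from post metadata."""
    data = {
        "@context": "http://schema.org",
        "@type": "Review",
        # The reviewed item's name comes from metadata, not the article title.
        "itemReviewed": {"@type": item_type, "name": metadata["revieweditem"]},
        "reviewRating": {
            "@type": "Rating",
            # Metadata values arrive as strings, so convert the score.
            "ratingValue": float(metadata["score"]),
            # The score is out of 10, so state the scale explicitly.
            "bestRating": best_rating,
        },
        "author": {"@type": "Person", "name": author_name},
    }
    return json.dumps(data, indent=2)
```

For the metadata shown above, this would produce a rating of 9.5 out of 10 for "Data Analysis with Pandas".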
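TL;DR:
- What types of posts are you writing? I am planning on making structured data templates for an Article (current post type) and a Review (rate a product).
- Certain structured data requires different fields. What is the best way to indicate that? Is simply having a documentation page enough with a "We support these types. Each of these requires these fields"?
- There are some global defaults that are required to properly validate structured data. These include a site logo, a publisher (do we want this global or per post). Some types require additional fields, such as an image.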
My answers:
JSON is OK and seems easier.
About logos: default to the Pelican-Elegant one unless the user has specified a site logo.
About the URL: default to the site URL unless there is author metadata, like that used for the author signature under the article.
I usually write posts and sometimes a gallery or presentation; listing what might be required would be OK.
Thanks!
On Thu, 8 Aug 2019 at 0:52, A Wegner notifications@github.com wrote:
I have a little bit of free time! I have come back to this and started working on it. However, I have a few things to raise for discussion.
Microdata vs. JSON-LD
In my examples above, I have used microdata to markup my post. The biggest issue I had with this is that it's all in-line. That makes adding data within a post really difficult. As I've investigated this a bit, I found that Google prefers JSON-LD https://developers.google.com/search/docs/guides/intro-structured-data#markup-formats-and-placement format. Essentially, this adds a
This is built using a template that looks like this:
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "{{ article.title|striptags }}",
  "url": "{{ SITEURL }}/{{ article.url }}",
  "description": "{{ article.summary|striptags }}",
  "image": "{{ SITEURL }}/images/avatars/talha131.png",
  "mainEntityOfPage": "True",
  "author": {
    "@type": "Person",
    "name": "{{ article.author.name }}",
    "url": "{{ SITEURL }}/images/avatars/talha131.png"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{ SITENAME|striptags|e }}",
    "url": "{{ SITEURL }}",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png",
      "width": "357",
      "height": "60"
    }
  },
  "datePublished": "{{ article.date }}",
  "dateModified": "{{ article.modified }}"
}
Additional Fields
The problem with getting structured data to validate with Google (and I'm assuming we want Google to validate it versus being "technically correct, but not validated"), is that they require some additional pieces of information.
A couple things from the example above:
- publisher - This should probably be a setting that users can change. We can default it to the site name though for sanity. Part of a publisher, though, is the requirement of having a logo. Do we have a good default image URL we can use for that?
- author - I think I can get everything for the author section to match the blurbs that were implemented last year. In the example above I just have it pointing at the avatar image, but the link to the author's selected URL is probably more appropriate.
- A review requires an image.
All of these need to be specified somehow, and I'm thinking they should go in the article metadata.
Templates
I am planning on creating these JSON-LD templates for articles and product reviews, because those are my use cases. However, if someone wants additional ones let me know and I'll get those in the initial release too.
The problem with this is that we'll need additional fields in the write-up. An example from my existing blog looks like this:
Date: 2016-12-09 22:54
Tags: review, technical, learning
Category: Review
Slug: data-analysis-with-pandas-review
Summary: My review of "Data Analysis with Pandas" on Udemy.
Status: published
Series: Course Reviews
template: review
revieweditem: Data Analysis with Pandas
score: 9.5
The last three items - template, revieweditem and score - are useful for different things. In an item review, you have to explicitly provide the name of what you are reviewing (so I couldn't just use the article title). I also wanted to provide a rating (out of 10), so that I could properly display stars. The template variable is used so that I can determine what to load. It uses the built-in Pelican functionality http://docs.getpelican.com/en/3.6.3/faq.html#how-do-i-assign-custom-templates-on-a-per-page-basis to do this. Then, each template will point to the correct JSON-LD data.
TL;DR:
- What types of posts are you writing? I am planning on making structured data templates for an Article (current post type) and a Review (rate a product).
- Certain structured data requires different fields. What is the best way to indicate that? Is simply having a documentation page enough with a "We support these types. Each of these requires these fields"?
- There are some global defaults that are required to properly validate structured data. These include a site logo, a publisher (do we want this global or per post). Some types require additional fields, such as an image.
Is the logo included in the existing static files anywhere? I'd prefer to point to a local instance of the logo (that way it exists on the same domain as the site) versus pointing at an image on GitHub. I don't know how that impacts SEO, but personally, that's what I'd rather have on my site.
Update:
I've gotten a skeleton built for both a standard post and a review. Both are validating too:
This is how Google sees a product review.
This is how Google sees a standard article. I currently have it classified as a BlogPosting. This is how I've had my microdata classifying posts for about 18 months now. I based that on this webmasters post when trying to determine Article vs. BlogPosting.
I still have plenty to do:
- [...] Person as a valid publisher. It fails validation.
- [...] Person and there is a lot of data we can attach to this. Some of the important ones are probably sameAs, with links to provided social media accounts, and alternateName (i.e. Andy vs. Andrew). It depends on how much a user wants to provide about themselves. This would end up being part of the Authors dictionary, I think.
In short, I still have a way to go.
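To sketch what the extra author data might look like, the helper below turns one entry of an authors dictionary into a schema.org Person. The dictionary shape and the key names alternate_name and social are assumptions for illustration only; sameAs and alternateName are the schema.org property names.

```python
def author_person(name, authors):
    """Build a schema.org Person from a per-author settings dict (assumed shape)."""
    # Unknown authors fall back to a Person with just a name.
    info = authors.get(name, {})
    person = {"@type": "Person", "name": name}
    # Only include optional properties the user actually provided.
    if info.get("url"):
        person["url"] = info["url"]
    if info.get("alternate_name"):  # hypothetical key
        person["alternateName"] = info["alternate_name"]
    if info.get("social"):  # hypothetical key: list of profile URLs
        person["sameAs"] = list(info["social"])
    return person
```

Authors who provide nothing beyond a name still get a valid, if minimal, Person entry.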
@AWegnerGitHub great work and thank you for taking up the mantle.
A couple of things
Properly define publisher (probably a config dict). Right now, it's pulling the site name. I'd like this to be configurable
Why can't you let the user define it in pelicanconf and then pull it from there into your template?
Determine how to include featured images.
Does this help?
https://github.com/Pelican-Elegant/elegant/blob/master/templates/_includes/smo_metadata.html#L16-L17
Microdata is inline in the HTML code. JSON-LD is separate from the HTML, like the plugin you linked. Inlining HTML tags is a pain - and really not something we can do if we want to support a variety of structured data types. JSON-LD will end up being like the smo_metadata template you linked. I was already thinking of adding a metadata flag to enable/disable it. Your article seems to confirm that is a wanted feature.
The smo_metadata template you linked is looking for a featured_image metadata element. Do we have documentation anywhere of all of the standard and non-standard article metadata elements we support? That'd be helpful in building this and as a place to document what structured data will require as well.
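One way the "each type requires these fields" documentation could be backed by a check is a small lookup table, sketched below. The field lists are illustrative assumptions drawn from this thread (a review needs the reviewed item, a score and an image), not a statement of what Google or the theme requires.

```python
# Illustrative only: required metadata per structured data type.
REQUIRED_FIELDS = {
    "article": ("title", "date"),
    "review": ("title", "date", "revieweditem", "score", "featured_image"),
}


def missing_fields(kind, metadata):
    """Return the required metadata names that are absent or blank for a type."""
    required = REQUIRED_FIELDS.get(kind, ())
    return [
        name for name in required
        if not str(metadata.get(name, "")).strip()
    ]
```

A build step could log the returned names as a warning and skip emitting the JSON-LD for that post, rather than publishing markup that fails validation.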
Do we have documentation anywhere of all of the standard and non-standard article metadata elements we support?
https://elegant.oncrashreboot.com/metadata
Inlining HTML tags is a pain - and really not something we can do if we want to support a variety of structured data types.
No doubt. For that, we would need a plugin and set some custom markdown syntax.
JSON-LD is the easiest and most straightforward solution.
Semantic HTML makes websites a lot more useful, even if it’s not apparent at first sight.
If no-one beats me to it, I’m very willing to take this one.
More info: