Closed: k8hughes closed this 4 years ago
Comms review - this is ongoing and impacted by capacity - please leave open so we keep going on it!
@SimonMurphyDI on this, if you want a content scrape to show you where the gaps in the SEO metadata are, let me know. I can run one fairly quickly.
Thanks @benjamincoleman
Is it possible to do this in such a way that it checks the 'related links' have been added manually (rather than automatically)?
Best
Simon
Not as things stand...
But we could add an attribute indicating whether the related links are manual or automatic, so a scrape can detect it. I'd then set the scraping tool to check for that CSS class and capture the content where it exists; we'd end up with a spreadsheet whose gaps show where the automatic related items are.
B
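The class check Ben describes could be sketched like this with Python's stdlib HTML parser. The marker class name `related-links--manual` is an assumption; the real attribute would be whatever the devs add to the template.

```python
from html.parser import HTMLParser

# Hypothetical marker class added to manually curated related-links
# blocks (the actual name is an assumption, not from the codebase).
MANUAL_CLASS = "related-links--manual"


class RelatedLinksDetector(HTMLParser):
    """Flags whether a page's HTML contains the manual-links marker class."""

    def __init__(self):
        super().__init__()
        self.has_manual_links = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if MANUAL_CLASS in classes:
            self.has_manual_links = True


def page_has_manual_links(html: str) -> bool:
    """Return True if the scraped HTML carries the manual-links marker."""
    parser = RelatedLinksDetector()
    parser.feed(html)
    return parser.has_manual_links
```

Pages where this returns False would then show up as the "automatic related items" rows in the spreadsheet.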
Thanks Ben, that's good to know. I think in total we'd hope to have a spreadsheet that captures:
How will the scrape capture child/chapter pages of longer reports?
It's worth flagging that we're changing 'Publications' to 'Resources' and so the scrape would be better if it happened after that.
Can you check an estimate cost with @k8hughes before you go ahead?
Thanks! Simon
Hi Simon,
How will the scrape capture child/chapter pages of longer reports?
We'll curate a list and sort by URL before or after the scrape - that will put them into order.
The child pages can then take the publication date from the parent.
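The sort-then-inherit step could look like this. The row shape and URLs are made-up examples, and this assumes chapter URLs nest directly under the parent report's URL.

```python
# Example scraped rows (hand-made data, not from the real site).
rows = [
    {"url": "/publications/report-a/chapter-2/", "date": None},
    {"url": "/publications/report-a/", "date": "2019-06-01"},
    {"url": "/publications/report-b/", "date": "2020-01-15"},
]

# Sorting by URL puts child/chapter pages directly after their parent.
rows.sort(key=lambda r: r["url"])

# Any child missing a date inherits the parent's publication date.
by_url = {r["url"]: r for r in rows}
for row in rows:
    if row["date"] is None:
        # Parent URL = everything up to the last path segment.
        parent_url = row["url"].rstrip("/").rsplit("/", 1)[0] + "/"
        parent = by_url.get(parent_url)
        if parent and parent["date"]:
            row["date"] = parent["date"]
```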
It's worth flagging that we're changing 'Publications' to 'Resources' and so the scrape would be better if it happened after that.
Hmmm, sounds a bit of a step down in importance - resources are 'useful things' to me, rather than the main body of work from DI. Why the change?
The only thing we'd need to do to get the scrape done is add an attribute to the related links as I mentioned previously. Dev time for that should be very low.
@k8hughes I'd estimate 1-2 hours tops, depending on how much formatting the scrape data could use to make it more usable.
Hi @benjamincoleman I think this could all be handled as a temporary management command in Wagtail, considering most of the fields Simon is interested in are page fields. Running a quick experiment now.
Probably right, though with a scrape you get all this too:
Here's what a management command can pull. It may still be worth adding your additional SEO fields, though! metadata.zip
Not sure if this is worth pushing as a branch, but here's the crux of it:
# Assumed imports; the module path for PublicationIndexPage
# depends on the project layout.
from wagtail.core.models import Page
from publications.models import PublicationIndexPage  # hypothetical path

results = []
p_index = PublicationIndexPage.objects.first()
live_pages = Page.objects.descendant_of(p_index).live()
for live_page in live_pages:
    specific_page = live_page.specific
    # Pages without an explicit published_date (e.g. chapter pages)
    # fall back to Wagtail's first_published_at.
    if hasattr(specific_page, "published_date"):
        published = str(specific_page.published_date.date())
    else:
        published = str(specific_page.first_published_at.date())
    row = [
        specific_page._meta.verbose_name,
        specific_page.url,
        specific_page.title,
        published,
        specific_page.hero_image is not None,
        specific_page.seo_title,
        specific_page.search_description,
        specific_page.publication_related_links.exists(),
    ]
    results.append(row)
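To turn `results` into the spreadsheet Simon asked for, the command could finish by writing a CSV. This is a sketch; the header names are assumptions matching the row order in the snippet above.

```python
import csv
import io

# Assumed column names, matching the row order of the snippet above.
HEADER = [
    "page_type", "url", "title", "published",
    "has_hero_image", "seo_title", "search_description", "has_related_links",
]


def write_report(results, fileobj):
    """Write scraped rows to CSV so metadata gaps are easy to spot."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(results)


# Example with one hand-made row (not real site data).
buf = io.StringIO()
write_report(
    [["Publication", "/publications/example/", "Example", "2020-01-01", True, "", "", False]],
    buf,
)
```

In a real management command, `fileobj` would be an open file (or `self.stdout`) rather than a `StringIO` buffer.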
Check the promote tab of the imported content.