Closed: k8hughes closed this 4 years ago
Comms review - this is ongoing and impacted by capacity - please leave open so we keep going on it!
@SimonMurphyDI on this, if you want a content scrape to show you where the gaps in the SEO metadata are, let me know. I can run one fairly quickly.
Thanks @benjamincoleman
Is it possible to do this in such a way that it checks the 'related links' have been added manually (rather than automatically)?
Best
Simon
Not as things stand...
But we could add an attribute indicating whether the related links are manual or automatic, so a scrape can detect it. I'd then set the scraping tool to check for that CSS class and capture the content where it exists; we'd end up with a spreadsheet whose gaps show where the automatic related items are.
B
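The class check Ben describes could be sketched like this with Python's stdlib HTML parser. The marker class name `related-links--manual` is an assumption; the real attribute would be whatever the devs add to the template.

```python
from html.parser import HTMLParser

# Hypothetical marker class added to manually curated related-links
# blocks (the actual name is an assumption, not from the codebase).
MANUAL_CLASS = "related-links--manual"


class RelatedLinksDetector(HTMLParser):
    """Flags whether a page's HTML contains the manual-links marker class."""

    def __init__(self):
        super().__init__()
        self.has_manual_links = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if MANUAL_CLASS in classes:
            self.has_manual_links = True


def page_has_manual_links(html: str) -> bool:
    """Return True if the scraped HTML carries the manual-links marker."""
    parser = RelatedLinksDetector()
    parser.feed(html)
    return parser.has_manual_links
```

Pages where this returns False would then show up as the "automatic related items" rows in the spreadsheet.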
Thanks Ben, that's good to know. I think in total we'd hope to have a spreadsheet that captures:
How will the scrape capture child/chapter pages of longer reports?
It's worth flagging that we're changing 'Publications' to 'Resources' and so the scrape would be better if it happened after that.
Can you check an estimate cost with @k8hughes before you go ahead?
Thanks! Simon
Hi Simon,
How will the scrape capture child/chapter pages of longer reports?
We'll curate a list and sort by URL before or after the scrape - that will put them into order.
The child pages can then take the publication date from the parent.
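The sort-then-inherit step could look like this. The row shape and URLs are made-up examples, and this assumes chapter URLs nest directly under the parent report's URL.

```python
# Example scraped rows (hand-made data, not from the real site).
rows = [
    {"url": "/publications/report-a/chapter-2/", "date": None},
    {"url": "/publications/report-a/", "date": "2019-06-01"},
    {"url": "/publications/report-b/", "date": "2020-01-15"},
]

# Sorting by URL puts child/chapter pages directly after their parent.
rows.sort(key=lambda r: r["url"])

# Any child missing a date inherits the parent's publication date.
by_url = {r["url"]: r for r in rows}
for row in rows:
    if row["date"] is None:
        # Parent URL = everything up to the last path segment.
        parent_url = row["url"].rstrip("/").rsplit("/", 1)[0] + "/"
        parent = by_url.get(parent_url)
        if parent and parent["date"]:
            row["date"] = parent["date"]
```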
It's worth flagging that we're changing 'Publications' to 'Resources' and so the scrape would be better if it happened after that.
Hmmm, sounds a bit of a step down in importance - resources are 'useful things' to me, rather than the main body of work from DI. Why the change?
The only thing we'd need to do to get the scrape done is add an attribute to the related links as I mentioned previously. Dev time for that should be very low.
@k8hughes I'd estimate 1-2 hours tops, depending on how much formatting the scrape data could use to make it more usable.
Hi @benjamincoleman I think this could all be handled as a temporary management command in Wagtail, considering most of the fields Simon is interested in are page fields. Running a quick experiment now.
Probably right, though with a scrape you get all this too:
Here's what a management command can pull. It may still be worth adding your additional SEO fields, though! metadata.zip
Not sure if this is worth pushing as a branch, but here's the crux of it:
# Assumed imports; the module path for PublicationIndexPage
# depends on the project layout.
from wagtail.core.models import Page
from publications.models import PublicationIndexPage  # hypothetical path

results = []
p_index = PublicationIndexPage.objects.first()
live_pages = Page.objects.descendant_of(p_index).live()
for live_page in live_pages:
    specific_page = live_page.specific
    # Pages without an explicit published_date (e.g. chapter pages)
    # fall back to Wagtail's first_published_at.
    if hasattr(specific_page, "published_date"):
        published = str(specific_page.published_date.date())
    else:
        published = str(specific_page.first_published_at.date())
    row = [
        specific_page._meta.verbose_name,
        specific_page.url,
        specific_page.title,
        published,
        specific_page.hero_image is not None,
        specific_page.seo_title,
        specific_page.search_description,
        specific_page.publication_related_links.exists(),
    ]
    results.append(row)
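To turn `results` into the spreadsheet Simon asked for, the command could finish by writing a CSV. This is a sketch; the header names are assumptions matching the row order in the snippet above.

```python
import csv
import io

# Assumed column names, matching the row order of the snippet above.
HEADER = [
    "page_type", "url", "title", "published",
    "has_hero_image", "seo_title", "search_description", "has_related_links",
]


def write_report(results, fileobj):
    """Write scraped rows to CSV so metadata gaps are easy to spot."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(results)


# Example with one hand-made row (not real site data).
buf = io.StringIO()
write_report(
    [["Publication", "/publications/example/", "Example", "2020-01-01", True, "", "", False]],
    buf,
)
```

In a real management command, `fileobj` would be an open file (or `self.stdout`) rather than a `StringIO` buffer.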
Check the promote tab of the imported content.