xmatthewx closed this 4 years ago
@xmatthewx We transferred ownership of StoryEngine to Loup design a few months ago. Are we planning to scrape the data off of their website? I think we might want to ask them first, since they own the rights to that content now.
this plan is their plan. we discussed moving the mozilla storyengine interviews to pulse long ago. we just didn't want to prioritize it, knowing our work on fellow profiles would eventually enable it.
i'm sure they'd give us a sql dump or a wp xml export. i can contact them after we have a sense of whether that's useful for us.
It doesn't feel like a good use of DevOps time to manually format text data for an import into the CMS. There are plenty of free online tools that can convert HTML to Markdown for us. If we can provide program staff with links to these tools it won't take much of their time to format the exported content themselves on a case by case basis.
Cool. I'll try and take this. I feel like 2 hours from me could save them a dozen hours. And I'm not even sure who "them" is.
Chris, do we have a SQL backup of the site from when we did the handoff? Or should I reach out to Christine P?
I believe I did capture a backup. I will have to confirm that.
I can confirm there's an encrypted snapshot of the storyengine wordpress site saved in the mofo-archives S3 bucket.
Great. If you can give me a copy of the SQL, I'll see if I can prep the content.
You can strip or skip sensitive info in these tables. Or I can do it:
I only need post content. Terms (tags, cats) could be useful.
More on WP DB tables
edit: it's so weird to be reading WP documentation. i spent a lot of time here like 9 years ago.
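For reference, a rough sketch of pulling just post content and terms out of a restored copy of the database. It assumes the default `wp_` table prefix and that the interviews are stored as published rows with `post_type = 'post'`; if StoryEngine used a custom post type, that filter would need to change. The in-memory SQLite database and `demo_connection` are only a stand-in so the queries can be run without the real MySQL dump; `load_posts` is a hypothetical helper name.

```python
import sqlite3
from collections import defaultdict

# Only published posts; drafts, revisions and attachments are skipped.
# Assumption: interviews use the built-in 'post' type.
POSTS_SQL = """
SELECT ID, post_name, post_title, post_content
FROM wp_posts
WHERE post_type = 'post' AND post_status = 'publish'
ORDER BY ID
"""

# Categories and tags attached to each post, via the three term tables.
TERMS_SQL = """
SELECT tr.object_id, tt.taxonomy, t.name
FROM wp_term_relationships tr
JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
JOIN wp_terms t ON t.term_id = tt.term_id
WHERE tt.taxonomy IN ('category', 'post_tag')
ORDER BY tr.object_id, t.name
"""


def load_posts(conn):
    """Return one dict per published post, with its term names attached."""
    terms = defaultdict(list)
    for post_id, _taxonomy, name in conn.execute(TERMS_SQL):
        terms[post_id].append(name)
    return [
        {"id": pid, "slug": slug, "title": title, "html": html, "terms": terms[pid]}
        for pid, slug, title, html in conn.execute(POSTS_SQL)
    ]


def demo_connection():
    """Tiny stand-in database with made-up rows, for trying the queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_name TEXT, post_title TEXT,
                           post_content TEXT, post_type TEXT, post_status TEXT);
    CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wp_term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY,
                                   term_id INTEGER, taxonomy TEXT);
    CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
    INSERT INTO wp_posts VALUES
        (1, 'sample-interview', 'Sample interview', '<p>Hello</p>', 'post', 'publish'),
        (2, 'unfinished', 'Unfinished', '<p>Draft</p>', 'post', 'draft');
    INSERT INTO wp_terms VALUES (10, 'fellows'), (11, 'privacy');
    INSERT INTO wp_term_taxonomy VALUES (100, 10, 'category'), (101, 11, 'post_tag');
    INSERT INTO wp_term_relationships VALUES (1, 100), (1, 101);
    """)
    return conn


if __name__ == "__main__":
    for post in load_posts(demo_connection()):
        print(post["slug"], post["terms"])
```

Since none of these queries touch `wp_users`, `wp_usermeta` or `wp_comments`, those tables could be dropped from the copy before it is shared.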
This issue seems to have died. Reopen if we need to complete anything here.
MoFo has a pile of interviews in StoryEngine that we will put into Pulse profiles before we let go of that site.
I don't think we should attempt an import. But, to expedite the process, can we extract all the content from WP into individual text files and convert html to markdown? We can aim for 80% quality, ignoring edge cases.
A quick export to prep content will enable program staff to simply copy and paste the content, and not fuss with inline style.
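To make the idea concrete, here is a minimal sketch of the "80% quality" conversion using only the Python standard library. It assumes the posts have already been pulled out of WordPress as `(slug, title, html)` tuples; `MarkdownConverter`, `html_to_markdown` and `export_posts` are hypothetical names, not an existing tool. It handles headings, paragraphs, line breaks, bold, italics, links and list items, and simply drops everything else (spans, divs, inline styles) while keeping the text. Ordered lists come out as plain bullets, and tables, images and nested lists are among the ignored edge cases.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class MarkdownConverter(HTMLParser):
    """Rough HTML-to-Markdown converter covering common post markup only."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif re.fullmatch(r"h[1-6]", tag):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")
        elif tag == "br":
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")
        # Any other tag (span, div, font, ...) is dropped; its text is kept.

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None
        elif tag in ("ul", "ol") or re.fullmatch(r"h[1-6]", tag):
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html):
    """Convert an HTML fragment to approximate Markdown."""
    parser = MarkdownConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    # Collapse the runs of blank lines left behind by block-level tags.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def export_posts(posts, out_dir):
    """Write one Markdown file per (slug, title, html) tuple; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for slug, title, html in posts:
        path = out_dir / f"{slug}.md"
        path.write_text(f"# {title}\n\n{html_to_markdown(html)}\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = '<p><span style="color:red">Hello</span> <strong>world</strong></p>'
    print(html_to_markdown(sample))
```

One file per post keeps the copy-and-paste step simple: staff open the file for the interview they are working on and paste the body into the profile, fixing any leftovers by hand.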
Thoughts? @cadecairos @gideonthomas @alanmoo @jessevondoom