beeminder / blog

Missing posts from wordpress export #361

Open dreeves opened 1 year ago

dreeves commented 1 year ago

### Desiderata
- [x] Get them back
- [ ] Diagnose how we missed them in the initial export
- [ ] Diagnose missing Disqus ID for /timecarrot post

The source doc URLs for these posts were missing in posts.json (called wp-posts.csv at the time):

/pareto
/copenhagen

I've now fixed that but am anxious to understand what happened in case it points to deeper problems.

Possibly relatedly, I'm seeing at least one other problem in wp-posts.csv: blog.bmndr.co/timecarrot has no disqus ID in the csv file but does have Disqus comments.

### Cognata
- 318
- http://beeminder.com/changelog#4615 (THIS UVI)

Verbata: wordpress export, mendoza, missing posts

narthur commented 1 year ago

@dreeves I don't think I'm using the disqus IDs in the csv for legacy posts. I'm generating those on the fly.

dreeves commented 1 year ago

> generating those on the fly

Ah, how are they generated? Don't we want to include them explicitly in the metadata (aka frontmatter) for consistency? (This may be a dumb question and I may be making a lot of assumptions I don't realize I'm making about how this works.)

[I had a PS here that I'm moving to a separate comment]

narthur commented 1 year ago

@dreeves I just checked my source spreadsheet and it looks like those are the only two posts missing a source url that we need to worry about. There are two others--a test post and one that looks like a spam post with this title:

How Do You Get A Ton Of Likes On Instagram Ig-Up.com

narthur commented 1 year ago

@dreeves Hmm, I'm not sure. I guess I could make an argument either way for legacy posts. But yes, seems reasonable to go ahead and make those explicit, too, maybe at the same time we get rid of the csvs and go to a single json file.

dreeves commented 1 year ago

To review/clarify, there are perhaps 2 distinct issues: (1) diagnose how we missed /pareto and /copenhagen in the wordpress export, and (2) disqus IDs for wordpress-exported posts.

narthur commented 1 year ago

@dreeves We didn't miss them per se. They were in the CSVs. They were just missing the raw markdown source URLs.

narthur commented 1 year ago

So I guess the question would be why didn't the builds fail given there were no source urls for those posts.
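
One way to make a build fail loudly in that situation is a validation pass over the post metadata before anything is rendered. The following is only a sketch in Python, not how the blog's build actually works: it assumes the metadata is a list of objects with hypothetical `slug` and `source` keys, and the real field names and tooling may differ.

```python
def find_posts_missing_source(posts):
    """Return the slugs of posts whose source URL is absent or blank.

    `posts` is a list of dicts; the `slug` and `source` keys are
    hypothetical names chosen for this sketch.
    """
    missing = []
    for post in posts:
        source = (post.get("source") or "").strip()
        if not source:
            missing.append(post.get("slug", "<no slug>"))
    return missing


def validate(posts):
    """Raise ValueError if any post lacks a source URL, so a build can abort."""
    missing = find_posts_missing_source(posts)
    if missing:
        raise ValueError("Posts missing a source URL: " + ", ".join(missing))
```

A build script could load the metadata (for example with `json.load`) and call `validate` on it first, so a post with no source URL stops the build instead of being silently skipped.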

narthur commented 1 year ago

Oh, nvm. I see what you meant. They weren't in sources.txt. I'm guessing I saw that those entries were missing source urls and assumed those weren't valid posts so manually removed them from sources.txt. So I'm guessing that was a bad assumption on my part.

dreeves commented 1 year ago

But how were the source URLs missing for those two in the first place? I can't imagine anything different about the way we originally published those on wordpress.

narthur commented 1 year ago

@dreeves that I don't know. I could give you the raw exported data I have, but beyond that I wouldn't know how to debug it without access to the old blog.

dreeves commented 1 year ago

We do have the full backup of the old wordpress site if necessary but can we start by checking what's in the raw exported data? Are /pareto and /copenhagen just mysteriously missing the source doc URLs? Nothing else different about them?
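
A quick way to answer that from the raw export would be to list the rows with a blank source URL and see which other columns are blank in those same rows. Here is a hedged sketch in Python using the standard `csv` module. The column names `slug`, `source_url`, and `disqus_id` are assumptions for illustration and may not match the real export.

```python
import csv
import io


def rows_missing(csv_text, column):
    """Return the rows (as dicts) whose value in `column` is absent or blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not (row.get(column) or "").strip()]


def blank_columns(row):
    """Return the sorted names of every blank column in one row."""
    return sorted(
        name
        for name, value in row.items()
        # DictReader stores overflow fields under the key None; skip those.
        if name is not None and not (value or "").strip()
    )


# Made-up sample data, only to show the shape of the check.
sample = (
    "slug,source_url,disqus_id\n"
    "pareto,,123\n"
    "timecarrot,https://example.com/doc,\n"
)

for row in rows_missing(sample, "source_url"):
    print(row["slug"], "is also blank in:", blank_columns(row))
```

If the rows for /pareto and /copenhagen turn out to be blank only in the source URL column, the export itself is the likelier culprit. If other columns are blank too, that points at something different about those posts.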