Closed randyau closed 8 months ago
Hi @randyau,
Thanks for the detailed report and sample file! 🙌
I've had a quick look and can see a solution, which I'll get implemented and released to the CLI tools and the beta migrator soon. I'll update this issue when that's done.
This is now fixed and released in @tryghost/migrate@0.37.0 & @tryghost/mg-substack@0.4.0, and in the self-service migration tools.
I'm using the CLI migrate tool on my Substack, and about half of the posts have broken feature images. The same thing happens with the beta migrator tool in Labs, probably because it uses this exact same code.
The feature_image value exported in ghost-import.json contains a CDN URL instead of the expected local scraped copy. Following the CDN link yields an "Access Denied" error.
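For anyone who wants to check their own export for this, here is a rough sketch that lists suspicious feature_image values. It assumes only that the values sit under a `feature_image` key somewhere in the JSON, and that the broken URLs can be recognised by "bucketeer" in the hostname. The function names and the sample hostname are made up for illustration and are not part of the migrate tooling.

```python
import json
from urllib.parse import urlparse


def iter_feature_images(node):
    """Yield every string stored under a "feature_image" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "feature_image" and isinstance(value, str):
                yield value
            else:
                yield from iter_feature_images(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_feature_images(item)


def suspicious_feature_images(export, marker="bucketeer"):
    """Return feature_image URLs whose hostname contains `marker`.

    Assumption: the broken URLs are the ones on the "bucketeer" host.
    Values without a hostname (for example local paths) are skipped.
    """
    return [
        url
        for url in iter_feature_images(export)
        if marker in (urlparse(url).hostname or "")
    ]


# Usage against a real export (file name taken from this issue):
# with open("ghost-import.json", encoding="utf-8") as fh:
#     for url in suspicious_feature_images(json.load(fh)):
#         print(url)
```

Walking the whole document recursively avoids depending on the exact layout of the import file, which I have not verified.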
I went to the originating post in the exported HTML from Substack (exported 2023-12-28), and the top image that should have been converted to the feature_image is this img tag. The img's src points at the "bucketeer" AWS host, which is the broken URL being imported, and its data-attrs references the same broken URL. I'm not sure which of the two the migration tool is pulling. The tag also provides a set of srcset entries pointing to images that are actually downloadable, which is why the HTML still displays in a browser.
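To illustrate the srcset observation, here is a minimal sketch of one possible fallback: if the first img's src is on the "bucketeer" host, take the widest srcset candidate instead. This is only an illustration of the idea, not how the migrate tool works or how it should be fixed. The helper names and the example URLs in the usage comment are hypothetical, and it only handles srcset entries with width (`w`) descriptors.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class FirstImage(HTMLParser):
    """Collect the attributes of the first <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.attrs is None:
            self.attrs = dict(attrs)


def parse_srcset(srcset):
    """Return (url, width) pairs for srcset candidates with "w" descriptors.

    Candidates are split on a comma followed by whitespace, because the
    image URLs themselves may contain commas.
    """
    candidates = []
    for candidate in re.split(r",\s+", srcset.strip()):
        parts = candidate.split()
        if len(parts) == 2 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            candidates.append((parts[0], int(parts[1][:-1])))
    return candidates


def pick_feature_image(html, blocked_marker="bucketeer"):
    """Pick a URL for the first image, avoiding the blocked host if possible."""
    parser = FirstImage()
    parser.feed(html)
    if parser.attrs is None:
        return None
    src = parser.attrs.get("src")
    if src and blocked_marker not in (urlparse(src).hostname or ""):
        return src
    candidates = parse_srcset(parser.attrs.get("srcset") or "")
    if candidates:
        # Prefer the widest downloadable candidate.
        return max(candidates, key=lambda c: c[1])[0]
    return src  # nothing better available


# Usage with hypothetical URLs:
# html = ('<img src="https://bucketeer-example.s3.amazonaws.com/a.png" '
#         'srcset="https://cdn.example.com/w_424/a.png 424w, '
#         'https://cdn.example.com/w_848/a.png 848w">')
# pick_feature_image(html)
```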
The problem seems to affect all my posts prior to around January 2023, but it's not clear why there's a difference at all.
An example of a broken HTML file from the export is attached here: buggy_html.zip