Closed by GregBrimble 3 years ago
The only non-JSON-LD type I haven't enhanced in this most recent commit is Dublin Core. Dublin Core can be represented in a number of different ways, so again, I think if we tried here, we'd likely fail in collecting some implementations. Perhaps more can be done later with the raw HTML.
I'd be happy limiting this to extracting any <meta> tag with a name attribute beginning with DC (case-insensitive). I think that gives us the majority of 'normal' usage.
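For illustration, the rule suggested above could be sketched roughly as follows. This is a standalone Python sketch using only the standard library, not the code from this PR; the names `DublinCoreMetaParser` and `extract_dublin_core` are hypothetical, and it assumes the value to keep is the tag's `content` attribute.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collects <meta> tags whose name attribute starts with "dc" (any case)."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (name, content) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Case-insensitive prefix check, so "DC.title" and "dcterms.created" both match.
        if name.lower().startswith("dc"):
            self.tags.append((name, attributes.get("content")))


def extract_dublin_core(html):
    """Hypothetical helper: returns the (name, content) pairs matching the DC prefix rule."""
    parser = DublinCoreMetaParser()
    parser.feed(html)
    return parser.tags
```

Note that a bare prefix check will also pick up any unrelated name that happens to start with "dc", and it will miss Dublin Core expressed through other markup (for example <link> elements), which is the trade-off discussed in this thread.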
Cool. As long as you include that disclaimer in your writing, that's good with me :) I'll add that in now.
The full meta/link tags were being saved before we added the more specific checks that run later. Twitter, Facebook and OpenGraph should all be covered now. Because it's so expressive, it's just Dublin Core that might slip past some of the capturing that we've implemented.
If it's a storage space concern, I think we could nix it (@jono-alderson has already said they're fine with just capturing tags which begin with DC). But if we can keep it, we might find other Dublin Core tags, which we'd otherwise lose.
Yeah, let's simplify and accept that we might miss some stuff. First year, let's keep it simple.
Consider it gone!
https://github.com/HTTPArchive/almanac.httparchive.org/issues/2174 https://docs.google.com/document/d/19KDSv4olAXUHUV6hq4X4Cb-lNziqvVesgXXxVktrw4c/edit#