2020PB / police-brutality

Repository containing evidence of police brutality during the 2020 George Floyd protests
MIT License

Identifying Dead Links And Replacing Them #409

Open bonedaddy opened 4 years ago

bonedaddy commented 4 years ago

For now most links are active and viewable; however, we will inevitably get dead links, such as those reported in https://github.com/2020PB/police-brutality/pull/392

Even when a link is "dead", the data isn't lost, as it will be captured by my archiver tool. We need a method for:

1) Identifying dead links
2) Replacing/supplementing dead links with the backups on IPFS

I'm not sure what the best method is. I suppose I could have some central listing place that I periodically post the new backup links to?
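
As a rough sketch of the first step (identifying dead links), something like the following could work. It is only an illustration, not the archiver tool's actual behavior: the function names, the `DEAD_STATUSES` set, and the choice of a `HEAD` request are all assumptions. Some sites reject `HEAD` requests or return 200 for removed content, so the results would still need manual review.

```python
import urllib.error
import urllib.request

# Assumption: treat "not found" and "gone" as dead; other codes need human review.
DEAD_STATUSES = {404, 410}


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar network errors.
        return None


def find_dead_links(urls, get_status=fetch_status):
    """Return the URLs whose status is in DEAD_STATUSES or that are unreachable."""
    dead = []
    for url in urls:
        status = get_status(url)
        if status is None or status in DEAD_STATUSES:
            dead.append(url)
    return dead
```

The `get_status` parameter is there so the check can be exercised with a stub instead of live requests, for example `find_dead_links(urls, get_status=known_statuses.get)` with a dictionary of known results.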

ghost commented 4 years ago

I think we could use a torrent file so others can grab from your archive and create redundancy. Of course we need an ID system to ensure it's easy to grab from the file. This would also allow maintainers to re-upload.

We may want to use a combo of free services like streamable and image.fri so most folks can re-establish links as well as an AWS solution one of the maintainers could host from as suggested in another thread. A mix of centralized and decentralized.

bonedaddy commented 4 years ago

IPFS is somewhat like torrents in the sense that people can "seed" the data. I have a WIP PR going, https://github.com/2020PB/police-brutality/pull/286, that contains the instructions on how to mirror the archive.

ubershmekel commented 4 years ago

@bonedaddy I think we're going to need to have some of these backed-up media links inside the repo directly, especially when links die.
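
A minimal sketch of what supplementing links in the repo's markdown might look like, assuming a hypothetical `backups` mapping from original URL to backup URL. The function name, the `([backup](...))` suffix format, and the example URLs are made up for illustration and are not an agreed convention.

```python
import re

# Matches standard inline markdown links: [label](url)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def add_backups(markdown, backups):
    """Append a backup link after every link whose URL appears in backups."""

    def supplement(match):
        original = match.group(0)
        backup_url = backups.get(match.group(2))
        if backup_url is None:
            return original  # no backup known, leave the link untouched
        return f"{original} ([backup]({backup_url}))"

    return LINK_RE.sub(supplement, markdown)


# Hypothetical data: the backup URL and its CID are placeholders.
backups = {"https://example.com/a": "https://backup.example/ipfs/EXAMPLE_CID"}
print(add_backups("* [Video](https://example.com/a)", backups))
```

Note that this sketch is not idempotent: running it twice on the same text would append the backup link a second time, so a real version would need to detect existing backups.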

Murkantilism commented 4 years ago

@bonedaddy @ubershmekel may I propose a simple square bracket tag for identifying dead links? I just found one:

image

ubershmekel commented 4 years ago

I would make the language less morbid, but I agree. Perhaps something like

[original link that is now broken](https://example.com)

Murkantilism commented 4 years ago

@ubershmekel ah perhaps I chose a bad example, I meant more like this, with a whitespace separator:

[Dead] [Photojournalist's account](https://twitter.com/bfeinzimer/status/1277014331968782339)

To preserve the original context if trying to replace it. And yeah I'm fine with different language, something like [Broken] or [404].

ubershmekel commented 4 years ago

@Murkantilism I misread your example. At the moment I would prefer to keep the markdown syntax to keep the parser simple and fit the existing data structure at https://raw.githubusercontent.com/2020PB/police-brutality/data_build/all-locations-v2.json

Murkantilism commented 4 years ago

@ubershmekel ah good point! Maybe a pipe separator within the link markdown, something like this?

[Broken Link | Photojournalist's account](https://twitter.com/bfeinzimer/status/1277014331968782339)
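
To illustrate how a parser could stay simple with this format, here is a hedged sketch that splits the tag from the title inside an ordinary markdown link. The regex, the `STATUS_TAGS` set, and the returned dictionary shape are assumptions for illustration and do not reflect the repo's actual parser or the `all-locations-v2.json` schema.

```python
import re

# Matches standard inline markdown links: [label](url)
LINK_RE = re.compile(r"\[(?P<label>[^\]]+)\]\((?P<url>[^)\s]+)\)")

# Assumption: the set of recognized tags; only the proposed one is listed here.
STATUS_TAGS = {"Broken Link"}


def parse_link(markdown):
    """Parse the first markdown link, separating a leading status tag if present."""
    match = LINK_RE.search(markdown)
    if match is None:
        return None
    label = match.group("label")
    tag, sep, rest = label.partition("|")
    if sep and tag.strip() in STATUS_TAGS:
        return {"text": rest.strip(), "url": match.group("url"), "broken": True}
    # No recognized tag: keep the whole label, including any literal pipe.
    return {"text": label.strip(), "url": match.group("url"), "broken": False}
```

Because the tag lives inside the link text, existing markdown renderers and link extractors should still see a normal link; only code that wants the status has to look for the pipe.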

Also, do we care about differentiating why a link is broken? For example, a deleted Twitter account versus a genuine 404 page.

ubershmekel commented 4 years ago

@Murkantilism that looks fine by me.

On differentiating why a link is broken: I'd be fine with either option, though managing a nomenclature for such a system might be a bit much for a small project like ours.