FinalsClub / karmaworld

KarmaNotes.org v3.0
GNU Affero General Public License v3.0

images linked in KarmaNotes to TheFinalClub need to be offloaded #375

Open btbonval opened 10 years ago

btbonval commented 10 years ago

Parse KarmaNotes for IMG SRC links that point at TheFinalClub.org, download them, and push them to KarmaNotes' static S3 for CDN hosting.

See issue #144
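
A minimal sketch of the parsing step, assuming the note HTML is available as a string. The class and function names here are hypothetical, not part of the karmaworld codebase, and the download and S3 upload steps are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FinalClubImageFinder(HTMLParser):
    """Collect <img src> URLs whose host is thefinalclub.org (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        host = (urlparse(src).hostname or "").lower()
        if host == "thefinalclub.org" or host.endswith(".thefinalclub.org"):
            self.urls.append(src)


def find_finalclub_images(html):
    """Return every IMG SRC in the HTML that points at TheFinalClub.org."""
    finder = FinalClubImageFinder()
    finder.feed(html)
    finder.close()
    return finder.urls
```

Relative links such as /archive/images/no-image.gif have no host, so they are skipped here and need the separate handling discussed further down.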

AndrewMagliozzi commented 10 years ago

See the second image in this set of notes: https://www.karmanotes.org/note/harvard/the-human-mind-an-introduction-to-mind-brain-and-behavior/lecture-3-2708-the-multi-level-biological-approach

It links here but it does not pull in the image: http://www.thefinalclub.org/blogs/spring2008/HumanMind/wp-content/uploads/2008/02/posner-graph.jpg

Note that most of the images in Justice, Human Mind, and Protest Literature are like this:

<img src="/archive/images/no-image.gif" height="487" width="556">

I'm not sure if we'll ever be able to resurrect those images, but they should exist in the old TheFinalClub.org database.

btbonval commented 10 years ago

From Seth's comment in older ticket:

  1. Find all files that contain no-image.gif.
  2. Find the instances of the <img> tag whose immediate parent is an <a> that links to an image file.
  3. Auto-fix those by copying a.href >> img.src.
  4. Fix the remainder by hand.

btbonval commented 10 years ago

This appears unrecoverable in cases where there is no anchor around the image. "Fix the rest by hand" is a mysterious comment that, based on Andrew's assertions, likely means looking the images up on the old WordPress site.

We need to find and run the old WordPress site to fix this for real.
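
For the cases that are recoverable, Seth's steps could be sketched roughly as below. This is a regex-based approximation, assuming the placeholder is always a src ending in no-image.gif and that the anchor's href uses double quotes. The function name and the extension list are hypothetical.

```python
import re

# Assumed set of extensions that count as "links to an image file".
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

# An <a href="..."> immediately followed by an <img ...> tag.
ANCHORED_IMG = re.compile(
    r'(<a\b[^>]*\bhref="([^"]+)"[^>]*>\s*)(<img\b[^>]*>)',
    re.IGNORECASE,
)
PLACEHOLDER_SRC = re.compile(r'src="[^"]*no-image\.gif"', re.IGNORECASE)


def fix_placeholder_images(html):
    """Return (fixed_html, number of placeholders left for manual repair)."""

    def repair(match):
        opening, href, img = match.group(1), match.group(2), match.group(3)
        # Leave the tag alone unless the anchor points at an image file
        # and the image is actually the placeholder.
        if not href.lower().endswith(IMAGE_EXTENSIONS):
            return match.group(0)
        if not PLACEHOLDER_SRC.search(img):
            return match.group(0)
        # Copy a.href into img.src; the lambda keeps href literal.
        return opening + PLACEHOLDER_SRC.sub(
            lambda m: 'src="%s"' % href, img, count=1
        )

    fixed = ANCHORED_IMG.sub(repair, html)
    remaining = len(PLACEHOLDER_SRC.findall(fixed))
    return fixed, remaining
```

The second return value counts the placeholders with no usable anchor, which are the ones that still need the old site or database.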

btbonval commented 10 years ago

woopwoop. Grepped "Randolph Nesse has created" in the 4 SQL dumps we found. Got a hit with a proper IMG SRC (as opposed to no-image.gif).

Found in 10.6.166.43.sql.

Err, that's exactly the database we're already using for documents to Annotorius. Anyway, the table is classes_content, and the hit is in the 5th field, which appears to be the content.

I don't know the best way to process this. The missing image links are in the database above, but finding each missing image will be a matter of searching that database for content from the notes currently ported into KarmaNotes.

It is worth considering processing the classes_content table out of the database and producing new notes in KarmaNotes, rather than trying to retroactively fix old links. Maybe context searching will be fine.
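
A rough sketch of what context searching might look like, assuming the classes_content rows have already been read out of the dump as (id, content) pairs by some other means. The function names are hypothetical, and the matching is a plain substring test on normalized text.

```python
import re
from html import unescape


def normalize(text):
    """Strip tags, decode entities, and collapse whitespace for comparison."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = unescape(text)
    return re.sub(r"\s+", " ", text).strip().lower()


def find_source_rows(snippet, rows):
    """Return the ids of rows whose content contains the snippet.

    rows is an iterable of (row_id, content) pairs, assumed to come
    from the classes_content table.
    """
    needle = normalize(snippet)
    if not needle:
        return []
    return [row_id for row_id, content in rows if needle in normalize(content)]
```

Taking a sentence near a broken image in a KarmaNotes note, such as "Randolph Nesse has created", and passing it as the snippet should point back to the row that still holds the original IMG SRC.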