FinalsClub / karmaworld

KarmaNotes.org v3.0
GNU Affero General Public License v3.0

Old production notes store data in HTML field, need to be migrated #354

Closed btbonval closed 10 years ago

btbonval commented 10 years ago

Approximately 600 notes are storing their content in the HTML field of the database.

The assumption moving forward is that all note content will be hosted on a static file server and not in the database. We had implemented this assumption in code, which broke those 600-odd notes. The assumption was recently reverted for the old-style notes.

The HTML field in the database needs to be removed someday. Once done, those old notes would be lost.

We need to write a manage.py command to migrate all old notes (those with html defined and not fp_file) onto our S3 system.

The optimal way to do this is to push the HTML to Filepicker (possibly converting HTML to PDF first if necessary/desired), then update the notes with their FP URL. This will have two effects: all notes will have an FP URL, and all notes will be statically hosted as per the current system design.
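The migration described above can be sketched in plain Python. This is only an illustration of the selection and update logic, not the project's code: the `Note` dataclass stands in for the Django model, and `upload` and `fake_upload` are hypothetical placeholders for whatever call actually pushes a file to Filepicker. A real version would live in a manage.py command and save each note after updating it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Note:
    """Plain-Python stand-in for the Django Note model (fields assumed)."""
    slug: str
    html: str = ""
    fp_file: str = ""


def needs_migration(note: Note) -> bool:
    # "Old style" per the issue: html is defined and fp_file is not.
    return bool(note.html) and not note.fp_file


def migrate_notes(notes: Iterable[Note],
                  upload: Callable[[str, str], str]) -> List[Note]:
    """Upload each old-style note's HTML and record the returned URL.

    `upload(name, html)` is a hypothetical callable standing in for the
    Filepicker upload; it is expected to return the stored file's URL.
    """
    migrated = []
    for note in notes:
        if not needs_migration(note):
            continue
        note.fp_file = upload(note.slug + ".html", note.html)
        migrated.append(note)
    return migrated


def fake_upload(name: str, html: str) -> str:
    # Placeholder uploader for demonstration only; no network access.
    return "https://example.invalid/" + name
```

Passing the uploader in as a parameter keeps the selection logic testable without touching the network, and notes that already have an `fp_file` are skipped, so the command can be re-run safely.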

AndrewMagliozzi commented 10 years ago

This is fixed now.

charlesconnell commented 10 years ago

While there are approximately 600 notes storing their HTML contents in the database, all but 1 have the static_html flag set to True. This means we're not actually using the HTML stored in the database, even though it is there.

The 500 errors that originally started this problem were caused by notes that did not have a Filepicker URL associated with them, which is a separate issue. Here is an example of a note that has these properties: https://karmanotes.org/harvard/aesthetics-and-interpretive-understanding-31-american-musicals/aesthetics-and-interpretive-understanding-31-ame

I've written a management command to upload these notes to Filepicker and set their Filepicker URL fields.

btbonval commented 10 years ago

Actually, the 600 notes distinctly lacked an FP URL. The fact that they had HTML in the database was a requirement for doing something about it.

So this is done?

btbonval commented 10 years ago

Just ran this on prod:

karmanotes=# SELECT count(id) FROM notes_note WHERE fp_file = '' AND (pdf_file <> '' OR gdrive_url <> '' OR upstream_link <> '' OR text <> '');
 count 
-------
   653
(1 row)
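For anyone wanting to try the same filter away from prod, here is a small sketch that runs the query above against an in-memory SQLite table. The table layout is an assumption reduced to the columns the query names, and the rows are made-up examples, not real data. Like the original, it compares against empty strings, so it assumes unset fields are stored as '' rather than NULL.

```python
import sqlite3

# Same filter as the prod query: no fp_file, but some other content source.
QUERY = """
SELECT count(id) FROM notes_note
WHERE fp_file = ''
  AND (pdf_file <> '' OR gdrive_url <> '' OR upstream_link <> '' OR text <> '')
"""


def make_demo_db() -> sqlite3.Connection:
    """Build a toy notes_note table (assumed schema, illustrative rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes_note ("
        " id INTEGER PRIMARY KEY,"
        " fp_file TEXT NOT NULL DEFAULT '',"
        " pdf_file TEXT NOT NULL DEFAULT '',"
        " gdrive_url TEXT NOT NULL DEFAULT '',"
        " upstream_link TEXT NOT NULL DEFAULT '',"
        " text TEXT NOT NULL DEFAULT '')"
    )
    rows = [
        ("", "", "", "", "<p>old note</p>"),                      # counted
        ("https://example.invalid/f", "", "", "", "<p>x</p>"),    # has fp_file
        ("", "", "", "", ""),                                     # no content
        ("", "a.pdf", "", "", ""),                                # counted
    ]
    conn.executemany(
        "INSERT INTO notes_note"
        " (fp_file, pdf_file, gdrive_url, upstream_link, text)"
        " VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def count_unmigrated(conn: sqlite3.Connection) -> int:
    # Number of notes that still lack an fp_file but have content elsewhere.
    return conn.execute(QUERY).fetchone()[0]
```

Running the count before and after the migration script gives a quick check that it brought the number down to zero.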

Charles, thanks for writing the script. We still need to run it on prod. I'm opening the ticket until prod's notes are on S3, which really solves the problem reported here.

charlesconnell commented 10 years ago

Having HTML in the database is not a requirement for fixing this issue. As you saw in my new management script, we can grab the HTML from S3.

btbonval commented 10 years ago

err how did the HTML get to S3?

The 600-some notes were from an old system that never made it to S3 (Filepicker or otherwise). The only trace of the notes is the text/html in the database.

Unless you mean to say after you run your script wherein we have pushed that HTML /to/ S3 from the database. In that case, yes, problem solved and the HTML is now available from S3.

charlesconnell commented 10 years ago

It got there because of the populate_s3 management command, which was apparently run a while back on beta, and I'm hoping on prod too.

charlesconnell commented 10 years ago

On prod in Django shell:

>>> Note.objects.filter(static_html=False).count()
1

btbonval commented 10 years ago

Ohhh, we already had code that pushed the HTML up to S3, from populate_s3. derpderpderp. I got it now. Right, so the HTML was on S3 already ... okay, I'm with you now. That's why the slug was used: it's the only way to identify the note's location on S3. I haven't thought about that setup in a while, apparently.

Even if static_html is set, we still need fp_file to be filled in correctly, because the current code assumes all files were uploaded via Filepicker. That's what your script is doing. Hot sauce gravy train!

btbonval commented 10 years ago

Your script is almost exactly what I had in mind when I wrote my response to Andrew re: what to do about the old-style files. The difference is that I forgot almost all the HTML was already on S3, so my brain went in a slightly different direction.

charlesconnell commented 10 years ago

manage.py populate_filepicker has been run on beta and prod.

charlesconnell commented 10 years ago

And yesterday's fix has been rolled back.