SnapshotSerengetiScienceTeam / DataManagement

Scripts and issues to manage the SnapshotSerengeti images and metadata.
GNU General Public License v2.0

Import S7 and S8 classifications into database #55

Closed: mkosmala closed this issue 7 years ago

mkosmala commented 8 years ago

Requested data dump from Zooniverse of just S7 and S8.

mkosmala commented 8 years ago

@palme516 @aliburchard @tesherista I'm making progress on getting S7 into the database. I've just matched up the Zooniverse output with what we have in the database. (Ali will remember that this was a months-long nightmare for S1-S6.) The good news is that it's much less nightmarish with our new system. But we still have some discrepancies that we need to make decisions about.

  1. For S7, it appears that Zooniverse slurped up all our images, regardless of whether they were marked invalid or do-not-send-to-Zooniverse. For S7, this is <50 images, so not a huge deal. But if we ever have a season with a lot of Invalid=1, then we're wasting volunteer effort.
     1a. @aliburchard: can you coordinate with whoever is in charge of the Zooniverse scripts and make sure that they ignore images marked Invalid=1 and those marked Send-to-Zooniverse=0?
     1b. What do we want to do with these classifications for invalid captures? We can import them into the database (but that will slow down the S7 import, as I'll need to modify some scripts), or we can leave them out of the database (as would be the case if we hadn't sent them to Zooniverse in the first place). @palme516, this is mostly your call, but @aliburchard might have opinions, too.
  2. We have one capture in our database with unusual filenames that I can't find as files at MSI. @palme516, do you remember these two files? Should I remove the capture containing them from the database?
       S7/I12/I12_R2/P1060600.JPG
       S7/I12/I12_R2/P1060601.JPG
  3. We appear to have a set of 25 or so images (25 captures of one image each) that never made it to Zooniverse. I don't know why they weren't grabbed. @aliburchard, can you confirm that these images aren't in the Zooniverse database?
       S7/M13/M13_R2/IMAG3322.JPG
       S7/M13/M13_R2/IMAG3323.JPG
       S7/M13/M13_R2/IMAG3327.JPG
       S7/M13/M13_R2/IMAG3328.JPG
       S7/M13/M13_R2/IMAG3329.JPG
       S7/M13/M13_R2/IMAG3330.JPG
       S7/M13/M13_R2/IMAG3331.JPG
       S7/M13/M13_R2/IMAG3332.JPG
       S7/M13/M13_R2/IMAG3333.JPG
       S7/M13/M13_R2/IMAG3334.JPG
       S7/M13/M13_R2/IMAG3335.JPG
       S7/M13/M13_R2/IMAG3336.JPG
       S7/M13/M13_R2/IMAG3337.JPG
       S7/M13/M13_R2/IMAG3338.JPG
       S7/M13/M13_R2/IMAG3339.JPG
       S7/M13/M13_R2/IMAG3340.JPG
       S7/M13/M13_R2/IMAG3341.JPG
       S7/M13/M13_R2/IMAG3342.JPG
       S7/M13/M13_R2/IMAG3343.JPG
       S7/M13/M13_R2/IMAG3344.JPG
       S7/M13/M13_R2/IMAG3345.JPG
       S7/M13/M13_R2/IMAG3346.JPG
       S7/M13/M13_R2/IMAG3347.JPG
       S7/M13/M13_R2/IMAG3348.JPG

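
The export rule requested in 1a can be sketched as a simple filter applied before images are handed to Zooniverse. This is a minimal Python sketch, not the project's actual export script: the `Image` record, its field names (`invalid`, `send_to_zooniverse`), and the example filenames are all hypothetical stand-ins for whatever columns the real database uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    # Hypothetical record shape; the real database columns may differ.
    path: str
    invalid: int             # 1 = marked invalid
    send_to_zooniverse: int  # 0 = do not send to Zooniverse


def images_to_export(images):
    """Return only the images that should be sent to Zooniverse.

    Skips anything marked Invalid=1 and anything marked Send-to-Zooniverse=0.
    """
    return [
        img for img in images
        if img.invalid != 1 and img.send_to_zooniverse != 0
    ]


demo = [
    Image("a.JPG", invalid=0, send_to_zooniverse=1),
    Image("b.JPG", invalid=1, send_to_zooniverse=1),
    Image("c.JPG", invalid=0, send_to_zooniverse=0),
]
print([img.path for img in images_to_export(demo)])  # ['a.JPG']
```

Filtering at export time keeps invalid images from ever reaching volunteers, so the question in 1b of whether to import their classifications does not come up for future seasons.
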
meredithspalmer commented 8 years ago

Progress! Awesome! Glad it's going a bit smoother this time around...

@mkosmala ... 1b. I'm okay with not importing them.

  2. I just checked the original hard drive: those two images are of the chalkboard (it looks like I deleted one and renamed the other "IMAG0000.JPG" in the MSI data, probably so that I didn't have to change my scripts to pick up those files). There's no animal data, so I don't think it matters much whether the capture gets imported.

mkosmala commented 8 years ago

Okay, so we won't worry about importing classifications for invalid images (1 above). I updated the database to point to the correct image for 2 above. I don't see Zooniverse classifications for this capture, so it now falls into the third category above: images that didn't get sent to Zooniverse but that have captures in the database. Neither issue 1 nor issue 3 will slow down getting S7 and S8 into the database, so I'm going to split them off into a new issue.

mkosmala commented 8 years ago

Users added for S7. (~16,000 new ones!)

Links between Zooniverse IDs and our CaptureEvents added for S7. Total S7 linked captures = 304,645.

Total S7 invalid=[0 or 2] captures = 304,671. Difference = 26 (the number of captures that don't appear to have gone to Zooniverse).
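
The count check above (304,671 − 304,645 = 26) can be extended to list *which* captures are unlinked, which is what would need to be checked against Zooniverse. This is a minimal Python sketch, assuming the valid CaptureEvent IDs and the Zooniverse-ID-to-CaptureEvent links can be loaded as plain Python collections; the function name `unlinked_captures`, the integer capture IDs, and subject IDs like `"z1"` are hypothetical.

```python
from collections.abc import Iterable, Mapping


def unlinked_captures(valid_capture_ids: Iterable[int],
                      links: Mapping[str, int]) -> list[int]:
    """Return valid capture IDs that have no linked Zooniverse subject.

    valid_capture_ids: CaptureEvent IDs counted as valid (e.g. invalid in {0, 2}).
    links: hypothetical mapping of Zooniverse subject ID -> CaptureEvent ID.
    """
    linked = set(links.values())
    return sorted(set(valid_capture_ids) - linked)


valid = [101, 102, 103, 104, 105]
links = {"z1": 101, "z2": 102, "z4": 104}
print(len(valid) - len(links))         # 2
print(unlinked_captures(valid, links))  # [103, 105]
```

The plain count difference only equals the number of missing captures when each capture is linked at most once and no link points outside the valid set; the set difference avoids both assumptions and names the specific captures.
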

meredithspalmer commented 8 years ago

Awesome! Excited to take a look at this data. Please let me know when you have time to upload it to the SQL database. Thanks!

tesherista commented 8 years ago

I'm going through what @mkosmala & I went over on Fri 3/18 and will get to #65 soon. I'll probably finish up S7 (Margaret, confirming that's officially on my plate now, yes?) & start S8 concurrently, and will add changes/clarifications to the uploading-classifications notes.

mkosmala commented 7 years ago

I'm assuming this should be closed?