Closed mkosmala closed 8 years ago
I split the S7 and S8 issues, since they have different needs. For S8, we need to recreate the cleaned file list to contain proper timestamps (with their seconds) and in the same format as previous seasons. It looks to me like the current S8 "clean" file was saved in Excel <
Just wanted to add that we should never even open the files in Excel. Even if you don't save the file, if you so much as view the file in Excel, the dates are interpreted and converted. It's completely ridiculous, but true.
@aliburchard That doesn't happen on Windows -- you have to actually save the file to change it. But wow. OMG, it does happen. W.T.F.
Argh, who would make such an evil program as Excel??? Yup, I probably did open the file in Excel to view it and the changes happened then. I certainly didn't do any manual editing to the timestamps. It won't happen again...
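One way to catch this kind of damage without opening the file in a spreadsheet is to scan the timestamp column from a script. Below is a minimal sketch, not the project's actual cleaning code. The column name `timestamp`, the `YYYY-MM-DD HH:MM:SS` layout, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumed timestamp layout (hypothetical): "YYYY-MM-DD HH:MM:SS", seconds included.
FULL_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")

def rows_with_damaged_timestamps(lines, column="timestamp"):
    """Return (line_number, value) pairs whose timestamp does not match the expected layout."""
    reader = csv.DictReader(lines)
    bad = []
    for row in reader:
        value = row[column]
        if not FULL_TIMESTAMP.match(value):
            # reader.line_num counts physical lines read so far, header included.
            bad.append((reader.line_num, value))
    return bad

# Made-up sample: the second data row looks like a spreadsheet-converted date with no seconds.
sample = io.StringIO(
    "site,roll,capture,timestamp\n"
    "L10,1,1,2013-07-02 14:05:33\n"
    "L10,1,2,7/2/2013 14:06\n"
)
damaged = rows_with_damaged_timestamps(sample)
print(damaged)
```

Running something like this on a cleaned file before it is shared would flag any row whose seconds were dropped or whose date was rewritten.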
To remake the cleaned file, you can run the scripts outlined in the "Protocol" document saved in MSI in the "TimeStampCleaning" directory. Margaret, do you have time to do this quickly? Otherwise I can work on it this weekend.
No, I am unfamiliar with the cleaning scripts and have been waiting for you to update/fix them. See #49 for details.
Okay, there's a new copy of S7 and S8 cleaned data up on MSI in the TimeStampCleaning/CleanedCaptured directory, in the new format you wanted and the timestamps should be unmolested by any Excel tampering. Took a while on S8 to get together all the manual changes we made last time with the wonky long capture events, but should be identical now.
Having a meeting with IT on Wednesday to take a look at their cleaning scripts. I won't run them on S9 until I've run them on S7 or S8 to test them out - I'll keep you updated on how that goes. If there are too many problems, I might just go ahead later this week and use our current cleaning scripts to get to work on S9. Let me know if you still want me to hold off - if there's additional work I can be doing on S7 or S8 or if there's still something wrong with the cleaning outputs.
@palme516 When you say "should be identical", does that mean you have compared them? Either by script or spot-checked them? I'm not going to be checking that they're the same. And we want to be absolutely sure they're the same so we don't make a Big Mess.
I went through and checked/compared the entire sites (all rolls) for every site that I could find mention of us having issues with in all of our GitHub and email threads (compared number of capture events, invalid IDs, etc). I also checked through the entire datasets looking at everything that was marked as invalid to make sure that we had the same number of invalid images, number of images per capture event, and invalid codes.
Just ran some code on S7 and S8 looking for differences in the number of rolls and the number of capture events within those rolls for each site. The files I remade and the files we sent to Zooniverse have the same number of rolls and capture events.
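For anyone who wants to repeat this kind of comparison, here is a rough sketch of the idea. It is not the code that was actually run. It assumes each file is a CSV with hypothetical columns `site`, `roll`, and `capture`, and the sample data is made up.

```python
import csv
import io
from collections import defaultdict

def roll_capture_counts(lines):
    """Map each site to {roll: number of distinct capture events in that roll}."""
    captures = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(lines):
        captures[row["site"]][row["roll"]].add(row["capture"])
    return {
        site: {roll: len(events) for roll, events in rolls.items()}
        for site, rolls in captures.items()
    }

def count_differences(old, new):
    """Return the sites whose rolls or capture-event counts differ between two files."""
    diffs = {}
    for site in sorted(set(old) | set(new)):
        if old.get(site) != new.get(site):
            diffs[site] = (old.get(site), new.get(site))
    return diffs

# Made-up sample files: the "new" file has one extra capture event at S07, roll 1.
old_file = io.StringIO("site,roll,capture,image\nL10,1,1,1\nL10,1,1,2\nL10,1,2,1\nS07,1,1,1\n")
new_file = io.StringIO("site,roll,capture,image\nL10,1,1,1\nL10,1,1,2\nL10,1,2,1\nS07,1,1,1\nS07,1,2,1\n")

differences = count_differences(roll_capture_counts(old_file), roll_capture_counts(new_file))
print(differences)
```

An empty result means both files have the same rolls per site and the same number of capture events per roll. It does not prove the timestamps or invalid codes match.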
Excellent. Last thing I need from you is two short (one-line) 'Season' files. The format should be a comma-separated line with fields:
No headers needed. So a file might simply look like:
7,2013-06-01,2014-01-31,"Meredith in field"
PS. According to my notes, S7 should begin on 2013-06-07.
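A small sketch of reading a one-line season file like the example above. The field names used here (`season`, `start`, `end`, `comment`) are guesses based only on the example line, not the original field list, and `parse_season_line` is a hypothetical helper.

```python
import csv
from datetime import date

def parse_season_line(line):
    """Parse one headerless, comma-separated season line into a dict."""
    # csv.reader handles the quoted comment field correctly.
    season, start, end, comment = next(csv.reader([line]))
    return {
        "season": int(season),
        "start": date.fromisoformat(start),  # assumed ISO dates, as in the example
        "end": date.fromisoformat(end),
        "comment": comment,
    }

parsed = parse_season_line('7,2013-06-01,2014-01-31,"Meredith in field"')
print(parsed)
```

Parsing the dates into `date` objects makes it easy to check that a season's start precedes its end, or to compare ranges between seasons.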
Hm, the earliest valid entry for S7 in the data is 2013-01-29?
I uploaded this file ("SeasonFile.csv") into the TimeStampCleaning directory on MSI
Is that possibly a missed roll from S6? I remember we had a group of rolls from S6 that didn't make it back to MN until S7 was brought back...
There are rolls from 161 sites (valid data) between the dates of 01-29-2013 and 06-06-2013, so I'm guessing not. Where did you get your date from?
It's the last date of S6. But let's use your dates for S7. They're meant more to be general guides. If we ever need exact stop and start dates we can calculate them based on the images on the first and last rolls of that season.
I think there's always a few months of overlap because of the schedule of SD card collection. I put the earliest and latest valid picture dates for 7 and 8 into the season file and there's about two months of overlap there as well.
I'm working on getting S8 imported. My script tells me that S05 is a newly created site for Season 8. Is this true, @palme516 ?
To import S8, I'm going to need the following information for all new sites:
@aliburchard, I don't know when the hand-off between you and Meredith was. Do you know anything about Season 8 cameras?
This is just a smidge before my time, but there is no site S05. I just went in and looked at the pictures in this S05 folder, and the one S05 roll is actually for site S07 (Norbert must have mislabeled the folder). I believe that S05_R1 is actually S07_R1 (we have an S07_R2 which occurs a few months later, but we're missing R1). Let me know if you want me to relabel this roll + images on MSI, or whether you need to make changes to the database first, or whether we should keep the current labeling but with an appropriate note.
Yes, please relabel the roll and fix anything else that needs fixing at MSI. Then rerun the cleaning script(s) so that the output has the correct site in it.
FYI: My importing-to-the-database script runs some checks first, before importing, to make sure nothing will break when the imports happen. So S8 hasn't been imported at all yet, since the checks didn't pass.
@aliburchard: Here's an interesting question: can we change the metadata at Zooniverse? It would be fairly easy (I think) to search for S05 in the site field in the Zooniverse database and change all occurrences to S07. That way, when we import the Zooniverse classifications, everything will match up properly.
Everything is renamed, rerunning scripts on S8 right now.
I'll let you know when everything is fixed up and the final clean copies of S8 are in place.
I left the Serengeti sometime in summer 2013, so I think S8 was well after my time.
I seem to recall establishing a camera at S05, but there's a good chance that that camera never produced useable photos (e.g. if the camera was likely stolen or damaged). That being said, I don't have my field notebooks or the MS Access field entry database in the UK so can't confirm that.
@palme516 just want to make sure I'm understanding your above note correctly. Have you reviewed the images from what is labeled as S05_R1 and compared them to the images from S07_R2, in terms of the vegetation in front of the camera as well as the date-time stamps on the last image from S05_R1 and the first image of S07_R2? Have you asked Norbert about this S05 to ensure that there is indeed nothing else going on?
@mkosmala I don't think there is any easy way to change metadata in the Zooniverse DB, but will ask Michael about it, as it would help us out enormously.
@palme516 is there no check on the data entry form so that you can only enter sites that "exist" in the Access DB? Probably a good check to enforce...
S8 is 2013/2014 -- I was talking in terms of field work. S07 is the highest "S" site we have -- an S05 site would have been far into the wet boggy foresty fly area where all the Maasai are - I doubt any site up there would have lasted eight seasons...
Anyhow, @aliburchard, the blackboard in the picture says "S07". I've already gone through the sites I'm processing right now to make sure that we don't enter any impossible sites again!
@mkosmala @palme516 Michael is happy to make the change in the Zooniverse DB. Can one of you confirm that it's only 24 images in S8 S05 before he makes that change? Thanks!
@aliburchard I see 40 images?
Ah, sorry, 24 subjects. That sounds about right though (40 images == 24 subjects when you average day and night photos).
Ah, gotcha!
Confirmed. 24 captures/subjects and 40 images.
@mkosmala Cleaned S8 files are up on MSI
@palme516 There's something weird happening in the cleaning scripts.
In the S8 cleaned file, for L10, roll 1, capture event number 132, the images start numbering at 19. The database won't import unless for each capture event, the images start numbering at 1. I checked S8_captures.csv, and the image numbers start at 1 for this capture in that file, so it's definitely something going on in the cleaning scripts. If it's helpful for debugging, the image numbers keep incrementing from 19 for capture 132 into capture 133! Then for capture 134, they start numbering at 1 again.
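The rule described here (within each capture event, image numbers start at 1) can be checked on a cleaned file before attempting an import. This is a minimal sketch, not the import script's real check. The column names `site`, `roll`, `capture`, and `image` are assumptions, and the sample rows only imitate the pattern reported above.

```python
import csv
import io
from collections import defaultdict

def captures_with_bad_numbering(lines):
    """Return capture events whose image numbers are not exactly 1, 2, ..., n."""
    images = defaultdict(list)
    for row in csv.DictReader(lines):
        key = (row["site"], row["roll"], row["capture"])
        images[key].append(int(row["image"]))
    bad = {}
    for key, numbers in images.items():
        if sorted(numbers) != list(range(1, len(numbers) + 1)):
            bad[key] = sorted(numbers)
    return bad

# Made-up rows imitating the report: numbering starts at 19 in capture 132,
# keeps incrementing into capture 133, and restarts at 1 in capture 134.
sample = io.StringIO(
    "site,roll,capture,image\n"
    "L10,1,131,1\nL10,1,131,2\nL10,1,131,3\n"
    "L10,1,132,19\nL10,1,132,20\nL10,1,132,21\n"
    "L10,1,133,22\nL10,1,133,23\nL10,1,133,24\n"
    "L10,1,134,1\nL10,1,134,2\nL10,1,134,3\n"
)
bad_captures = captures_with_bad_numbering(sample)
print(bad_captures)
```

The check is slightly stricter than "starts at 1": it also flags gaps and duplicates within a capture event.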
Woo, what the heck. Debugging right now!
Hm, okay, I think I caught the primary error and fixed it. Just uploaded a new S8_cleaned onto MSI.
S8 metadata imported!
Summary from the database:
521 rolls
362,311 capture events
987,032 images

362,213 capture events went to Zooniverse (99.97%)
By invalid code:
0: 359,557
1: 98
2: 12
3: 2,644
Do these numbers look right, Meredith?
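As a quick arithmetic check on the summary above, the per-code counts can be summed and the percentage recomputed. The numbers are copied from the summary; the variable names are only illustrative.

```python
# Figures copied from the database summary in this thread.
total_capture_events = 362_311
sent_to_zooniverse = 362_213
by_invalid_code = {0: 359_557, 1: 98, 2: 12, 3: 2_644}

# The per-code counts should account for every capture event.
code_total = sum(by_invalid_code.values())

# Share of capture events that went to Zooniverse, as a percentage.
share = 100 * sent_to_zooniverse / total_capture_events

# Capture events that did not go to Zooniverse.
not_sent = total_capture_events - sent_to_zooniverse

print(code_total, f"{share:.2f}%", not_sent)
```

The per-code counts add up to the total, and the 98 capture events not sent match the count for code 1. The summary does not say whether that match is by design.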
I get almost the same numbers looking at the capture file except for the number of invalid CEs for 1 and 3 -- I get 123 INVALID1 CEs and 2641 INVALID3 CEs.
How many total capture events do you have?
Okay, so some of the numbers I gave you don't hold out, but I still have a few discrepancies between your numbers and the newest cleaned file. I have the following CEs containing at least one invalid 1:
L06_R1 CE 1003
L10_R1 CEs 307, 346, 549, 711, 877, 892, 1004, 1005, 1072, 1137, 1164, 1179, 1185, 1283
L10_R2 CEs 116, 117
L10_R3 CEs 103, 105, 135, 190, 209, 245, 280, 381
@palme516, can you upload the cleaning notes files for S8 to GitHub like you did for S9 and S10? My guess is that the discrepancies here arise from the same problem as in S9 and S10 -- namely, invalidating some images within captures. But I want to investigate to make sure.
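To illustrate how invalidating only some images within a capture could produce different counts, here is a small sketch comparing two counting rules. Both rules are assumptions for illustration. How the database actually assigns a capture-level invalid code is not stated in this thread, and the rows are made up.

```python
from collections import defaultdict

def count_captures_by_rule(image_rows, code):
    """Count capture events two ways for a given invalid code.

    Returns (any_count, all_count):
      any_count: captures with at least one image carrying the code
      all_count: captures where every image carries the code
    """
    codes = defaultdict(list)
    for site, roll, capture, invalid in image_rows:
        codes[(site, roll, capture)].append(invalid)
    any_count = sum(1 for c in codes.values() if code in c)
    all_count = sum(1 for c in codes.values() if all(x == code for x in c))
    return any_count, all_count

# Made-up image-level rows: (site, roll, capture event, invalid code).
rows = [
    ("L10", "R1", 307, 0), ("L10", "R1", 307, 1), ("L10", "R1", 307, 0),  # partly invalid
    ("L10", "R1", 346, 1), ("L10", "R1", 346, 1),                          # fully invalid
    ("L10", "R1", 400, 0),                                                 # fully valid
]
any_count, all_count = count_captures_by_rule(rows, code=1)
print(any_count, all_count)
```

If one tally counts captures with any invalid image and the other counts only fully invalid captures, the totals will differ whenever a capture is partly invalidated.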
@mkosmala Just uploaded! Let me know if you have questions.
@palme516
Okay, so I don't think there are any problems in the database, and we can consider this closed.
Split from #4