YaleDHLab / ensemble-at-yale

Crowdsourcing the transcription of Yale playbills - http://bit.ly/ensemble-at-yale
http://ensemble.yale.edu
MIT License

Programs not retiring #172

Closed alexokeefe closed 5 years ago

alexokeefe commented 5 years ago

We've discussed this in the past, but I didn't see it in the open issues list, apologies if this is a repeat. There are programs that are not retiring even though marking is complete, and they have been transcribed by (at least) two different individuals.

Example 1: Beware of the Bull | set_id=5919dfcfadb8170833fb5056
Alex transcribed upwards of 4 times, Peter transcribed once (in a meeting, right?). Bad regions present.

Example 2: Pueblo | set_id=5919df88adb8170833fb4fe8
Alex transcribed 3 times, Tess transcribed once. No known bad regions.

This means there are potentially more completed programs than our retirement query shows.

alexokeefe commented 5 years ago

*Just noticed that the transcription retirement threshold may not have been reduced to 2 yet, based on the open issues list. I considered that this could be why Pueblo isn't retiring, if unique users are required. However, I just logged in with the Haas Twitter account and transcribed, and it still hasn't retired with a 3rd unique user.

duhaime commented 5 years ago

Thanks @alexokeefe

Is it possible to start with a new playbill, mark the same set of fields N times, then transcribe each of those marks T times?

I ask because the order of operations can influence retirement--if a user marks new fields, even if all extant marks have been transcribed several times, those new marks need to be transcribed several times. Starting with a clean playbill will isolate variables...
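To make the order-of-operations point concrete, here is a minimal sketch. It is not the platform's actual retirement code; it only assumes that a playbill counts as complete when every mark has received at least `threshold` transcriptions. The names `isPlaybillComplete`, `marks`, and `transcriptions`, and the field values, are hypothetical.

```javascript
// Hypothetical model: each mark carries the list of transcriptions it has received.
function isPlaybillComplete(marks, threshold) {
  // A playbill with no marks is not complete; otherwise every mark must meet the threshold.
  return marks.length > 0 && marks.every(function (mark) {
    return mark.transcriptions.length >= threshold;
  });
}

// Made-up example data: two marks, each already transcribed three times.
var marks = [
  { field: "title", transcriptions: ["a", "a", "a"] },
  { field: "date", transcriptions: ["b", "b", "b"] }
];
var beforeNewMark = isPlaybillComplete(marks, 3);

// A user now marks a new field after the others were fully transcribed.
marks.push({ field: "director", transcriptions: [] });
var afterNewMark = isPlaybillComplete(marks, 3);

console.log(beforeNewMark, afterNewMark);
```

Under this assumption, a single late mark is enough to make a previously complete playbill incomplete again, which is why a clean playbill isolates the variables.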

alexokeefe commented 5 years ago

@duhaime I've added it to my to do list! I'll report back as soon as I can.

duhaime commented 5 years ago

Amen, merci!

alexokeefe commented 5 years ago

@duhaime done! It definitely seems like it wants three unique users, but it only needed all three for one field, where I definitely had typos or slightly different entries (for example, she has a middle initial, and a few times I accidentally included the "." after the K while other times I didn't). It let some fields retire from User 1 (my Yale account), some retire from User 2 (Haas Twitter), and one took the third user (my Google account).
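One way the behavior described here could arise is if a field only retires once enough transcriptions match each other exactly. The sketch below illustrates that hypothesis only; whether the platform actually works this way is an assumption, and `hasConsensus` and `normalizeName` are hypothetical names, not functions from the codebase.

```javascript
// Hypothetical rule: a field retires once `threshold` transcriptions agree.
// `normalize` is an optional function applied to each entry before comparing.
function hasConsensus(transcriptions, threshold, normalize) {
  var counts = Object.create(null); // no inherited keys to collide with
  for (var i = 0; i < transcriptions.length; i++) {
    var key = normalize ? normalize(transcriptions[i]) : transcriptions[i];
    counts[key] = (counts[key] || 0) + 1;
    if (counts[key] >= threshold) return true;
  }
  return false;
}

// Illustrative normalizer: trim, collapse whitespace, drop periods, lowercase.
function normalizeName(text) {
  return text.trim().replace(/\s+/g, " ").replace(/\./g, "").toLowerCase();
}

var entries = ["Anna K. Jacobs", "Anna K Jacobs", "Anna K. Jacobs", "anna k jacobs"];
var strict = hasConsensus(entries, 3, null);           // compares raw strings
var lenient = hasConsensus(entries, 3, normalizeName); // compares normalized strings

console.log(strict, lenient);
```

With exact comparison, the four entries never produce three identical strings, so the field would stay open; with normalization they all agree. This would explain a field that needs extra transcriptions when the middle initial is typed inconsistently.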

Overall:
- Only had to do mark complete 2 times for that to be done (yay!)
- Did 11 transcriptions, technically with 3 unique users, but again only for a few fields

Clean playbill exercise:
Playbill: Pop! | subject_set_id=591a0896adb8170833fb7a8a

Logged in with Yale account

MARKING:
- Marked (marked complete)
- Went back and marked complete again
- Site shows Marked!

TRANSCRIBING:
- Transcribed x2
- Checked site - doesn't show transcribing completed
- Transcribed final time
- Checked site - doesn't show transcribing completed

Logged in with Haas Twitter
- Transcribed - didn't get all fields (only ones I know I made mistakes on...)
- Transcribed - didn't get all fields (only 2 - Music by Anna K. Jacobs & Keyboards I / Organ Randy Cohen)
- Checked site - doesn't show transcribing complete
- Transcribed - got same 2 fields as previous attempt
- Transcribed - got same 2 fields (messed one up)
- Checked site - doesn't show transcribing complete
- Transcribed - got same 2 fields (have definitely typed the same responses for the second more than three times now)
- Transcribed - only got first field (music by)
- Checked site - doesn't show transcribing complete
- Transcribed - only got first field (have definitely typed the same responses for this more than three times now)
- Checked site - doesn't show transcribing complete

Logged in with Google
- Transcribed - only first field
- Platform says transcribed!

alexokeefe commented 5 years ago

*I just realized I didn't mark twice as you suggested - I reviewed the first set I did and checked "marked complete" for the existing marks, which is what we instruct users to do since we don't need things marked more than once. If you would like me to try marking 2 times then transcribing to see the outcome on a new playbill, just let me know.

Or if there is another approach you would like me to try, I'm happy to. I now have three unique "users" to do experiments with, haha.

alexokeefe commented 5 years ago

I did one more run with a clean program, and went a little overboard tracking things, but I hope it helps us out. Marking-wise, I marked all fields once, then did the "marking complete" toggle twice; it showed as complete after that.

See attached Excel doc for the breakdown of transcription attempts: Peter Pan test.xlsx

Quick overview of discoveries/observations:
- It's not a unique user issue (the program eventually retired with one user)
- It doesn't seem to be a typo issue (I copied and pasted the fields so that they would match exactly)
- I don't know why it got hung up on date the way it did; that is the most normalized field in the set

@lindsaymking and I were talking about this, and wonder if we should just reduce transcription threshold for all boxes to 1. (i.e. only one person transcribes them, but we still have the 2 mark measure.) I'm ultimately checking each name/role individually in the workflow we've established after catching so many issues from previous retired programs - so really it just gives staff members more data to clean and will progress the project at a much faster pace with the programming models we've implemented. @pleonard212 & @duhaime what do you all think about reducing to 1?

duhaime commented 5 years ago

Thanks for this @alexokeefe. To briefly summarize, marking is retiring properly, but some marks require many transcriptions to be retired? Is that right?

alexokeefe commented 5 years ago

Correct - specifically some retire at 3 (which I believe is the correct threshold) while others take longer.

duhaime commented 5 years ago

Amen, thanks very much. I'll take a look at this later today...

alexokeefe commented 5 years ago

Sounds great! Thanks, Doug!

duhaime commented 5 years ago

Updated via:

db.workflows.update({name: 'transcribe'}, {$set: {'generates_subjects_after': 1}})

and via updates to annotation model.
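For anyone re-checking the setting later, a small sketch of reading the value back. It uses the `workflows` collection and `generates_subjects_after` field from the update above together with the shell's standard `findOne`; the helper name `getTranscriptionThreshold` is hypothetical, and `db` is assumed to be the mongo shell database handle (or any object with the same shape).

```javascript
// Hypothetical helper: returns the stored threshold for a workflow, or null if
// no workflow with that name exists.
function getTranscriptionThreshold(db, workflowName) {
  var doc = db.workflows.findOne({ name: workflowName });
  return doc ? doc.generates_subjects_after : null;
}

// In the mongo shell, after the update above, this would be called as:
//   getTranscriptionThreshold(db, 'transcribe')
```

This only confirms the stored value; it says nothing about the annotation model changes, which still need the clean-playbill test.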

@alexokeefe Could you please try marking and transcribing a clean playbill and let me know if the current behavior is as expected?

alexokeefe commented 5 years ago

I’m out sick today but have it on the top of my list when I’m back!

duhaime commented 5 years ago

Amen, thanks Alex!

duhaime commented 5 years ago

I'm merging this branch to keep track of what needs attention but please feel free to reopen if this problem is not resolved @alexokeefe!