Open dcsw2 opened 1 year ago
SAMPLE REQUEST 1
- HMD + LWM collections only
- Date range: 1880-1900
- For every title, take 7 random days per year; this gives 7 issues per title per year. For each issue, include all articles and retain the issue metadata (e.g. we want to know which issue each article belongs to)
- All OCR qualities
NB: the objects of inquiry are both article and issue, so it's important to select content within 7 issues
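A rough sketch of the sampling step described above, in case it helps pin down the spec. It assumes that "7 random days" means 7 days on which the title actually published an issue, that each title has at most one issue per day, and that the 1880-1900 range is inclusive. The function name `sample_issues` and the `(title, issue_date)` input shape are hypothetical, not taken from any existing script.

```python
import random
from collections import defaultdict


def sample_issues(issues, days_per_year=7, start_year=1880, end_year=1900, seed=0):
    """Pick up to `days_per_year` random issue dates per title per year.

    `issues` is an iterable of (title, issue_date) pairs, where issue_date
    is a datetime.date (hypothetical input shape). Returns a dict mapping
    (title, year) to a sorted list of sampled dates.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group the available issue dates by title and year, within the range.
    by_title_year = defaultdict(set)
    for title, issue_date in issues:
        if start_year <= issue_date.year <= end_year:
            by_title_year[(title, issue_date.year)].add(issue_date)

    sample = {}
    for key in sorted(by_title_year):
        dates = sorted(by_title_year[key])
        # Titles with fewer issues than requested contribute all of them.
        k = min(days_per_year, len(dates))
        sample[key] = sorted(rng.sample(dates, k))
    return sample
```

All articles of each sampled issue would then be pulled in a second step, keyed on title and date, so that the article-to-issue link is kept.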
Are the tasks below the right set? Please amend as needed!
~~We have some thoughts/questions about how to define "1 week":
Sounds good @dcsw2, I can do that. I can start working on it late this afternoon. If I start a script tonight, you might have the sample sometime tomorrow. I'll keep you updated, but ping me for anything else in the meantime - I'll be a bit busy with last-minute abstract writing and wrapping up stuff before I switch to part-time next week, but it's on my TODO for the day :white_check_mark:
T-Res output + article metadata fields:
NLP, issue, art_num, title, collection, full_date, year, month, day, location, word_count, ocrquality, decade, mention, candidates, candidate_names, sent_idx, end_pos, tag, sentence, prediction, prediction_name, ed_score, latlong, wkdt_class
Including toponym mentions that return NIL candidates
Amended to leave out POS until @dcsw2 and I discuss
Sample in google drive here: https://drive.google.com/drive/folders/1GCQJXT2ZI_EtGgHQeqOyn6TYe4Ww7lQI
Sample stored in azure here: storageexplorer://v=1&accountid=%2Fsubscriptions%2Fb8871872-a5e3-473f-b9b9-f4baaab6a9a0%2FresourceGroups%2Flivingwithmachines%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Flivingwithmachines&subscriptionid=b8871872-a5e3-473f-b9b9-f4baaab6a9a0&resourcetype=Azure.BlobContainer&resourcename=topo
(just leaving @fedenanni and @lukehare assigned as they are active on this right now) - @fedenanni when you're ready for @dcsw2 and me to review, just re-assign us! I'm trying to get better at this ;)
21 Feb: a first sample to run through T-RES