Open ekemeyer opened 2 weeks ago
Ugh
I'm framing up comparison logic; I plan to read each title cell in the Peabody sheet, then process each text to:
Here is a list of unique words (after the filtering already described). Will you please remove all but the ones to IGNORE? That is, I will search on every word in the "titles" except the words in the ignore list. words_unique.txt
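A minimal sketch of the ignore-list filtering described above, in Python rather than the script editor actually used. It assumes words_unique.txt ends up holding one ignored word per line; the function names, the tokenizing rule, and the sample title are hypothetical.

```python
import re

def load_ignore(lines):
    """Build the ignore set from an iterable of lines (assumed: one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}

def title_words(title, ignore):
    """Return the lowercase words of a title, minus any word in the ignore set."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return {w for w in words if w not in ignore}

# Hypothetical ignore list; the real one would be read from words_unique.txt,
# e.g. load_ignore(open("words_unique.txt", encoding="utf-8")).
ignore = load_ignore(["the", "of", "a", "and"])
print(sorted(title_words("The Making of a President", ignore)))
```

Using sets means a word repeated within a title counts once, which may or may not be what the comparison needs.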
Results illustrated in script editor; text values to be written to file:
ran script overnight, inspected the output, and killed it because of an iterator bug. Fixed that; re-arranged the order of the output columns for better VLOOKUP performance in Excel; added code to generate another file reporting paired values of barcode and title word counts for later use in Excel (e.g., a given Peabody barcode could match a given LOC title on 4/7 words, and/or matches could be filtered by the decimal value of the match ratio, etc.)
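The paired barcode and word-count report could be sketched as below. This is an illustration in Python, not the script that was run. It assumes the denominator in "4/7" is the number of words in the Peabody title, and that each inventory has already been reduced to a set of title words; the barcode and titles are made up.

```python
def match_score(peabody_words, loc_words):
    """Return (shared, total, ratio); total is the Peabody word count (an assumption)."""
    total = len(peabody_words)
    if total == 0:
        return 0, 0, 0.0
    shared = len(peabody_words & loc_words)
    return shared, total, shared / total

def paired_rows(peabody, loc, min_ratio=0.0):
    """Yield (barcode, loc_title, shared, total, ratio) for pairs sharing any word.

    peabody maps barcode -> set of title words; loc maps LOC title -> set of words.
    """
    for barcode, p_words in peabody.items():
        for loc_title, l_words in loc.items():
            shared, total, ratio = match_score(p_words, l_words)
            if shared and ratio >= min_ratio:
                yield barcode, loc_title, shared, total, round(ratio, 3)

# Hypothetical sample data
peabody = {"PB001": {"civil", "rights", "march"}}
loc = {"The Civil Rights Years": {"civil", "rights", "years"}}
for row in paired_rows(peabody, loc, min_ratio=0.5):
    print("\t".join(str(v) for v in row))
```

Comparing every barcode against every LOC title is quadratic in the inventory sizes, so raising min_ratio only trims the output, not the run time.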
killed script again because it was running at a problematically slow rate.
exported data from Excel to TSV text and rewrote everything to work in BASH.
if this will be a regular need, I'll do it waaay more easily in FileMaker, I suspect.
data will move back into Excel after new sheets are installed.
Details
Hi - I'm hoping I can get some help ID'ing items in an inventory we received from Peabody that are possibly either 1) already in the AAPB or 2) at the LOC. I've pulled out everything I was able to ID as already in the AAPB, as well as all WGBH programming. I've uploaded two inventories to a folder on the shared drive: the Peabody inventory (UPDATED_PBS_in_Peabody_1970-1999_MKrev.xlsx) and the LOC's PBS-NET inventory (LC_NET-PBS_2in.xlsx). If you could ID possible matches, I'll then go in and do more manual checking. Let me know if you have any questions!
Submitted by: Michelle
CC in communications:
Priority: Medium (within this month)
URL: https://drive.google.com/drive/folders/1GSmhnHrkkAfQQBdOenESGApYyVT086PD?usp=drive_link
Slack message thread: