RockefellerArchiveCenter / scripts

Useful scripts
MIT License
16 stars 5 forks source link

Updates edit_notes.py, removes global declarations, adds columns to csv #90

Closed kemartin2015 closed 1 year ago

kemartin2015 commented 1 year ago

I tried to address the deprecated ASnake endpoint issue, remove the global declarations, and add more information to the csv file to make it more consistent with the csv file output in the notes_csv.py script.

So far, I'm consistently getting error messages in line 25 where I'm trying to get json for notes each time I try a different approach. Could you offer some guidance on how to fix this line? I'm also unsure about my changes to lines 42 and 52.

kemartin2015 commented 1 year ago

Hi @helrond, thank you for these suggestions. I worked on addressing your comments, but I still don't think I'm passing client correctly because I'm getting this error pretty consistently:

raise Exception('Could not resolve "{}" to a URI'.format(thingit)) Exception: Could not resolve "Namespace(note_type='accessrestrict', action='delete', resource_id=13291, search_string='restrict', level='all', replace_string=None)" to a URI

I agree that it makes sense to change this script to an object-oriented approach. It's been a challenge to jump back in and make changes with all of the arguments that need to be passed around.

kemartin2015 commented 1 year ago

Hi @helrond, I made quite a few changes, and the script is working now. Could you take another look whenever you get a chance?

kemartin2015 commented 1 year ago

Hi @helrond, I think I've addressed the issues in your previous round of comments, but I have some more questions now:

  1. Now that I’ve untangled some of these methods, I’m not sure how to get the data back to main. Does it make sense to move where resource is defined and walk_treeto the main method to alleviate some of the passing issues? I’m also not confident in the logic in main.
  2. I’m still passing quite a few arguments into handle_matching_notes even after moving save_record out. Is that okay or should these lines be moved back into process_tree?
  3. I’m getting an invalid syntax error when I try to pass args.resource_id into process_tree. Do I need to rename the args. arguments to prevent that error?
  4. I added in the ASnake text_in_note method, and to me, it looks like a simpler way to do what we were already doing, but I’m not sure if I’m using it correctly?
kemartin2015 commented 1 year ago

Hi @helrond, I'm getting a couple errors when I run the script.

In line 60, i get this error:

TypeError: argument 1 must have a "write" method

In line 26, I was getting the following error. Is the issue with the content argument? The script runs if I change content to note in line 26, but then it modifies/deletes every access note even with a high confidence ratio.

Traceback (most recent call last): 
  File "/Users/k.martin/Documents/GitHub/scripts/archivessnake/edit_notes.py", line 83, in <module> 
    main(args.note_type, args.action, args.resource_id, args.search_string, args.level, args.replace_string) 
  File "/Users/k.martin/Documents/GitHub/scripts/archivessnake/edit_notes.py", line 65, in main 
    processed_record, updated = process_record(record, level, note_type, action, search_string, replace_string) 
  File "/Users/k.martin/Documents/GitHub/scripts/archivessnake/edit_notes.py", line 26, in process_record 
    if text_in_note(content, search_string, CONFIDENCE_RATIO): 
  File "/opt/homebrew/lib/python3.9/site-packages/asnake/utils/__init__.py", line 117, in text_in_note 
    note = resolve_to_json(note, client) 
  File "/opt/homebrew/lib/python3.9/site-packages/asnake/utils/__init__.py", line 49, in resolve_to_json 
    json = client.get(thingit).json() 

AttributeError: 'int' object has no attribute 'get'
helrond commented 1 year ago

OK, there was an error with the way the CSV Writer was being created; I've fixed that now and that resolves the issue in line 60.

The other issue is slightly more complex. It looks like the text_in_note utility is expecting that you will pass a note object, a query string, an ASpace client and optionally a confidence ratio but instead we are passing subnote content and a query string. HOWEVER we can't just move that conditional further up and update the arguments because you're replacing content in a specific subnote. So it might actually make sense here to go back to the way you had it originally, even though it's somewhat duplicative of what ASnake can do.

kemartin2015 commented 1 year ago

Ok, sounds good! Thanks for explaining the text_in_note issue. That makes a lot of sense.

kemartin2015 commented 1 year ago

Hi @helrond, I reverted text_in_note back to contains_match, but I don't think your change to csv writer was committed because I'm still getting that error.

helrond commented 1 year ago

Ah, sorry I forgot to push. Changes are there now.

kemartin2015 commented 1 year ago

I tested the script on FA739A and FA739B in AS dev, and it looks good to me. I didn't have issues modifying or deleting access notes. The output CVSs matched the original output of the notes_csv,py script, and I didn't see any unexpected changes in other notes or other areas of the finding aid. Is there anything else I should be testing for?

helrond commented 1 year ago

I don't think so? We mostly just moved things around so they're a little more atomic. At some point in the future we could also write tests for these scripts (particularly the ones which are writing data back to AS) so we have a better safety net.

kemartin2015 commented 1 year ago

Ok, sounds good! Should I go ahead and merge these changes?

helrond commented 1 year ago

Sure, go ahead and merge!