NYPL / ami-tools

MIT License

pamidb_to_json.py: line tabulation kill? #25

Open bturkus opened 6 years ago

bturkus commented 6 years ago

Hi, I've noticed that, from time to time, we've been accidentally adding (and not noticing) extra lines in database fields. These pose a problem because, when exported in merge files and transformed into JSON, they end up as weird newline characters that fail validation.

So this in a mer: [screenshot, 2018-02-23 at 9:00:50 AM]

Ends up as this in JSON: [screenshot, 2018-02-21 at 9:53:17 AM]

We can certainly try to impose more order in the database, but I see this being a recurring problem, and if we could add a kill mechanism of some sort (without losing information), that'd be wonderful. Let me know what you think... Thanks, Ben

nkrabben commented 6 years ago

This is a wonderful feature request. Let's queue up a ticket.

bturkus commented 6 years ago

False prophet Ben here. Maybe make a list of kill characters and whack the data frame with a replace function?

nkrabben commented 3 years ago

Do you have a list of kill characters? And what should they be replaced with? ""?

bturkus commented 3 years ago

This one is worth keeping open, I think. I'll work on a list of characters, but I believe \u000b (line tabulation) is the big offender.
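
For reference, a minimal sketch of the "list of kill characters plus replace" idea from this thread. It is not code from `pamidb_to_json.py`; the names `KILL_CHARS`, `clean_value`, and `clean_record` are hypothetical, and the choice of a single space as the replacement (rather than `""`) is an assumption made so that words on either side of the stray character don't get fused together. Only U+000B is listed because it is the only character named so far.

```python
import json

# Hypothetical list of "kill characters". Only U+000B (line tabulation)
# has been named in this thread, so the list starts with just that one.
KILL_CHARS = ["\u000b"]


def clean_value(value, kill_chars=KILL_CHARS, replacement=" "):
    """Replace each kill character in a string; pass non-strings through."""
    if not isinstance(value, str):
        return value
    # Build a translation table mapping every kill character to the replacement.
    table = str.maketrans({ch: replacement for ch in kill_chars})
    return value.translate(table)


def clean_record(record, kill_chars=KILL_CHARS, replacement=" "):
    """Apply clean_value to every field of a dict-like record."""
    return {key: clean_value(val, kill_chars, replacement)
            for key, val in record.items()}


# Example: a field with a stray line tabulation, as exported in a merge file.
record = {"notes": "side A\u000bside B", "duration": 42}
print(json.dumps(clean_record(record)))
```

The same `clean_value` function could be applied cell by cell to a data frame before the JSON is written. Passing `replacement=""` drops the character outright, if that turns out to be the preferred behavior.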