Closed ekemeyer closed 1 month ago
Many emails and DMs later, I decided to pursue a strategy of capturing their metadata via screen scraping.
It's a Drupal site that failed to provide any expected JSON output, but it did retain the basic /node/ structure, so I used that to collect URLs to all their "articles" and then drilled into those to discern common structures. Finally, I used lynx dumps to scrape URLs and metadata into the files and folders seen in the ZIP archive here. I explained everything to Rochelle and provided the ZIP archive plus a list of 48 items that failed to yield expected results; she supposed that those items weren't yet digitized...
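For anyone hitting a similar Drupal site, a minimal sketch of the link-harvesting step: `lynx -dump -listonly -nonumbers <url>` prints every link on a page, and filtering that output for `/node/` recovers the article URLs. The site address and sample links below are placeholders, not the station's actual data.

```shell
#!/bin/sh
# Hypothetical target site -- substitute the real Drupal host.
SITE="https://example-station.org"

# A real run would be:
#   links=$(lynx -dump -listonly -nonumbers "$SITE/")
# Here we simulate lynx's output so the sketch is self-contained.
links="$SITE/node/101
$SITE/about
$SITE/node/102
$SITE/node/101"

# Keep only /node/ URLs, then sort numerically by node id and
# dedupe (field 5 when splitting on "/" is the node number).
printf '%s\n' "$links" | grep '/node/' | sort -t/ -k5 -n -u
```

Each surviving URL can then be fed back through `lynx -dump` individually to extract the metadata fields from the rendered page.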
Details
Welcome back, Kevin! Hope you had a great vacation. Guess what? Another station with a Drupal database that can't export its catalog. :melting_face: This one should be a lot easier, as it's not that many records and the people at the station are well organized. As our resident expert on Drupal databases, would you be up for meeting with the team to brainstorm a solution? One of the team members suggested using a web crawler, but there has to be a better way.
Submitted by: Rochelle
CC in communications:
Priority: Medium (within this month)
URL:
Slack message thread: I have an email I can forward.