digirati-co-uk / pmc-lux

Transforming data from PMC catalogues for import to LUX
MIT License

Library – omit duplicate Archive series level records #13

Closed brutaldigital closed 3 weeks ago

brutaldigital commented 1 month ago

Top-level archive collection descriptions are duplicated in Library catalogue: omit these from the export.

They appear to begin with an asterisk, e.g. *John Hayes archive or (SER) *This is a collection level description.

see A275: https://lux-data-dev.collections.yale.edu/view/text/30a7c8a5-c5b7-47e6-8ab7-cd58d0730412

brutaldigital commented 1 month ago

NB these appear also to be classified as 'Illumination', which is incorrect: https://lux-front-sbx.collections.yale.edu/view/text/4ee69619-6f16-4bad-86c0-8fbc42682ab3

tomcrane commented 1 month ago

These are the 54 Library records that either have a <title> that starts with a *, or have <class>ARCHIVES</class>. Most, but not all, have both. They are listed below.

Is it safe to simply ignore the library records with title starting *? Or is it more complicated?

    A267 Brinsley Ford archive class: ARCHIVES
    A622 John Sunderland archive class: ARCHIVES
    D4347 A history of the works of Sir Joshua Reynolds PRA class: ARCHIVES
    A300 John Ingamells archive class: ARCHIVES
    A265 Alastair Smart archive class: ARCHIVES
    A275 John Hayes archive class: ARCHIVES
    A264 Robert Raines archive class: ARCHIVES
    D7616 A biographical dictionary of English architects, 1660-1840 class: ARCHIVES
    A780 Nicholas Goodison archive class: ARCHIVES
    A274 Oliver Millar archive class: ARCHIVES
    A268 Hugh Macandrew Archive class: ARCHIVES
    A391 Brian Sewell archive class: ARCHIVES
    E1466 Sculptors A-Z class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    A781 Fry Gallery archive class: ARCHIVES
    A366 Dennis Sharp archive class: ARCHIVES
    A296 Howard Colvin archive class: ARCHIVES
    A271 Malcolm Stewart archive class: ARCHIVES
    A358 Malcolm Baker archive class: ARCHIVES
    E1460 Artists A-Z class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    A270 Roy Strong archive class: ARCHIVES
    A269 W. G. Constable archive class: ARCHIVES
    E3177 Unidentified Artists class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    A364 William Packer archive class: ARCHIVES
    E1468 Sculpture by location A-Z class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    A392 Charles S. Rhyne archive class: ARCHIVES
    A621 Gavin Stamp archive class: ARCHIVES
    A365 William Roberts archive class: ARCHIVES
    A747 Frank Simpson Archive class: ARCHIVES
    E1473 Sculpture in the Caribbean : photographic collection compiled by Joan Coutu class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    A666 Humphrey Waterfield Archive class: ARCHIVES
    A389 Evelyn Newby archive class: ARCHIVES
    A390 Paul R. Joyce archive class: ARCHIVES
    E1501 Artists A-Z class: PHOTOGRAPHIC ARCHIVE class: Office 2.2
    A624 Kerry Downes archive class: ARCHIVES
    E1469 Decorative painting class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    E1470 Sketchbooks, Albums of drawings, rare books & collections of prints and drawings class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    D7617 A biographical dictionary of British architects, 1600-1840 class: ARCHIVES
    A393 Nigel Surry archive class: ARCHIVES
    A266 Ellis Kirkham Waterhouse archive class: ARCHIVES
    A298 Daphne Haldin archive class: ARCHIVES
    E1471 Paul Mellon Collection of works on paper (includes work at the Yale Center for British Art, New Haven, National Gallery of Art, Washington D.C. and Virginia Museum of Fine Art, Richmond) class: PHOTOGRAPHIC ARCHIVE class: Photo Archive
    A294 Frank Herrmann archive class: ARCHIVES
    A362 John Gage archive class: ARCHIVES
    A293 Christopher Wright archive class: ARCHIVES
    A782 Michael Kerney archive class: ARCHIVES
    A359 Gilbert Benthall archive class: ARCHIVES
    A360 John Edgcumbe archive class: ARCHIVES
    A363 Hunting Art Prize archive class: ARCHIVES
    A394 Giles Waterfield archive class: ARCHIVES
    A619 Benedict Nicolson archive class: ARCHIVES
    A620 Paul Oppé archive class: ARCHIVES
    A361 Judy Egerton archive class: ARCHIVES
    D7618 A biographical dictionary of British architects, 1600-1840 class: ARCHIVES
    E1472 Paul Mellon Collection of paintings (includes work at the Yale Center for British Art, New Haven, National Gallery of Art, Washington D.C. and Virginia Museum of Fine Art, Richmond) class: PHOTOGRAPHIC ARCHIVE class: Photo Archive

tomcrane commented 1 month ago

(note to self)

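    // Assumes xLibrary (XDocument) and LibNS (XNamespace) are defined earlier.
    // Select records whose <title> starts with "*" or that have a <class> of exactly "ARCHIVES".
    var els = xLibrary.Root!.Elements()
        .Where(el => (el.Element(LibNS + "title")?.Value ?? "").StartsWith("*", StringComparison.Ordinal)
            || el.Elements(LibNS + "class").Any(c => c.Value == "ARCHIVES"))
        .ToList();
    Console.WriteLine(els.Count + " records");
    Console.WriteLine();
    foreach (var el in els)
    {
        // Print the record ID and title, then each <class> value on its own line.
        Console.WriteLine(el.Attribute("ID")?.Value + " " + el.Element(LibNS + "title")?.Value);
        foreach (var cls in el.Elements(LibNS + "class"))
        {
            Console.WriteLine("     class: " + cls.Value);
        }
    }

brutaldigital commented 1 month ago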

> These are the 54 Library records that either have a <title> that starts with a *, or have <class>ARCHIVES</class>. Most but not all have both:
>
> Is it safe to simply ignore the library records with title starting *? Or is it more complicated?

All of those examples can be removed.
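
Given that answer, the omission could be sketched as a filter that reuses the same two conditions as the listing above: skip any record whose title starts with `*` or that has a `<class>` of exactly `ARCHIVES`. This is only an illustration, not the code in the repository. The names `LibraryExportFilter`, `IsArchiveDuplicate` and `RecordsToExport`, the namespace `urn:example:library`, the `<record>` element name and the sample records (other than the `*John Hayes archive` title mentioned earlier) are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LibraryExportFilter
{
    // Hypothetical predicate: true when a Library record looks like a duplicated
    // archive collection-level description (title starts with "*" or class is ARCHIVES).
    public static bool IsArchiveDuplicate(XElement record, XNamespace ns)
    {
        var title = record.Element(ns + "title")?.Value ?? "";
        return title.StartsWith("*", StringComparison.Ordinal)
            || record.Elements(ns + "class").Any(c => c.Value == "ARCHIVES");
    }

    // Returns only the records that should still go into the export.
    public static List<XElement> RecordsToExport(XDocument library, XNamespace ns)
    {
        var records = library.Root?.Elements() ?? Enumerable.Empty<XElement>();
        return records.Where(r => !IsArchiveDuplicate(r, ns)).ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Placeholder namespace and made-up sample data; the real catalogue XML will differ.
        XNamespace ns = "urn:example:library";
        var doc = XDocument.Parse(@"
<library xmlns='urn:example:library'>
  <record ID='A275'><title>*John Hayes archive</title><class>ARCHIVES</class></record>
  <record ID='X1'><title>An ordinary book</title><class>BOOKS</class></record>
  <record ID='X2'><title>No asterisk here</title><class>ARCHIVES</class></record>
  <record ID='X3'><title>*Asterisk only</title><class>PHOTOGRAPHIC ARCHIVE</class></record>
</library>");

        var kept = LibraryExportFilter.RecordsToExport(doc, ns);
        var ids = string.Join(", ", kept.Select(r => r.Attribute("ID")?.Value));
        Console.WriteLine("Exporting " + kept.Count + " of "
            + doc.Root!.Elements().Count() + " records: " + ids);
    }
}
```

Because the class comparison is an exact match, a record classed only as `PHOTOGRAPHIC ARCHIVE` is dropped only if its title also starts with `*`, which mirrors the selection used to produce the list of 54.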