Open jhammock opened 4 months ago
I think it may be worth starting with a comprehensive page inventory that we can then use in different ways, e.g., to develop a rationale for a minimally informative page, or a new concept of a "rich" page, with pages rated rich or poor with respect to certain content types. So I would want to include the number of articles in this inventory, along with the article subjects and languages. For the media, I would want a count of media objects by media type (image, video, sound). At some point we could start using the computer vision code to provide further categories for images. For trait data, I would want a count of all data records, a count of the measurementTypes, and a list of all measurementTypes represented on the page. This would allow us to give special consideration to a page that has measurementTypes from a meaningful/informative list. It may also make sense to list the values for categorical data, making exceptions for values of certain measurement types for people, institutions, and geography.
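To make the proposal concrete, here is a rough sketch of what one page's inventory could look like. The function name, the input record shapes, and the keys `subject`, `language`, and `type` are all assumptions for illustration; they are not taken from the EOL codebase.

```python
from collections import Counter


def build_page_inventory(articles, media, traits):
    """Summarise the content of one page.

    All inputs are lists of plain dicts; the field names used here
    are hypothetical and would need mapping to the real data model.
    """
    # Distinct measurementTypes on the page, sorted for stable output.
    measurement_types = sorted({t["measurementType"] for t in traits})
    return {
        "article_count": len(articles),
        "article_subjects": sorted({a["subject"] for a in articles}),
        "article_languages": sorted({a["language"] for a in articles}),
        # Count of media objects per media type (image, video, sound).
        "media_by_type": dict(Counter(m["type"] for m in media)),
        "trait_record_count": len(traits),
        "measurement_type_count": len(measurement_types),
        "measurement_types": measurement_types,
    }
```

Simpler metrics (a "rich" or "poor" flag per content type, say) could then be derived from this dict without re-reading the underlying records.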
That's fine with me. You're right: we can derive simpler metrics from this as the needs arise. I can't think of any other value types that would be numerous and unexciting, so let's skip counting and listing measurementValues for
http://eol.org/schema/terms/TypeSpecimenRepository and http://eol.org/schema/terms/Present
What about children of /Present? Native, introduced, adventive, etc.? I'd certainly want to know if they were there, but I think I could skip counting/listing geographic values for those too.
I don't think people feature in our data yet as values.
Apart from that, do we want a count of records per measurement type (and/or per value, for values we are listing)?
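If we do want those counts, a sketch along these lines would cover both records per measurement type and records per value, while skipping values for the two types named above. The function name and the record shape (dicts with `measurementType` and `measurementValue` keys) are assumptions; children of /Present are not in the skip set here and would have to be added by whoever knows their URIs.

```python
from collections import Counter, defaultdict

# Types whose values are numerous and not worth listing (from this thread).
SKIP_VALUES_FOR = {
    "http://eol.org/schema/terms/TypeSpecimenRepository",
    "http://eol.org/schema/terms/Present",
}


def count_trait_records(traits, skip_values_for=SKIP_VALUES_FOR):
    """Return (records per measurementType, records per value per type).

    Every record counts toward its type, so skipped types still show up
    in the first result; only their values are left out of the second.
    """
    per_type = Counter()
    per_value = defaultdict(Counter)
    for record in traits:
        mtype = record["measurementType"]
        per_type[mtype] += 1
        if mtype not in skip_values_for:
            per_value[mtype][record["measurementValue"]] += 1
    # Convert to plain dicts so the result is easy to serialise.
    return dict(per_type), {k: dict(v) for k, v in per_value.items()}
```

This keeps the "is it there at all" signal for the skipped types, which is what I'd want for presence data, without listing every place or institution.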
Precipitated by a discussion of per-taxon content metrics with @metasj, though also weirdly similar to #6. We would like to institute a simple (maybe 3- or 4-component?) content summary within EOL. Proposed measurements, for discussion:
I could stop there, but of course we could also include text objects, vernacular names (language count), and/or measure depth on some of the above.