mbjones opened this issue 6 years ago
From the notes, only ~1500 datasets currently have attribute descriptions. Jesse’s team is working through the other 3500 to add attribute-level metadata. Any dataset in the ADC is a candidate for annotation, so all datasets will need to be examined and understood at some level. We could ask the data team to add a keyword for us to query, but it would be safer to examine everything. We would rerun the query periodically.
So we need a query that returns query_date, pkgid, entity-name, and attributename.
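Once a query returns results, the one-row-per-attribute layout listed above can be produced by flattening each Solr document. The following is a minimal sketch, assuming each result is a parsed Solr JSON document with an `identifier` string and a multivalued `attribute` list, and assuming those `attribute` values can stand in for attribute names (what that indexed field actually contains should be checked against the DataONE index configuration). The function name `flatten_docs` and the sample identifier are hypothetical. `entity_name` is left empty because, per Bryce, `entityName` is not indexed.

```python
from datetime import date

def flatten_docs(docs, query_date):
    """Expand Solr result docs into one row per attribute value.

    Assumes each doc has an "identifier" string and a multivalued
    "attribute" list; docs without attributes produce no rows.
    """
    rows = []
    for doc in docs:
        for attr in doc.get("attribute", []):
            rows.append({
                "query_date": query_date.isoformat(),
                "pkgid": doc["identifier"],
                # entityName is not indexed, so it cannot be filled from Solr.
                "entity_name": None,
                "attributename": attr,
            })
    return rows

# Hypothetical sample doc, for illustration only.
sample_docs = [{"identifier": "doi:10.xxxx/example", "attribute": ["site", "temp_c"]}]
rows = flatten_docs(sample_docs, date(2016, 12, 1))
```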
Initial query might resemble: https://cn.dataone.org/cn/v1/query/solr/?fl=identifier,title,attribute&q=formatType:METADATA+AND+(datasource:*ARCTIC)+AND+-obsoletedBy:*+AND+(attribute:*)&rows=100&start=0
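Rather than hand-editing that URL, the same query can be assembled programmatically so the clauses and paging parameters are easy to change. This is a sketch, not the project's tooling: the helper name `build_attribute_query` is hypothetical, and the grouping parentheses in the original URL are dropped because they do not change the meaning of these simple clauses. `urlencode` percent-encodes `:`, `*`, and `,`, which Solr decodes on receipt.

```python
from urllib.parse import urlencode

# DataONE coordinating node Solr endpoint, taken from the URL above.
CN_SOLR = "https://cn.dataone.org/cn/v1/query/solr/"

def build_attribute_query(base_url, datasource_clause, rows=100, start=0):
    """Return a Solr URL for current (non-obsoleted) metadata records
    that have at least one indexed attribute."""
    q = " AND ".join([
        "formatType:METADATA",
        datasource_clause,
        "-obsoletedBy:*",  # skip records superseded by a newer version
        "attribute:*",     # keep only records with attribute metadata
    ])
    params = {
        "fl": "identifier,title,attribute",
        "q": q,
        "rows": rows,
        "start": start,
    }
    return base_url + "?" + urlencode(params)

url = build_attribute_query(CN_SOLR, "datasource:*ARCTIC")
```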
Issues:
Later queries will need to add a since-date filter, probably on dateModified or dateUploaded.
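For the periodic reruns, a since-date can be expressed as a standard Solr range clause on one of those two date fields, with the timestamp in ISO-8601 UTC form ending in `Z`. A minimal sketch, assuming the last run date is tracked by the caller; the helper `since_clause` and the example date are hypothetical, and which field to filter on is still the open question noted above.

```python
from datetime import datetime, timezone

def since_clause(field, since):
    """Return a Solr range clause matching records on or after `since`."""
    # Only the two candidate fields named in the issue are accepted.
    if field not in ("dateModified", "dateUploaded"):
        raise ValueError(f"unexpected date field: {field}")
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{field}:[{stamp} TO *]"  # open-ended upper bound

base_q = ('formatType:METADATA AND datasource:"urn:node:ARCTIC" '
          'AND -obsoletedBy:* AND attribute:*')
last_run = datetime(2016, 12, 1, tzinfo=timezone.utc)  # hypothetical
incremental_q = base_q + " AND " + since_clause("dateModified", last_run)
```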
Jesse says there may be only 600 datasets with attribute descriptions. The 1500 figure is the number of datasets they have processed since the ACADIS migration (April 2016); they did not define attributes at first, and only began doing so later (perhaps December 2016).
Bryce says that the EML path dataset/dataTable/entityName is not indexed by DataONE (d1). For the list of indexed fields, see: https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-eml-base.xml
A similar query against the ADC returns a different number of datasets. ADC developers to investigate: https://gist.github.com/amoeba/2546994813f58edb8bc93ff6510767ef
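When investigating the mismatch, comparing the returned identifiers is more informative than comparing counts alone, since it shows exactly which records appear on one side only. A minimal sketch, assuming the `identifier` values from each endpoint have already been collected (for example by paging through both result sets); the function name and the sample identifiers are hypothetical.

```python
def compare_results(cn_ids, mn_ids):
    """Summarize how two collections of Solr identifiers differ."""
    cn, mn = set(cn_ids), set(mn_ids)
    return {
        "cn_count": len(cn),
        "mn_count": len(mn),
        "only_cn": sorted(cn - mn),  # returned by the CN but not the ADC
        "only_mn": sorted(mn - cn),  # returned by the ADC but not the CN
    }

# Hypothetical identifiers, for illustration only.
diff = compare_results(["id1", "id2", "id3"], ["id2", "id3", "id4"])
```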
So query the ADC member node (MN) directly. Start with this query; note that the MN name is still included, but stated explicitly (no wildcards): https://arcticdata.io/metacat/d1/mn/v2/query/solr/?fl=identifier,title,attribute&q=formatType:METADATA+AND+datasource:%22urn:node:ARCTIC%22+AND+-obsoletedBy:*+AND+(attribute:*)&rows=100&start=0
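Since the query returns 100 rows at a time, covering every dataset means advancing `start` until Solr's `numFound` is reached. A minimal sketch of that paging loop, assuming the endpoint honors the standard Solr `wt=json` parameter and returns `{"response": {"numFound": ..., "docs": [...]}}`; `fetch_mn_page` and `iter_docs` are hypothetical names. The page fetcher is passed in as a function so the loop can also be exercised offline, as `fake_fetch` does here.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

MN_SOLR = "https://arcticdata.io/metacat/d1/mn/v2/query/solr/"
MN_Q = ('formatType:METADATA AND datasource:"urn:node:ARCTIC" '
        'AND -obsoletedBy:* AND attribute:*')

def fetch_mn_page(start: int, rows: int) -> dict:
    """Fetch one page from the ADC MN (assumes wt=json is supported)."""
    params = {"fl": "identifier,title,attribute", "q": MN_Q,
              "rows": rows, "start": start, "wt": "json"}
    with urlopen(MN_SOLR + "?" + urlencode(params)) as resp:
        return json.load(resp)

def iter_docs(fetch_page: Callable[[int, int], dict], rows: int = 100) -> Iterator[dict]:
    """Yield every doc by advancing `start` until numFound is reached."""
    start = 0
    while True:
        resp = fetch_page(start, rows)["response"]
        docs = resp["docs"]
        yield from docs
        start += len(docs)
        # Stop on an empty page too, so a short index cannot loop forever.
        if not docs or start >= resp["numFound"]:
            break

# Offline stand-in for fetch_mn_page with 250 fake records.
ALL = [{"identifier": f"id{i}"} for i in range(250)]
def fake_fetch(start: int, rows: int) -> dict:
    return {"response": {"numFound": len(ALL), "docs": ALL[start:start + rows]}}
```

With network access, `list(iter_docs(fetch_mn_page))` would walk the full MN result set.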
We will want to recommend to the ADC data interns which datasets they should focus on as they enhance the metadata (adding attribute descriptions), so we will want to review all the datasets, and it would help to do this systematically. A given creator tends to submit the same types of datasets, so examining them in chunks grouped by creator could be a workable strategy. Working on that query.
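One way to form those creator chunks is to group result documents client-side by creator, largest groups first. A minimal sketch, assuming creators are available in an indexed field requested via `fl`; the field name `origin` is an assumption to verify against the index configuration, and each record is filed under its first listed creator only. Solr faceting (`facet=true&facet.field=...`) could return per-creator counts server-side instead, but the grouping below also keeps the identifiers. The function name and sample docs are hypothetical.

```python
from collections import defaultdict

def group_by_creator(docs, field="origin"):
    """Group identifiers by first listed creator, largest groups first."""
    groups = defaultdict(list)
    for doc in docs:
        creators = doc.get(field) or ["(unknown)"]
        if isinstance(creators, str):  # tolerate a single-valued field
            creators = [creators]
        groups[creators[0]].append(doc["identifier"])
    # Sort by group size (descending), then creator name for stable output.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Hypothetical docs, for illustration only.
sample_docs = [
    {"identifier": "a", "origin": ["Smith, J."]},
    {"identifier": "b", "origin": ["Lee, K.", "Smith, J."]},
    {"identifier": "c", "origin": ["Smith, J."]},
    {"identifier": "d"},
]
chunks = group_by_creator(sample_docs)
```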
We need to generate the list of carbon-cycle datasets to be annotated. Start with one or more Solr queries from https://arcticdata.io/catalog, and compile the results into a parseable data table with appropriate attributes.
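Results from several queries can be merged into one CSV table, dropping rows that more than one query returned. A minimal sketch, assuming each query's results have already been flattened into dicts; the column set and the `(pkgid, attributename)` deduplication key are assumptions about what "appropriate attributes" means here, and the sample rows are hypothetical.

```python
import csv
import io

COLUMNS = ["query_date", "pkgid", "title", "attributename"]

def write_table(row_lists, out):
    """Write rows from several queries to CSV, skipping duplicates."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    seen = set()
    for rows in row_lists:
        for row in rows:
            key = (row["pkgid"], row["attributename"])
            if key in seen:  # already written by an earlier query
                continue
            seen.add(key)
            writer.writerow(row)

# Hypothetical rows from two overlapping queries.
flux = {"query_date": "2016-12-01", "pkgid": "doi:10.xxxx/example",
        "title": "Example carbon flux dataset", "attributename": "co2_flux"}
soil = dict(flux, attributename="soil_temp")
buf = io.StringIO()
write_table([[flux, soil], [flux]], buf)
```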