vlraymond closed this issue 4 years ago.
From Chris:
After our quick walkthrough of Solr yesterday, I wanted to point out the official documentation in case you need to look up the details: http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.2.pdf. This guide is for Solr server admins (installation, setup, etc.), but starting on page 241 there is a section called "Query Syntax and Parsing" that has all the gory details for the parameters and values you can use. We basically discussed the q= parameter of the Standard Query Parser, covered on page 247. Also, Dave Vieglais put together a query tool to help learn the syntax: https://examples.dataone.org/querycn.html. This one queries the CN (the DataONE Coordinating Node), but maybe it's helpful.
Hi @vlraymond: a quick note for your write-up. The Solr syntax uses a colon to delimit field names and values in the query string, and I think at the beginning of your notes you wrote , = "field" , "value" with an example of arcticdata.io/metacat/d1/mn/v2/query/solr/?q=*,* (which should be q=*:*). The rest of the notes use the colon, though. Cheers.
Note: In-house training/learning (for VR & KM)
I want to be able to batch search multiple NSF award numbers to see whether the data from existing awards is in the ADC (Arctic Data Center)
From Jeannette:
Check out the intro to Solr chapter in our training manual: https://nceas.github.io/datateam-training/introduction-to-solr.html
From Chris:
In browser:
Background:
Preliminary Queries:
Display the capabilities of the member node (MN), including core, storage, and replication capabilities (MN read includes the Query API)
arcticdata.io/metacat/d1/mn/v2/node
See which query capabilities (query engines) are available
arcticdata.io/metacat/d1/mn/v2/query
List all fields and their descriptions for the ADC
arcticdata.io/metacat/d1/mn/v2/query/solr
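To avoid retyping the base URL, the three preliminary endpoints could be wrapped in a small function. This is only a sketch: the d1_url name and MN_BASE variable are made up for illustration, and the https:// scheme is an assumption for the browser-style URLs above (the curl examples below use https).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any DataONE tool): build the Member Node URLs listed above.
MN_BASE="https://arcticdata.io/metacat/d1/mn/v2"

d1_url() {
  case "$1" in
    node)  printf '%s/node\n' "$MN_BASE" ;;        # node capabilities
    query) printf '%s/query\n' "$MN_BASE" ;;       # available query engines
    solr)  printf '%s/query/solr\n' "$MN_BASE" ;;  # Solr fields and descriptions
    *)     printf 'unknown endpoint: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Usage (needs network access and xmlstarlet):
#   curl -s "$(d1_url solr)" | xmlstarlet fo | less
```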
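Query strings: what the bits and pieces mean
? = starts the query string (the parameter=value pairs being sent)
& = separates parameter=value pairs
: = "field":"value"
q = the query itself
fl = fields to return
rows = number of records to return
wt = output format (e.g. json, csv)
+-obsoletedBy:* = added to the end of q, removes obsolete versions (the "+" is a URL-encoded space)
beginDate:[YYYY-MM-DDThh:mm:ssZ%20TO%20YYYY-MM-DDThh:mm:ssZ] = records whose beginDate falls within a date range
%20 = URL-encoded space
origin = creator of the metadata / dataset originator (investigator or organization name)
"*" = all (wildcard)
"+" = space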
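Typing %20 by hand gets tedious once queries include spaces, brackets, and date ranges. Below is a minimal sketch of a percent-encoding function in bash; the urlencode name and the example date range are assumptions for illustration, and it only handles ASCII input.

```shell
#!/usr/bin/env bash
# Minimal sketch: percent-encode a Solr query string for use in a URL.
# Assumption: ASCII input; non-ASCII characters are not encoded byte by byte.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;                    # unreserved: keep as-is
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;  # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Example (hypothetical dates): current records whose beginDate falls in the 2000s.
q='beginDate:[2000-01-01T00:00:00Z TO 2009-12-31T23:59:59Z] -obsoletedBy:*'
echo "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=$(urlencode "$q")&fl=id,title&wt=json"
```

curl can also do the encoding itself: curl -G "https://arcticdata.io/metacat/d1/mn/v2/query/solr/" --data-urlencode "q=$q" -d fl=id,title -d wt=json. If you paste an unencoded range query straight into a curl URL instead, add -g (--globoff), because curl otherwise treats [ ] as its own URL-globbing syntax.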
Examples
Query all fields, all values
arcticdata.io/metacat/d1/mn/v2/query/solr/?q=*:*
Query the title field for titles containing the word "soil"
arcticdata.io/metacat/d1/mn/v2/query/solr/?q=title:*soil*
Query the title field for "soil", returning only the title and id fields
arcticdata.io/metacat/d1/mn/v2/query/solr/?q=title:*soil*&fl=id,title
Query the title field for "soil", returning only the title field, 100 rows, in JSON format
arcticdata.io/metacat/d1/mn/v2/query/solr/?q=title:*soil*&fl=title&rows=100&wt=json
Command line: use curl to send the query to the web server. The query below returns only the origin field, as CSV
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=origin:*&fl=origin&wt=csv"
The query below returns the origin field for up to 5000 records, as CSV
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=origin:*&fl=origin&rows=5000&wt=csv"
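rows=5000 only gets the first 5000 records. If there are more, Solr's standard start parameter (an offset) can be used to page through them. A sketch that just prints the page URLs; the page_urls name and the 12000 total are made up, and the real total would come from the query's numFound.

```shell
#!/usr/bin/env bash
# Sketch: list the URLs needed to page through a result set `rows` records at a time,
# using Solr's start (offset) parameter. This only prints URLs; it fetches nothing.
base="https://arcticdata.io/metacat/d1/mn/v2/query/solr/"

page_urls() {
  local total="$1" rows="$2" start
  (( rows > 0 )) || return 1   # guard against an endless loop
  for (( start = 0; start < total; start += rows )); do
    printf '%s?q=origin:*&fl=origin&wt=csv&rows=%d&start=%d\n' "$base" "$rows" "$start"
  done
}

# 12000 is a made-up total for illustration.
page_urls 12000 5000
# With network access, fetch every page, e.g.:
#   page_urls 12000 5000 | while read -r url; do curl -s "$url"; done > origins.csv
```

Each CSV page starts with its own "origin" header line, so expect repeats of it in the combined file. Adding a fixed sort (for example sort=id+asc) is one way to keep page boundaries consistent between requests.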
The query below returns the same 5000 records, keeping only the lines that contain "Grebmeier"
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=origin:*&fl=origin&rows=5000&wt=csv" | grep Grebmeier
The query below counts the lines that contain "Grebmeier"
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=origin:*&fl=origin&rows=5000&wt=csv" | grep Grebmeier | wc -l
The query below sorts the "Grebmeier" lines, keeps only the unique ones, and counts them
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=origin:*&fl=origin&rows=5000&wt=csv" | grep Grebmeier | sort | uniq | wc -l
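A couple of notes on the pipeline tools: uniq only collapses adjacent duplicates, which is why sort comes first, and wc with no option prints lines, words, and bytes, which is why wc -l is used for a count. The sketch below wraps two common tallies in functions (the names are made up) and runs them on made-up sample input standing in for the curl output.

```shell
#!/usr/bin/env bash
# Sketch: tally values from a one-column CSV such as the wt=csv output above.
# Both functions read stdin and skip the first line (the CSV header, e.g. "origin").

# How many distinct values? (sort -u does the same job as sort | uniq)
count_distinct() {
  tail -n +2 | sort -u | wc -l | tr -d ' '
}

# How often does each value occur? Most frequent first.
tally() {
  tail -n +2 | sort | uniq -c | sort -rn
}

# Made-up sample input:
printf '%s\n' origin Doe Roe Doe Poe | count_distinct   # prints 3
printf '%s\n' origin Doe Roe Doe Poe | tally
```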
The query below returns all unique "origin" values among the first 5000 records on arcticdata.io
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=origin:*&fl=origin&rows=5000&wt=csv" | sort | uniq
Query the id field and return each document's identifier, sorted and de-duplicated
curl "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=id:*&fl=id&rows=5000&wt=csv" | sort | uniq
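To get a total count without downloading rows and piping them to wc -l, Solr can be asked for rows=0, in which case the JSON response carries only numFound. A grep-based sketch (the num_found name is made up; if jq is installed, jq .response.numFound does the same job more robustly):

```shell
#!/usr/bin/env bash
# Sketch: pull numFound (the total number of matching records) out of a Solr JSON response.
# Uses grep only, so it assumes "numFound" appears once, as in a plain query response.
num_found() {
  grep -oE '"numFound": *[0-9]+' | head -n 1 | grep -oE '[0-9]+$'
}

# Usage against the live server (needs network access):
#   curl -s "https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=id:*&rows=0&wt=json" | num_found
```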
How to pull a list of award numbers in ADC:
Gameplan:
Getting set up
brew install xmlstarlet
Test xmlstarlet with one record (this returns the raw EML):
curl "https://arcticdata.io/metacat/d1/mn/v2/object/doi:10.18739/A2MS0P"
To tidy up the output, pipe it to xmlstarlet's "fo" (format) command:
curl "https://arcticdata.io/metacat/d1/mn/v2/object/doi:10.18739/A2MS0P" | xmlstarlet fo
To drill down to the funding information, tell xmlstarlet which part of the EML to look in (-s silences curl's progress meter):
curl -s "https://arcticdata.io/metacat/d1/mn/v2/object/doi:10.18739/A2MS0P" | xmlstarlet sel -t -v "/eml:eml/dataset/project/funding/para" -n
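The funding/para element is free text, so the award numbers still need to be picked out of it. A heuristic sketch, assuming the awards are NSF awards (whose IDs are usually seven digits); the extract_award_numbers name is made up. Newer EML 2.2 documents can also carry a structured project/award/awardNumber element, which would avoid the guesswork.

```shell
#!/usr/bin/env bash
# Heuristic sketch: pull seven-digit numbers (the usual length of an NSF award ID)
# out of free-text funding paragraphs read from stdin, one unique number per line.
extract_award_numbers() {
  grep -owE '[0-9]{7}' | sort -u
}

# Usage with the xmlstarlet command above (needs network access and xmlstarlet):
#   curl -s "https://arcticdata.io/metacat/d1/mn/v2/object/doi:10.18739/A2MS0P" \
#     | xmlstarlet sel -t -v "/eml:eml/dataset/project/funding/para" -n \
#     | extract_award_numbers
```

The -w flag keeps longer digit runs (such as an eight-digit number) from being cut down to seven digits.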
But wait, there's more:
(NB: this only works for publicly available items)
#!/bin/bash
mn_url="https://arcticdata.io/metacat/d1/mn"
query_endpoint="/v2/query/solr"
object_endpoint="/v2/object"
# Example query, an assumption (adjust q and rows): ids of current metadata records, as CSV
solr_query="?q=formatType:METADATA+-obsoletedBy:*&fl=id&rows=10&wt=csv"
identifiers=$(curl -s "${mn_url}${query_endpoint}/${solr_query}")
count=0
# For each identifier, download the EML and process it
for identifier in $identifiers; do
  # Skip the first line (the CSV header, "id")
  if [[ "$identifier" == "id" ]]; then continue; fi
  count=$(( count + 1 )); echo "${count}) ${identifier}"
  # Call the DataONE MNRead.get() API to grab the EML
  xml=$(curl -s "${mn_url}${object_endpoint}/${identifier}")
  xml=$(xmlstarlet fo <<< "${xml}")
  # echo "${xml}"  # uncomment to print the whole EML document
  # Use xmlstarlet to find the funding elements in each EML document to get the award number
  award_numbers=$(xmlstarlet sel -t -v "/eml:eml/dataset/project/funding/para" -n <<< "${xml}")
  if [[ -n "$award_numbers" ]]; then printf "%s\n" "$award_numbers"; else printf "%s\n" "No funding element found"; fi
done
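The script above goes from records to award numbers. For the original goal (checking whether a list of NSF awards already has data in the ADC) the direction can be reversed: one Solr count query per award. This sketch assumes the ADC index has a "funding" field holding the funding text, which should be confirmed in the field list at arcticdata.io/metacat/d1/mn/v2/query/solr; the function names and the awards.txt file are hypothetical. It only prints the URLs, so it runs without network access.

```shell
#!/usr/bin/env bash
# Sketch: build one Solr count query per NSF award number read from stdin.
# Assumption: the index has a "funding" field containing the award text (check the field list first).
solr="https://arcticdata.io/metacat/d1/mn/v2/query/solr/"

award_query_url() {
  # rows=0 asks only for the match count (numFound); -obsoletedBy:* skips old versions.
  printf '%s?q=funding:*%s*+-obsoletedBy:*&rows=0&wt=json\n' "$solr" "$1"
}

check_awards() {
  local award
  while read -r award; do
    if [ -z "$award" ]; then continue; fi   # skip blank lines
    printf '%s\t%s\n' "$award" "$(award_query_url "$award")"
    # With network access, print the count instead, e.g.:
    #   curl -s "$(award_query_url "$award")" | grep -oE '"numFound": *[0-9]+'
  done
}

# Usage with a hypothetical file of award numbers, one per line:
#   check_awards < awards.txt
```

The wildcards around the number follow the title:*soil* pattern used earlier, so the match should work whether the award appears on its own or inside text like "PLR-" plus the number.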