VertNet / dwc-indexer

Google App Engine project for indexing DwC text files into Search API Documents
GNU Lesser General Public License v3.0

Remove verbatim record from full text indexing #27

Closed: laurarussell closed 9 years ago

laurarussell commented 10 years ago

Relates to issue https://github.com/VertNet/webapp/issues/422. See comment by Laura.

Full-text indexing for searches in the portal should only include the Darwin Core fields that are supplied in datasets. It should not include ancillary metadata that is harvested along with the data.

tucotuco commented 9 years ago

Turns out any term in the index is full-text searchable, so there is no way to keep the full verbatim record for display without having it indexed as well. So, in 50c06aa1247aa9154fdeddb7cc8b07897f6b94be, I removed the spurious truncated record search term, kept verbatim, removed terms such as the metadata description, and added specific key search terms for lots of fields to help filter in a more focused way. Search keys now include the following (computed values are shown in parentheses):

doc_id, rank, iptrecordid, institutioncode, collectioncode, catalognumber, gbifdatasetid, gbifpublisherid, networks, lastindexed, license, iptlicense, migrator, dctype, basisofrecord, type (value=_type(data)),
continent, country, stateprovince, county, municipality, island, islandgroup, waterbody, locality, geodeticdatum, georeferencedby, georeferenceverificationstatus,
kingdom, phylum, class, order, family, genus, specificepithet, infraspecificepithet, scientificname, vernacularname, typestatus,
recordedby, recordnumber, fieldnumber,
bed, formation, group, member,
sex, lifestage, preparations, reproductivecondition,
media (value=has_media(data)), tissue (value=has_tissue(data)), fossil (value=is_fossil(data,res_id)), hastypestatus (value=has_key(data, 'typestatus')), wascaptive (value=was_captive(data)), haslicense (value=has_license(data)),
rank, hashid (value=hash(keyname)%1999), verbatim_record, GeoField(name='location', value=location), mappable, coordinateuncertaintyinmeters, year, month, day, eventdate
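
To illustrate how the dedicated keys can narrow a search even though verbatim_record remains full-text searchable, here is a minimal sketch of building a query string. It assumes the Search API query syntax in which `field:value` restricts a term to one field and space-separated terms are ANDed together. `build_query` is a hypothetical helper written for this example, not something in dwc-indexer, and the sample values are made up.

```python
def build_query(free_text="", **filters):
    """Combine optional free text with field-scoped terms (field:value).

    Hypothetical helper: the field names passed in are expected to be
    search keys from the list above (e.g. genus, stateprovince).
    """
    parts = []
    if free_text:
        # Unscoped text matches any indexed field, verbatim_record included.
        parts.append(free_text)
    for field, value in sorted(filters.items()):
        # Drop embedded quotes rather than guess at an escaping rule.
        value = str(value).replace('"', "")
        # Quote values containing whitespace so they stay a single phrase.
        if any(ch.isspace() for ch in value):
            value = '"%s"' % value
        parts.append("%s:%s" % (field, value))
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("puma", genus="Puma", stateprovince="New Mexico"))
    print(build_query(institutioncode="MVZ"))
```

A scoped term such as genus:Puma only matches the genus key, whereas the bare word puma can still match anywhere, including the verbatim record.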