As a BHL developer I want a shorter version of a dump with filtered data

gnames / bhlindex

BHLindex is used by Biodiversity Heritage Library to create their scientific names index

MIT License


If the output WILL be filtered, then the needed columns are:

- names.csv: NameID, DetectedName, MatchedCanonical, MatchedFullName, RecordID, DataSourceID
- occurrences.csv: NameID, PageID

If the output will NOT be filtered, then the needed columns are:

- names.csv: NameID, DetectedName, MatchedCanonical, MatchedFullName, RecordID, DataSourceID, MatchSortOrder, MatchType, OddsLog10, Curation, Error
- occurrences.csv: NameID, PageID
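To make the two column sets concrete, here is a minimal Python sketch of trimming a dump to the requested columns. It is not part of bhlindex: the helper `project_columns` is hypothetical, and it assumes the dump files are comma-separated with a header row whose names match the column names listed above.

```python
import csv
import io

# Columns requested in this issue for each variant of the dump.
FILTERED_NAME_COLUMNS = [
    "NameID", "DetectedName", "MatchedCanonical",
    "MatchedFullName", "RecordID", "DataSourceID",
]
UNFILTERED_NAME_COLUMNS = FILTERED_NAME_COLUMNS + [
    "MatchSortOrder", "MatchType", "OddsLog10", "Curation", "Error",
]
OCCURRENCE_COLUMNS = ["NameID", "PageID"]


def project_columns(src, dst, columns):
    """Copy only `columns` from the CSV stream `src` to `dst`.

    Hypothetical helper: assumes `src` starts with a header row.
    Columns missing from the input are written as empty strings.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" drops every input column not listed in `columns`.
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

With real files, `src` and `dst` would be opened with `newline=""`, as the `csv` module documentation recommends, for example `project_columns(open("names.csv", newline=""), out, FILTERED_NAME_COLUMNS)`.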

Filter:

COPY (
SELECT n.name, n.matched_name, n.matched_canonical
FROM name_strings n INNER JOIN name_statuses st ON n.name = st.name
WHERE (n.match_type IN ('ExactMatch', 'ExactCanonicalMatch') AND n.curation <> 'Unknown')
OR (n.match_type IN ('FuzzyCanonical', 'FuzzyPartial') AND (st.odds > 1000000 OR n.edit_distance IN (0,1) OR n.stem_edit_distance IN (0,1)))
OR (n.match_type IN ('NoMatch', '') AND st.odds > 1000000)
OR (n.match_type = 'ExactPartialMatch')
) TO STDOUT DELIMITER '|'
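The WHERE clause above can also be read as a per-row predicate. The following Python sketch restates it for illustration only; the function `keep_name` and the constant `ODDS_THRESHOLD` are hypothetical names, and the sketch assumes no field is NULL (SQL would drop a row whose comparison evaluates to NULL, which this code does not model).

```python
ODDS_THRESHOLD = 1_000_000  # same cutoff as `st.odds > 1000000` in the query


def keep_name(match_type, curation, odds, edit_distance, stem_edit_distance):
    """Return True if a name would pass the filter in the query above.

    The match types are mutually exclusive, so the OR-ed branches of the
    SQL WHERE clause can be checked one after another.
    """
    if match_type in ("ExactMatch", "ExactCanonicalMatch"):
        # Exact matches are kept only when the data source curation is known.
        return curation != "Unknown"
    if match_type in ("FuzzyCanonical", "FuzzyPartial"):
        # Fuzzy matches need high odds or a small edit distance.
        return (
            odds > ODDS_THRESHOLD
            or edit_distance in (0, 1)
            or stem_edit_distance in (0, 1)
        )
    if match_type in ("NoMatch", ""):
        # Unmatched names are kept only when the odds are high.
        return odds > ODDS_THRESHOLD
    # Partial exact matches are always kept; anything else is dropped.
    return match_type == "ExactPartialMatch"
```

For example, `keep_name("FuzzyPartial", "Unknown", 10.0, 2, 2)` is rejected because the odds are low and both edit distances exceed 1, while the same row with `edit_distance=1` is kept.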


As a BHL developer I want a shorter version of a dump with filtered data #61