VertNet / post-harvest-processor

Data processor to take gulo-harvested data, prepare it for VertNet indexing and BigQuery

False positive tissues #8

Open tucotuco opened 7 years ago

tucotuco commented 7 years ago

Filtering for tissues returns records that are whole animals (ethanol) with no tissue, e.g., http://portal.vertnet.org/o/mvz/amphibian-and-reptile-specimens?id=http-arctos-database-museum-guid-mvz-herp-1024-seid-1487713. Here's the query: specificepithet:cerastes genus:crotalus hastissue:1

Check that this has been addressed before re-indexing MVZ Herps, and before any new complete reindexing.

Relevant tissue tokens at https://github.com/VertNet/post-harvest-processor/blob/master/lib/vn_utils.py#L1134

ccicero commented 7 years ago

Issue appears to affect records in alcohol or ethanol generally, e.g.,

specificepithet:stelleri genus:cyanocitta county:"el dorado" hastissue:1

Records include 'skin in alcohol' from 1923, also a false positive: http://portal.vertnet.org/o/ucla/birds?id=urn-catalog-ucla-birds-13140

dbloom commented 7 years ago

Didn't we remove terms such as "alcohol" and "ethanol" from our back end so these records wouldn't pop up? Is it just a matter of a refresh of the MVZ data to solve this?
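To make the suspected mechanism concrete, here is a minimal sketch of how a token match on the preparations field could produce these false positives. It is not the code in lib/vn_utils.py: the function name `has_tissue`, the substring-matching approach, and both token lists are assumptions made up for illustration.

```python
# Hypothetical token lists, for illustration only.
# The actual tissue tokens live in lib/vn_utils.py and may differ.
TOKENS_WITH_PRESERVATIVES = ("tissue", "blood", "liver", "muscle", "alcohol", "ethanol")
TOKENS_WITHOUT_PRESERVATIVES = ("tissue", "blood", "liver", "muscle")


def has_tissue(preparations, tokens):
    """Return 1 if any token occurs in the preparations string, else 0.

    Assumes simple case-insensitive substring matching, which may not be
    how the real processor decides hastissue.
    """
    text = (preparations or "").lower()
    return 1 if any(token in text for token in tokens) else 0


# Preservative terms in the token list flag a 1923 skin as tissue.
flagged_before = has_tissue("skin in alcohol", TOKENS_WITH_PRESERVATIVES)
# Without those terms the same record is no longer flagged.
flagged_after = has_tissue("skin in alcohol", TOKENS_WITHOUT_PRESERVATIVES)
```

If the token list behaves like this, removing the preservative terms only helps once the affected records are processed again, which would be consistent with the question about refreshing the MVZ data.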


tucotuco commented 6 years ago

This was indeed addressed in commit 050e1aa83b562061af23404d4454213dbca42b9e. The problem will disappear with reharvesting. SDNHM Herps also reported the problem. It affected 92 resources, listed below the query that found them:

SELECT icode, collectioncode, count(*) AS reps
FROM [vertnet-portal:dumps.vertnet_latest]
WHERE lower(preparations) LIKE '%alc%'
  AND lower(preparations) NOT LIKE '%tiss%'
  AND hastissue = '1'
GROUP BY icode, collectioncode

icode,collectioncode,reps
TTU,Mammals,21608
UA,UAIC,22
ZMUC,Squamata,12897
MZLU,Reptilia,1540
UWFC,EGG COLLECTION,1
WNMU,Fish specimens,379
BPBM,VZ-BBM,108
ZMUC,Testudines,3
TCWC,Ichthyology,48139
LSUMZ,Fishes,14191
BPBM,Herp-BPBM,43073
BPBM,VZ-BBM-SA,32
BPBM,VZ-BBM-TA,62
BPBM,VZ-BBM-PI,1
NYSM,ZM,651
KU,Birds,5052
CM,VertPaleo,1
UAM,Fish specimens,631
BPBM,VZ-BBM-TH,47
UCLA,Birds,40158
MZLU,Evertebrata,114
NMMNH,Mammals,18
TCWC,Mammals,16756
NMR,Chordata,20
ZMUC,Caudata,565
MZLU,Amphibia,1662
CHAS,Herpetology,20945
UAFMC,Mammals,778
USAC,Mamíferos,1
CM,Birds,4503
UCMP,V,1
ASNHC,Reptiles,14060
ZMUC,Anura,2200
ZMUC,Gymnophiona,8
UAFMC,Herps,3010
TCWC,Birds,13326
BPBM,VZ-BBM-X,925
BPBM,VZ-BM-NG,37
BPBM,VZ-BBM-KO,1
BPBM,VZ-MISSING,1
MSB,Mammal specimens,14289
TNHC,Herpetology,87773
UCONN,Mammals,224
UBCBBM,CTC,1887
DMNS,Mammal specimens,180
UNR,Mammals,115
UWFC,JUVENILE COLLECTION,1660
UAFMC,Birds,1108
BPBM,VZ-BPBM-NONE,13
ANSP,HRP,4
WNMU,Mammal specimens,11
UAMZ,UAMZ,6049
MCNB,MCNB-Chord,1
BPBM,VZ-BBM-NG,3742
UF,Birds,13
DMNS,Bird specimens,1
LSUMZ,Birds,7261
UWBM,Bird,2
FHSM,HERP,9186
FHSM,FHSM-M,3
UAZ,Mammals,2304
TCWC,Herpetology,91147
BPBM,VZ-BBM-BSIP,365
OMNH,Mammals,1
ANSP,MAM,2159
YPM,VZ,5
ZMUC,AMP,557
UAFMC,Fish,5785
MZLU,Aves,1066
MZLU,Pisces,1314
MCZ,Ich,7
UWFC,ADULT COLLECTION,2
UAZ,Ornithology,5
BPBM,VZ-BPBM,945
BPBM,VZ-BBM-LA,249
BPBM,VZ-BBM-ISA,89
BPBM,VZ-BBN-NG,34
BPBM,Herp-IND,10
SDNHM,Herps,71858
AMNH,Mammals,1
MZLU,Mammalia,15400
AMNH,Birds,14
NYSM,ZO,185
BPBM,VZ-BBM-NP,77
BPBM,Herp-BBM,4
BYU,Main,28
ASNHC,Mammals,1028
MZLU,Typer,4
UWFC,LARVAL COLLECTION,49425
BPBM,VZ-BBM-HK,4
UCLA,Mammals,1826
UF-Archaeology,Parnell Feature1,3
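
For spot-checking a resource locally after reharvesting, the same filter can be sketched in Python. This is only a rough mirror of the WHERE and GROUP BY clauses of the query above, assuming records are available as dictionaries keyed by the same field names; the helper names `is_suspect` and `count_suspects` and the sample records are made up for illustration.

```python
from collections import Counter


def is_suspect(record):
    """Mirror the WHERE clause: 'alc' present, 'tiss' absent, hastissue == '1'."""
    prep = (record.get("preparations") or "").lower()
    return "alc" in prep and "tiss" not in prep and record.get("hastissue") == "1"


def count_suspects(records):
    """Mirror GROUP BY icode, collectioncode with count(*) AS reps."""
    return Counter(
        (r.get("icode"), r.get("collectioncode")) for r in records if is_suspect(r)
    )


# Made-up sample records with placeholder codes, for illustration only.
SAMPLE = [
    {"icode": "INST_A", "collectioncode": "COLL_1",
     "preparations": "skin in alcohol", "hastissue": "1"},
    {"icode": "INST_A", "collectioncode": "COLL_1",
     "preparations": "skin; tissue (alcohol)", "hastissue": "1"},
    {"icode": "INST_B", "collectioncode": "COLL_2",
     "preparations": "whole animal (alcohol)", "hastissue": "0"},
]

suspects = count_suspects(SAMPLE)
```

Note that, like the query, this only catches preparations containing "alc"; a record described only as "ethanol" would need its own pattern.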