AtlasOfLivingAustralia / data-management

Data management issue tracking

collector name parsing #214

Open M-Nicholls opened 8 years ago

M-Nicholls commented 8 years ago

Possible issues with multiple collectors. From a user's email, e.g. for "Identified by": "Here is a subset of an ALA listing which obviously shows the same person more than once because of spaces, incorrect spelling, or missing initials or first name."

A Stewart (9171)
A.M. Buchanan (3758)
Allan, M.
Atkinson
Atkinson, Archd.
Atkinson, H.B.
B Gee (2276)
Brothers, N.P.
Buchanan, A.M.
Campbell, A.J.
Campbell, S.
Cleland, J. B.
Cleland, J.B.
Colin Spry (5692)
Collier, P.
Collier, P.A.
Crawford, I.
D.I. Morris (4111)
David L Jones (6648)
David Reynolds (6769)
David Reynolds (6769),G H Reynolds (9257)
David Ziegeler (7381)
Denis IMorris (6606)
Dobson, W.L. Judge
E Lazarus/SHarris (6728)
Eve Lazarus (6727)
Field Naturalists Club of Victoria
Fred Duncan (7477)
G Thorburn (3284),L Clear (3285),L Matthewson (3286)
G Thorburn (7555)
Gabriel, J.
Garreau, C.K.
Gee, B.
Gilfedder, L.
Griffith, H.
Griffiths, H.
Griffiths, H.H.D.
Gunderson, Consul.
H Wapstra (1062),Annie Wapstra (9224)
Hans Wapstra (3208)
Hans and Annie Wapstra (1091)
Hans and Annie Wapstra (6223)
Harris, S.
Harvey, W. H.
J Gabriel (1194)
J Milligan (2860)
J Whinray (7550)
J Whinray (7550),G Gregory (7551)
J. Milligan (4659)
J. Whinray (4689)
J.E. Wapstra & A. Wapstra (6271)
J.S. Whinray (4747)
Jamie Kirkpatrick (1315)
Janes, J.
Jaqui TAylor (6053)
Jasmine Janes (7094)
Jeanes, J.A.
John Whinray (1380)
K Harris (1415)
Karen Ziegler (3104)
L Gilfedder (3446),R Glazik (3447)
L Rodway (1521)
L Wall (1526)
Lazarus, E.
Louise Gilfedder (1499)
Lucas, A. H. S.
Lucas, A.H.S.
M. Allan (4997)
Maclaine, E.
Maclaine, E. Mrs
Maclaine, J.H.
Maclaine, J.J.H.
Margaret Allan (1619)
Mark Holdsworth (1673)
Mark Wapstra (1621)
Mary Cameron (3061)
Micah Visoiu (5870)
Milligan, J.
Morris, D.I.
Mueck, S.
N.P. Brothers (5216)
Naomi Lawrence (1799)
Olsen, A.M.
P Collier (1889)
P. Collier (5300)
P.A. Tyson (23346)
Phil Collier (1923)
Potts, W.
R B Schahinger (19349)
R Glazik (3447)
Reynolds, D.
Reynolds, D.A.
Richard Schahinger (2944)
Richard Schahinger (2944),Matthew Larcombe (10101)
Rodway, L.
Rogers, R.S.
S. Collier (5548)
Simson, August
Spry , C.
Spry, C.
Stephen Harris (2289)
Stephen Harris (2289),Wendy Potts (2507),Eve Lazarus (6727)
Steve Summers (3008)
Stewart, A.
Sutton, C.S.
Sutton, Dr
T Moule (2346)
Taylor, J.
Tierney, C.A.; Whinray, J.S.
Tyson, P.A.
Tyson, R.G.
Unknown Unknown (21598)
Various authors from forestry (3108)
Visoiu, M.
Visoiu, M., Brüllhardt, E., Buchanon, A., Perrins, L.
Wapstra, A.
Wapstra, H.
Wapstra, J.E.
Wapstra, J.E.; Wapstra, A.
Wayne Warren (3509)
Whinray, J.
Whinray, J. S.
Whinray, J.S.
Whinray, J.S.; Christie, M.H.
Whinray, J.S.; Christie, Maureen H.
Whinray, J.S.; Cooper, Jane
Whinray, J.W.
Whinray,J.; Tiernay, Catherine A..
Willis, J.H.
Ziegeler, D.
Ziegler, K.
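A minimal sketch of how variants like these could be grouped for review, assuming a coarse key of lowercased surname plus first initial. It strips a trailing numeric ID such as "(9171)", handles both the "Surname, Initials" and "Given Surname" forms, and ignores a hypothetical set of titles ("Mrs", "Dr", "Judge", "Consul"). `name_key` and `group_variants` are illustrative helpers, not an existing ALA or biocache API. The key deliberately over-merges (two different people can share a surname and first initial), so its output is a list of candidates for a person to check, not an automatic merge.

```python
import re
from collections import defaultdict

# Hypothetical set of titles/honorifics seen in the listing; extend as needed.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "judge", "consul"}

# Matches a trailing numeric person ID such as " (9171)".
ID_SUFFIX = re.compile(r"\s*\(\d+\)\s*$")


def name_key(raw: str) -> str:
    """Return a coarse 'surname|first-initial' key for a single collector name."""
    name = ID_SUFFIX.sub("", raw).strip()
    if "," in name:
        # "Surname, Initials" form, e.g. "Cleland, J. B."
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # "Given Surname" form, e.g. "Phil Collier"
        tokens = name.split()
        if not tokens:
            return ""
        surname, given = tokens[-1], " ".join(tokens[:-1])
    # Split on spaces and dots so "J.B." and "J. B." both give ["J", "B"].
    given_tokens = [t for t in re.split(r"[\s.]+", given) if t]
    given_tokens = [t for t in given_tokens if t.lower() not in TITLES]
    first_initial = given_tokens[0][0].lower() if given_tokens else ""
    return f"{surname.lower()}|{first_initial}"


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Group raw names by key, keeping only keys that have several spellings."""
    groups: dict[str, list[str]] = defaultdict(list)
    for raw in names:
        groups[name_key(raw)].append(raw)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    sample = [
        "Cleland, J. B.", "Cleland, J.B.",
        "P Collier (1889)", "P. Collier (5300)", "Phil Collier (1923)",
        "Collier, P.", "Collier, P.A.", "S. Collier (5548)",
        "Jaqui TAylor (6053)", "Taylor, J.",
    ]
    for key, spellings in group_variants(sample).items():
        print(key, spellings)
```

This key does not catch misspelled surnames such as "Griffith"/"Griffiths" or "Janes"/"Jeanes"; those would need a fuzzy comparison on top, for example `difflib.SequenceMatcher` from the Python standard library.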

M-Nicholls commented 8 years ago

Multiple collectors parsed and loaded as a list, so that a search can match every list containing a given collector.
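A minimal sketch of the idea in the comment above: split each raw collector string into a list, then index records by each individual name so that a search for one collector returns every record whose list includes them. The separators (";", " & ", "/", and a comma directly after a "(id)" suffix) are assumptions inferred from the listing in this issue, and `split_collectors`, `build_index`, and the record IDs are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical separator rules inferred from the listing: ";", " & ", "/",
# and a comma that directly follows a "(id)" suffix. Other commas are NOT
# split, because they also separate surname from initials ("Whinray, J.S.").
SEPARATORS = re.compile(r"\s*;\s*|\s+&\s+|\s*/\s*|(?<=\)),\s*")


def split_collectors(raw: str) -> list[str]:
    """Split one raw collector string into individual collector names."""
    return [part.strip() for part in SEPARATORS.split(raw) if part.strip()]


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each individual collector name to the IDs of records that list it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, raw in records.items():
        for collector in split_collectors(raw):
            index[collector].add(record_id)
    return index


if __name__ == "__main__":
    # Hypothetical record IDs; collector strings copied from the listing.
    records = {
        "rec-1": "J Whinray (7550)",
        "rec-2": "J Whinray (7550),G Gregory (7551)",
        "rec-3": "Wapstra, J.E.; Wapstra, A.",
    }
    index = build_index(records)
    print(sorted(index["J Whinray (7550)"]))  # ['rec-1', 'rec-2']
```

Entries such as "Hans and Annie Wapstra" or "Visoiu, M., Brüllhardt, E., ..." are not split by these rules; they would need extra, source-specific handling.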

temi commented 8 years ago

@M-Nicholls Multiple collectors are appearing because biocache is faceting on the wrong field, 'collector'; the correct field to show is 'collectors'. Changing the config to reference the 'collectors' field will resolve the issue. I have corrected the config files in a few places; the changes are detailed in this commit.
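To illustrate why the field choice matters, here is an in-memory simulation of facet counting, assuming each record carries 'collector' as the raw single string and 'collectors' as the parsed list. This is not biocache's implementation (a real facet is computed by the search index), and the record layout and the `facet` helper are assumptions. Faceting the single-valued field makes every combined string its own bucket, while faceting the multi-valued field credits each person separately.

```python
from collections import Counter


def facet(records: list[dict], field: str) -> Counter:
    """Count facet values for `field`, treating list values as multi-valued."""
    counts: Counter = Counter()
    for record in records:
        value = record.get(field)
        if value is None:
            continue
        # A multi-valued field holds a list; a single-valued field holds one string.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


if __name__ == "__main__":
    # Assumed record layout: raw string in 'collector', parsed list in 'collectors'.
    records = [
        {"collector": "J Whinray (7550)",
         "collectors": ["J Whinray (7550)"]},
        {"collector": "J Whinray (7550),G Gregory (7551)",
         "collectors": ["J Whinray (7550)", "G Gregory (7551)"]},
    ]
    print(facet(records, "collector"))   # one bucket per distinct raw string
    print(facet(records, "collectors"))  # one bucket per individual collector
```

With 'collector', "J Whinray (7550),G Gregory (7551)" shows up as a separate facet entry next to "J Whinray (7550)"; with 'collectors', "J Whinray (7550)" is counted twice and "G Gregory (7551)" once.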

temi commented 8 years ago

https://upsource.ala.org.au/ala-install/review/AI-CR-14