LibraryCarpentry / week-four-library-carpentry--DEPRECATED

Week Four lesson
http://librarycarpentry.github.io/city-november-2015/

Is there a cluster type function for matching full and abbreviated things? #16

Open pjhatch125 opened 8 years ago

pjhatch125 commented 8 years ago

Hi Owen,

I'm looking at some OA data and have some publisher names in full and some abbreviated.

Is there a way of clustering and merging them so all publisher names are in full?

Thanks,

Philippa

ostephens commented 8 years ago

@pjhatch125 the answer is probably 'it depends' :)

You could do a text facet on the Publisher names and edit the abbreviations to the full name in the facet. This would be OK if the numbers are low, but it's not going to be effective if you have lots of publishers and lots of variation.
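To illustrate what the facet edits amount to: each edit is effectively one entry in a hand-built mapping from abbreviation to full name. Here is a minimal Python sketch of that idea, outside OpenRefine. The `FULL_NAMES` table, the `expand` helper and the sample list are all made up for illustration, and you'd need one entry per abbreviation, which is why this only scales to small numbers.

```python
from collections import Counter

# Hypothetical hand-maintained mapping: one entry per abbreviation you spot.
FULL_NAMES = {
    "T&F": "Taylor and Francis",
}


def expand(name, mapping=FULL_NAMES):
    """Return the full publisher name if the value is a known abbreviation."""
    key = name.strip()  # ignore stray leading/trailing whitespace
    return mapping.get(key, key)  # unknown values pass through unchanged


# Made-up sample values standing in for a Publisher column.
publishers = ["T&F", "Taylor and Francis", " T&F ", "Unmapped Press"]
cleaned = [expand(p) for p in publishers]

# Rough equivalent of the counts a text facet would show after editing.
counts = Counter(cleaned)
print(counts)
```

Anything not in the mapping is left alone, so you can rerun it as you add entries.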

In some cases the 'Cluster' functionality may help you merge variants together, but generally abbreviations are so different from the full name that this isn't going to be effective (e.g. T&F vs Taylor and Francis).
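To see why, here is a rough Python approximation of key-collision clustering with a fingerprint-style key (lowercase, strip punctuation, sort and de-duplicate the words). It is a sketch of the general idea rather than OpenRefine's actual code, and the `fingerprint` and `cluster` names and the sample values are my own. Values only cluster when their keys match, and an abbreviation produces a completely different key.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate a fingerprint key: normalise, strip punctuation, sort tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))  # unique words in alphabetical order
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a key; keep only groups with more than one member."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


# Made-up sample values for illustration.
names = ["Taylor and Francis", "taylor and francis.", "Francis and Taylor", "T&F"]
clusters = cluster(names)
print(clusters)
print(fingerprint("T&F"), "|", fingerprint("Taylor and Francis"))
```

The three spelled-out variants share a key and end up in one cluster, while "T&F" gets a key of its own and is left out.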

Because of these challenges, the other option I'd consider is finding a mechanism to look up publisher information from another (external or local) source. There are two approaches here:

Some examples:

It would be interesting to see if any of these prove effective!

pjhatch125 commented 8 years ago

Thank you. Lots of ideas to try!