diging / tethne

Python module for bibliographic network analysis.
http://diging.github.io/tethne/
GNU General Public License v3.0

Issue parsing some WoS exports with Group Authors. #13

Closed: Phocion closed 10 years ago

Phocion commented 10 years ago

Greetings all. I have been using Tethne to parse the large number of WoS exports I have for a particular project.

Out of 160 files (each containing the maximum 500 references), ~31 of them fail with the following error (although the actual string values vary):

In [19]: meta_list = rd.wos.convert(wos_list)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-c13337f31f8c> in <module>()
----> 1 meta_list = rd.wos.convert(wos_list)

/usr/local/lib/python2.7/dist-packages/tethne/readers/wos.pyc in convert(wos_data)
    592                         #  as our mapping key to ensure consistency with older
    593                         #  datasets.
--> 594                         author_index = wos_dict['AF'].index(author)
    595                         #e.g."WU, ZD"
    596                         author_au = wos_dict['AU'][author_index].upper()

ValueError: 'Brazilian Aging Brain Study Grp' is not in list

For example, the only lines in which "Brazilian Aging Brain Study Grp" appears in the file in question are:

data/ims/arrastra$ grep -H -r "Brazilian Aging Brain Study Grp" /data/arrastra/alzheimers_xph1_v4/Citations/
/data/arrastra/alzheimers_xph1_v4/Citations/sub1/savedrecs (12).txt.perror:CA Brazilian Aging Brain Study Grp
/data/arrastra/alzheimers_xph1_v4/Citations/sub1/savedrecs (12).txt.perror:   [Grinberg, L. T.; Alba, J. G.; Farfel, J. M.; Suemoto, C. K.; de Lucena Ferretti, R. E.; Leite, R. E. P.; de Andrade, M. P.; Pasqualucci, C. A.; Nitrini, R.; Jacob-Filho, W.; Brazilian Aging Brain Study Grp] Univ Sao Paulo, Sch Med, Brazilian Aging Brain Study Grp LIM22, Sao Paulo, Brazil.

I could easily add a check to ensure that the value is in fact present in the list, but that wouldn't really be solving the issue. Perhaps this is failing due to the fact that it is listed as a "Group Author" (WoS Field = CA)?
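For illustration, the kind of check described above might look like the following minimal sketch. The record contents and the helper name find_author_index are hypothetical; only the field tags (AF, AU, CA) and the failing `.index()` call come from the traceback. As noted, this only hides the error and does not address why the group author is being looked up in the first place.

```python
# Hypothetical record, shaped like the wos_dict in the traceback:
# each WoS field tag maps to a list of values.
wos_dict = {
    "AF": ["Grinberg, L. T.", "Alba, J. G."],   # full author names
    "AU": ["GRINBERG, LT", "ALBA, JG"],         # abbreviated author names
    "CA": ["Brazilian Aging Brain Study Grp"],  # group author
}


def find_author_index(record, name):
    """Return the position of name in record['AF'], or None if it is absent.

    list.index raises ValueError for a missing value, which is the
    error shown in the traceback; this helper catches it instead.
    """
    try:
        return record["AF"].index(name)
    except ValueError:
        return None


idx_known = find_author_index(wos_dict, "Alba, J. G.")
idx_group = find_author_index(wos_dict, "Brazilian Aging Brain Study Grp")
print(idx_known, idx_group)
```

With a guard like this the conversion would no longer crash, but the group author would be silently dropped from any author-to-address mapping.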

Thanks!

erickpeirson commented 10 years ago

Hi there, @Phocion; glad to hear that you're finding Tethne useful! I think that I see where the hiccup is occurring. Any chance you could send along some sample data? (send to erick.peirson@asu.edu)

Thanks!

Phocion commented 10 years ago

Thanks for the quick reply! Just sent an email as requested under my Drexel U. account.

erickpeirson commented 10 years ago

Ok, see whether that works. The group author ('CA') field is now treated like a regular author in readers.wos.convert. The issue was that values from CA are included in the author address field ('C1'), so the part of .convert() that does author-institution mapping looked them up in the full author name list ('AF'), where they do not appear, and raised the ValueError.
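A rough sketch of the idea, not the actual code in readers.wos: if the names in a C1 address line are looked up against the AF values plus the CA values, the group author resolves like any other author. The function names parse_c1 and map_addresses and the shortened sample record are assumptions made for illustration.

```python
import re


def parse_c1(line):
    """Split a C1-style line '[A; B; C] Address' into (names, address)."""
    match = re.match(r"\[(.*?)\]\s*(.*)", line)
    if match is None:
        # No bracketed author list: treat the whole line as an address.
        return [], line.strip()
    names = [name.strip() for name in match.group(1).split(";")]
    return names, match.group(2).strip()


def map_addresses(record):
    """Map each name found in C1 to the addresses it is listed under."""
    # Assumption: group authors (CA) are appended to the lookup list so
    # they are treated like regular authors.
    known = record["AF"] + record.get("CA", [])
    mapping = {}
    for line in record["C1"]:
        names, address = parse_c1(line)
        for name in names:
            # Still raises ValueError for a name in neither AF nor CA.
            index = known.index(name)
            mapping.setdefault(known[index], []).append(address)
    return mapping


# Hypothetical record, shortened from the address line quoted in the report.
record = {
    "AF": ["Grinberg, L. T.", "Nitrini, R."],
    "CA": ["Brazilian Aging Brain Study Grp"],
    "C1": [
        "[Grinberg, L. T.; Nitrini, R.; Brazilian Aging Brain Study Grp] "
        "Univ Sao Paulo, Sch Med, Sao Paulo, Brazil."
    ],
}
mapping = map_addresses(record)
print(mapping["Brazilian Aging Brain Study Grp"])
```

Keeping CA in the same lookup list is the simplest option; a reader that wanted to distinguish group authors from individuals would need to track which list each name came from.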

In the future, we should think harder about what else we might want to do with the CA field.

erickpeirson commented 10 years ago

Oops, closed automatically. Feel free to close if this resolves the problem.

Phocion commented 10 years ago

Thanks for the quick response! I'll test and confirm within the hour. One thing to consider about the CA field - at least specifically in the way I'm thinking - is that it's used to cite the body of contributors from which the analyzed data originated. Many papers include "and the Alzheimer's Disease Neuroimaging Initiative" at the end of the author list if they used the ADNI dataset in their analysis. It's a specific way of citing this effort. Just a thought to consider.

Phocion commented 10 years ago

Looks great! Issue resolved. Thanks again for the quick turn-around!