dalejn / cleanBib

Probabilistically assign gender and race proportions of first/last author pairs in bibliography entries
MIT License

duplicate entries #43

Closed micaela349 closed 1 year ago

micaela349 commented 1 year ago

Hello, I am contacting you because I ran into two small problems when I tried to run the code.

First, my reference .bib file contains two articles by the same author from the same year, and the error message said to delete the duplicate or replicate the ID key. I didn't know how to replicate the ID key, so I just deleted one of the articles, but I would like to know how to solve this without removing an article from the list. Here are the two articles in question:

@article{Motl2015,
abstract = {Background There is little known about cardiorespiratory fitness and its association with volumes of the thalamus, hippocampus, and basal ganglia in multiple sclerosis (MS). Such inquiry is important for identifying a possible behavioral approach (e.g., aerobic exercise training) that might change volumes of deep gray matter (DGM) structures associated with cognitive and motor functions in MS. Purpose This study examined the association between cardiorespiratory fitness and volumes of the thalamus, hippocampus, and basal ganglia in MS. Method We enrolled 35 persons with MS who underwent a maximal exercise test for measuring cardiorespiratory fitness as peak oxygen consumption (VO2peak) and brain MRI. Volumes of the thalamus, hippocampus, caudate, putamen, and pallidum were calculated from 3D T1-weighted structural brain images. We examined associations using partial (pr) correlations controlling for demographic and clinical variables. Results VO2peak was significantly associated with composite scaled volumes of the caudate(pr =.47, p <.01), putamen (pr =.44, p <.05), pallidum (pr =.40, p <.05), and hippocampus (pr =.42, p <.05), but not thalamus (pr =.31, p =.09), when controlling for sex, age, disability, and duration of MS. Conclusion Our results provide novel evidence that cardiorespiratory fitness is associated with volumes of DGM structures that are involved in motor and cognitive functions in MS.},
author = {Robert W. Motl and Lara A. Pilutti and Elizabeth A. Hubbard and Nathan C. Wetter and Jacob J. Sosnoff and Bradley P. Sutton},
doi = {10.1016/j.nicl.2015.02.017},
issn = {22131582},
issue = {March},
journal = {NeuroImage: Clinical},
keywords = {Brain,Exercise,MRI,Multiple sclerosis,Physical activity},
pages = {661-666},
pmid = {25844320},
title = {Cardiorespiratory fitness and its association with thalamic, hippocampal, and basal ganglia volumes in multiple sclerosis},
volume = {7},
year = {2015},
}

@article{Motl2015,
abstract = {Background: Persons with multiple sclerosis (MS) engage in substantially less overall physical activity than healthy controls, but there is little information on public health rates of physical activity necessary for health benefits. Purpose: This study examined the rates of insufficient, moderate, and sufficient physical activity in persons with MS compared with healthy controls. Method: Secondary analysis of data from participants with MS (n = 1521) and healthy controls (n = 162) who completed the Godin Leisure-Time Exercise Questionnaire (GLTEQ) as part of a questionnaire battery in 14 previous investigations. Results: There were statistically significant differences in overall GLTEQ scores (F1,1666 = 96.8, P < 0.001, d = 0.83) and rates of physical activity (χ2 (2, N = 1683) = 94.2, P < 0.001) between MS and control groups. The rates of insufficient, moderate, and sufficient physical activity in the MS group were 58.0%, 15.2%, and 26.8%, respectively. Those with MS were 2.5 times more likely to report insufficient physical activity and 2.3 times less likely to report sufficient physical activity than controls. Conclusion: The majority of persons with MS were insufficiently physically active, and this segment represents the largest opportunity for successful behavior change and accumulation of associated health benefits.},
author = {R. W. Motl and E. Mcauley and B. M. Sandroff and E. A. Hubbard},
doi = {10.1111/ANE.12352},
issn = {1600-0404},
issue = {6},
journal = {Acta neurologica Scandinavica},
keywords = {Adolescent,Adult,Case-Control Studies,E A Hubbard,E McAuley,Exercise,Extramural,Female,Humans,MEDLINE,Male,Middle Aged,Motor Activity,Multiple Sclerosis / epidemiology,N.I.H.,NCBI,NIH,NLM,National Center for Biotechnology Information,National Institutes of Health,National Library of Medicine,Non-U.S. Gov't,PubMed Abstract,R W Motl,Research Support,doi:10.1111/ane.12352,pmid:25598210},
month = {6},
pages = {422-425},
pmid = {25598210},
publisher = {Acta Neurol Scand},
title = {Descriptive epidemiology of physical activity rates in multiple sclerosis},
volume = {131},
url = {https://pubmed.ncbi.nlm.nih.gov/25598210/},
year = {2015},
}

Second, some of the authors' last names, such as Veldhuijzen Van Zanten (Jet J.C.S. Veldhuijzen Van Zanten), were flagged as a bug, so I had to rewrite them all joined together as VeldhuijzenVanZanten to get rid of the error. Is this the correct way to handle it, and does it have any impact on the final file?

Thank you for your help in advance! If you need more information/clarifications, just let me know.

dalejn commented 1 year ago

Hi, thanks for checking out the tool! For the first question, you'd want to change one of the "Motl2015" entries to another name. Right now the names for both are "Motl2015" and so you could, for instance, change the first entry to:

@Article{Motl2015_Cardiorespiratory,
abstract = {Background There is little known about cardiorespiratory fitness and its association with volumes of the thalamus, hippocampus, and basal ganglia in multiple sclerosis (MS). Such inquiry is important for identifying a possible behavioral approach (e.g., aerobic exercise training) that might change volumes of deep gray matter (DGM) structures associated with cognitive and motor functions in MS. Purpose This study examined the association between cardiorespiratory fitness and volumes of the thalamus, hippocampus, and basal ganglia in MS. Method We enrolled 35 persons with MS who underwent a maximal exercise test for measuring cardiorespiratory fitness as peak oxygen consumption (VO2peak) and brain MRI. Volumes of the thalamus, hippocampus, caudate, putamen, and pallidum were calculated from 3D T1-weighted structural brain images. We examined associations using partial (pr) correlations controlling for demographic and clinical variables. Results VO2peak was significantly associated with composite scaled volumes of the caudate(pr =.47, p <.01), putamen (pr =.44, p <.05), pallidum (pr =.40, p <.05), and hippocampus (pr =.42, p <.05), but not thalamus (pr =.31, p =.09), when controlling for sex, age, disability, and duration of MS. Conclusion Our results provide novel evidence that cardiorespiratory fitness is associated with volumes of DGM structures that are involved in motor and cognitive functions in MS.},
author = {Robert W. Motl and Lara A. Pilutti and Elizabeth A. Hubbard and Nathan C. Wetter and Jacob J. Sosnoff and Bradley P. Sutton},
doi = {10.1016/j.nicl.2015.02.017},
issn = {22131582},
issue = {March},
journal = {NeuroImage: Clinical},
keywords = {Brain,Exercise,MRI,Multiple sclerosis,Physical activity},
pages = {661-666},
pmid = {25844320},
title = {Cardiorespiratory fitness and its association with thalamic, hippocampal, and basal ganglia volumes in multiple sclerosis},
volume = {7},
year = {2015},
}

After saving this change, the two entries have different keys, so the parser will recognize each as a unique entry.
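To spot such key collisions before running the tool, a small standalone check along these lines may help. This is only a rough sketch using the Python standard library, not part of cleanBib; the function name `duplicate_keys` and the regular expression are assumptions made for illustration, and the pattern only handles the common `@type{key,` layout.

```python
import re
from collections import Counter

# Matches the start of an entry such as "@article{Motl2015," and captures
# the entry type and the citation key. Illustrative pattern, not cleanBib's.
ENTRY_KEY = re.compile(r"@\s*(\w+)\s*\{\s*([^,\s{}]+)\s*,")

# BibTeX blocks that are not bibliography entries and carry no citation key.
NON_ENTRIES = {"comment", "string", "preamble"}


def duplicate_keys(bib_text):
    """Return citation keys that occur more than once, in first-seen order."""
    keys = [
        match.group(2)
        for match in ENTRY_KEY.finditer(bib_text)
        if match.group(1).lower() not in NON_ENTRIES
    ]
    counts = Counter(keys)
    # dict.fromkeys removes repeats while preserving the original order.
    return [key for key in dict.fromkeys(keys) if counts[key] > 1]


if __name__ == "__main__":
    # Shortened, hypothetical entries that reuse the key from this thread.
    sample = """
    @article{Motl2015, title = {First paper}, year = {2015}, }
    @article{Motl2015, title = {Second paper}, year = {2015}, }
    """
    print(duplicate_keys(sample))  # ['Motl2015']
```

Any key the function reports can then be renamed by hand, for example by appending a word from the title as in the entry above.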

Re: the second question, thanks for bringing this case to my attention! I believe we coded edge cases to work for surnames with hyphens, two words, or other special characters. Could you try it hyphenated as "Veldhuijzen-Van-Zanten" or truncated to "Van Zanten"? In the meantime, I'll look into making the code less strict about the number of words it expects in a surname in our next update. Sorry about that!
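If a bibliography contains many such names, the hyphenation workaround could be applied with a short helper like the one below. This is a hedged sketch, not cleanBib functionality: `hyphenate_surname` is a hypothetical name, the co-author in the example is a placeholder, and the helper simply replaces the spaces inside a surname you pass in explicitly.

```python
def hyphenate_surname(author_field, surname):
    """Replace each occurrence of a multi-word surname with a hyphenated form.

    author_field: the contents of a BibTeX author field.
    surname: the multi-word surname exactly as it appears in that field.
    """
    hyphenated = "-".join(surname.split())
    return author_field.replace(surname, hyphenated)


if __name__ == "__main__":
    # "A. Example" is a placeholder co-author, not taken from a real entry.
    authors = "Jet J.C.S. Veldhuijzen Van Zanten and A. Example"
    print(hyphenate_surname(authors, "Veldhuijzen Van Zanten"))
    # Jet J.C.S. Veldhuijzen-Van-Zanten and A. Example
```

Because the replacement is a plain string match, it is safest to run it on a copy of the .bib file and review the changed author fields afterwards.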

micaela349 commented 1 year ago

Hello,

Both solutions worked for me so I thank you very much for your help!!