globbestael / DedupEndNote

Deduplication of EndNote RIS files
http://dedupendnote.nl
Apache License 2.0

Patterns for journals: word breaks only and no anchoring? #1

Closed. globbestael closed this issue 2 years ago

globbestael commented 2 years ago

The patterns for journals in the DeduplicationService::compareJournals_... methods originally used the alternation "(\b|-|)" and anchored some patterns to the beginning of the string ("^").

Tests

Missed journals in ASySD_SRSR_Human:

| Journal 1 | Journal 2 |
| --- | --- |
| ADHD-ATTENTION DEFICIT AND HYPERACTIVITY DISORDERS | Atten Defic Hyperact Disord |
| ATLA-ALTERNATIVES TO LABORATORY ANIMALS | Altern Lab Anim |
| CTS-CLINICAL AND TRANSLATIONAL SCIENCE | Clin Transl Sci |
| MLTJ-MUSCLES LIGAMENTS AND TENDONS JOURNAL | Muscles Ligaments Tendons J |

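All these cases share three properties: the title starts with an initialism, the title is in all caps, and a hyphen separates the initialism from the rest of the title. In Record::addJournals, journal titles in all caps are converted to capitalized words, so the starting initialism is no longer recognizable as one: "ADHD-ATTENTION DEFICIT AND HYPERACTIVITY DISORDERS" becomes "Adhd Attention Deficit And Hyperactivity Disorders". Not anchoring the pattern to the beginning of the string lets the pattern for journal 2 match journal 1 anyway: "Atten.*\bDefic.*\bHyperact.*\bDisord.*" matches "Adhd Attention Deficit And Hyperactivity Disorders", because the match can start at "Attention" and skip the leading "Adhd".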
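The effect of anchoring can be reproduced with a small sketch. The class name, the method names and the `anchored` flag below are hypothetical and are not taken from DedupEndNote; the way the pattern is built (each word of the abbreviated title becomes a prefix that must start at a word break) is only an assumption inferred from the example pattern above, so the real DeduplicationService code may differ.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not the DedupEndNote implementation.
class JournalPatternSketch {

    // Builds a pattern from an abbreviated title: every word must appear,
    // in order, as the start of a word in the other title.
    // With anchored == true the first word must also start the string.
    public static Pattern abbreviationPattern(String abbreviated, boolean anchored) {
        String body = Arrays.stream(abbreviated.trim().split("\\s+"))
                .map(Pattern::quote) // treat each word as a literal
                .collect(Collectors.joining(".*\\b", "\\b", ".*"));
        return Pattern.compile((anchored ? "^" : "") + body, Pattern.CASE_INSENSITIVE);
    }

    // find() rather than matches(): the pattern may start anywhere unless it is anchored.
    public static boolean matches(String abbreviated, String fullTitle, boolean anchored) {
        return abbreviationPattern(abbreviated, anchored).matcher(fullTitle).find();
    }

    public static void main(String[] args) {
        String journal1 = "Adhd Attention Deficit And Hyperactivity Disorders";
        String journal2 = "Atten Defic Hyperact Disord";
        System.out.println(matches(journal2, journal1, false)); // prints "true"
        System.out.println(matches(journal2, journal1, true));  // prints "false"
    }
}
```

With the anchor, the match has to begin at "Adhd", which is not a word of journal 2, so the pair is missed. Without it, the match begins at "Attention" and the leading initialism is skipped.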

Decision