dhimmel / delays

Trends in scientific publishing delays
http://blog.dhimmel.com/history-of-delays/
Creative Commons Attribution 4.0 International

Outdated or buggy journal abbreviations data #2

Open ravwojdyla opened 3 years ago

ravwojdyla commented 3 years ago

Not sure if the data is outdated or if there is a bug, but some journals have outdated or invalid(?) ISO abbreviations. Example from pubmed/J_Medline.txt:

JournalTitle: The New England journal of medicine
MedAbbr: N Engl J Med
ISSN (Print): 0028-4793
ISSN (Online): 1533-4406
IsoAbbr: N Engl J Med
NlmId: 0255562
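To compare the IsoAbbr and MedAbbr fields across many journals, it helps to load J_Medline.txt into one dictionary per journal. The following is a minimal sketch, not the parser used in dhimmel/delays. It assumes each field is a `Key: Value` line and that records are separated by a line of dashes or a blank line; the function name `parse_j_medline` is hypothetical, and the format should be checked against the actual file.

```python
import re


def parse_j_medline(text):
    """Hypothetical parser: one dict per journal record in J_Medline.txt-style text.

    Assumes "Key: Value" lines, with records separated by a line of dashes
    or a blank line.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or re.fullmatch(r"-+", line):
            # A separator line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only: keys like "ISSN (Print)" contain no colon,
        # but values such as journal titles might
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


sample = """\
--------------------------------------------------------
JournalTitle: The New England journal of medicine
MedAbbr: N Engl J Med
ISSN (Print): 0028-4793
ISSN (Online): 1533-4406
IsoAbbr: N Engl J Med
NlmId: 0255562
--------------------------------------------------------
"""

records = parse_j_medline(sample)
print(records[0]["IsoAbbr"])  # N Engl J Med
```

Splitting on the first colon (rather than every colon) keeps any colon inside a value intact.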

Notice that the ISO and Med abbreviations are the same, but in dhimmel/delays they are different: N. Engl. J. Med. (ISO) vs N Engl J Med (Med) (notice the dots).
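The two versions differ only in their periods. A minimal sketch makes this concrete: a plain string comparison fails, but the strings match once periods are removed and whitespace is collapsed. The helper `strip_periods` is hypothetical and is not part of dhimmel/delays.

```python
def strip_periods(abbr):
    # Hypothetical helper: drop periods and collapse runs of whitespace
    return " ".join(abbr.replace(".", "").split())


iso_in_repo = "N. Engl. J. Med."  # IsoAbbr as stored in dhimmel/delays
med_abbr = "N Engl J Med"         # MedAbbr from J_Medline.txt

print(iso_in_repo == med_abbr)                 # False
print(strip_periods(iso_in_repo) == med_abbr)  # True
```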

ravwojdyla commented 3 years ago

That said, for that journal Wikipedia gives the ISO abbreviation as N. Engl. J. Med. (with dots). Would that mean that NLM's pubmed/J_Medline.txt is invalid 🤷? NLM entry is here.

dhimmel commented 3 years ago

The abbreviation used by PubMed is the "NLM Title Abbreviation", which I believe is the same as MedAbbr. So in PubMed, the journal is displayed as "N Engl J Med" as seen in this search result:

[Image: PubMed search result showing the journal as "N Engl J Med"]

Looking at the online NLM journal record at https://www.ncbi.nlm.nih.gov/nlmcatalog/255562, it doesn't appear to list a field for the ISO abbreviation.

So based on your comment, it seems that the NLM catalog via J_Medline.txt used to have the proper abbreviations in IsoAbbr but currently does not (because it is missing the periods)?

ravwojdyla commented 3 years ago

> So based on your comment, it seems that the NLM catalog via J_Medline.txt used to have the proper abbreviations in IsoAbbr but currently does not (because it is missing the periods)?

@dhimmel I'm not saying NLM used to have "proper abbreviations" (I don't know that), and I'm not sure which records have changed. I'm just observing in this issue that the ISO abbreviations in this repo's data have dots, but the currently available NLM records don't. Whether NLM's records are valid is a separate question. I haven't researched which ISO abbreviation is "correct" :)

dhimmel commented 3 years ago

I updated the NLM catalog export in https://github.com/dhimmel/delays/commit/83577d4bb774bb90533d2cfe0db7032b70fdbbc1. I checked, and IsoAbbr is now always the same as MedAbbr. So it seems these two columns used to hold different abbreviations but have since been made identical. I'm still not sure whether the version with or without the periods is the one that follows the actual ISO standard.
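A check like "IsoAbbr is now always the same as MedAbbr" reduces to listing the records where the two fields differ. The following is a minimal sketch, not the code used in the commit above. The function name `abbreviation_mismatches` is hypothetical, and the two one-record inputs are illustrative only. They are built from the NEJM example in this thread and show the current NLM export alongside the older values stored in this repo.

```python
def abbreviation_mismatches(records):
    # Hypothetical check: (NlmId, IsoAbbr, MedAbbr) for every record whose
    # two abbreviations are not identical strings
    return [
        (r.get("NlmId"), r.get("IsoAbbr"), r.get("MedAbbr"))
        for r in records
        if r.get("IsoAbbr") != r.get("MedAbbr")
    ]


# Illustrative inputs taken from the NEJM record discussed above
current_export = [{"NlmId": "0255562", "MedAbbr": "N Engl J Med", "IsoAbbr": "N Engl J Med"}]
older_export = [{"NlmId": "0255562", "MedAbbr": "N Engl J Med", "IsoAbbr": "N. Engl. J. Med."}]

print(abbreviation_mismatches(current_export))  # []
print(abbreviation_mismatches(older_export))    # [('0255562', 'N. Engl. J. Med.', 'N Engl J Med')]
```

An empty result over the full export is what "always the same" means. Any non-empty result lists exactly the journals where the two columns still disagree.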

I also updated the downstream scopus metrics in https://github.com/dhimmel/scopus/commit/1c2f8aa0eb4738ced9923a773020427441aa521c.