dhimmel / plostime

Publication delays at PLOS and 3,475 other journals
http://blog.dhimmel.com/plos-and-publishing-delays/
Creative Commons Attribution 4.0 International

Overall averages for publication times #1

Open StuartCT opened 6 years ago

StuartCT commented 6 years ago

I am interested in finding a mean (or median) of time to acceptance and time to publication for all the journals in the set. Your published analyses only show these for specific journals.

I have tried analysing delays.tsv but it's too large to load into Excel. Are you able to give me these figures?

Thanks

dhimmel commented 6 years ago

Your published analyses only show these for specific journals.

The table in the "Publication delays at PLOS and 3,475 other journals" blog post contains median time to acceptance and time to publication for 3,482 journals based on articles published from 2014-01-01 to 2015-06-29. Are you saying that there are journals not in this table? There are many missing journals that either weren't indexed in PubMed or didn't deposit article history dates to PubMed.

Note that there is a more up-to-date repository on journal delays than dhimmel/plostime: dhimmel/delays, which contains the source code for the more recent blog post titled "The history of publishing delays". That repository contains a file, journal-summaries.tsv, with mean and median times for 8,840 journals based on articles published through the end of 2015. The file is small enough that you should be able to open it in Excel. However, comparisons between journals may not be very meaningful, since the journals' articles could come from different periods. Also, take averages for journals with few articles with deposited dates (n_articles) with a grain of salt.
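As a rough illustration of the n_articles caution, here is a minimal sketch on made-up rows, not the real journal-summaries.tsv. Only n_articles is a name taken from the comment above; the journal names, the mean_delay field, and the cutoff of 10 articles are assumptions chosen for the example. It uses only the Python standard library.

```python
# Illustrative rows only; these are NOT values from journal-summaries.tsv.
# "n_articles" is the field named in the thread; "mean_delay" is a hypothetical name.
summaries = [
    {"journal": "Journal A", "n_articles": 400, "mean_delay": 100.0},
    {"journal": "Journal B", "n_articles": 100, "mean_delay": 150.0},
    {"journal": "Journal C", "n_articles": 2, "mean_delay": 20.0},
]

MIN_ARTICLES = 10  # arbitrary cutoff chosen for this example


def unweighted_mean(rows):
    """Average the per-journal means, counting every journal equally."""
    return sum(r["mean_delay"] for r in rows) / len(rows)


def article_weighted_mean(rows):
    """Average the per-journal means, weighting each journal by n_articles."""
    total_articles = sum(r["n_articles"] for r in rows)
    return sum(r["mean_delay"] * r["n_articles"] for r in rows) / total_articles


# Keep only journals with enough articles for the average to be informative.
reliable = [r for r in summaries if r["n_articles"] >= MIN_ARTICLES]

print(unweighted_mean(summaries))        # Journal C's 2 articles count as much as Journal A's 400
print(article_weighted_mean(summaries))  # same as averaging over the individual articles
print(unweighted_mean(reliable))         # journals below the cutoff are dropped
```

Which aggregate is appropriate depends on the question: the unweighted figure describes a typical journal, while the article-weighted figure describes a typical article.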

StuartCT commented 6 years ago

Hi. Sorry, I don't think I was clear. When I say "all the journals in the set", I mean I want an aggregate value of the mean times for all journals. Your table breaks them down by journal.

Maybe I can try analysing journal-summaries.tsv. Is there some way to download this data (clicking on the link only displays it on the screen)?

dhimmel commented 6 years ago

I want an aggregate value of the mean times for all journals.

Okay here's some Python code to get the mean acceptance and publication times across all articles:

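import pandas

# Article-level delays from dhimmel/delays, pinned to a specific commit
url = "https://github.com/dhimmel/delays/raw/2d05dbaf2d8eaf50c35533261ba4c29b70c350a8/data/delays.tsv.gz"
# low_memory=False parses the file in one pass so column dtypes are inferred consistently
delay_df = pandas.read_table(url, low_memory=False)
# Mean delay in days for each delay type (Acceptance, Publication)
delay_df.groupby('delay_type').delay.mean()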

I've copied the output below:

delay_type
Acceptance     123.840429
Publication     43.130996
Name: delay, dtype: float64

So the mean submission-to-acceptance time was 123.8 days, and the mean acceptance-to-publication time was 43.1 days. However, these averages include many old articles. Depending on your use case, those may or may not be relevant.
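If older articles are not relevant, one option is to restrict the data to a recent window and also report the median, which is less sensitive to a few very long delays. The sketch below shows the idea on made-up records using only the standard library. The delay_type and delay names mirror the snippet above, but the date field and the 2014 cutoff are assumptions for illustration, not confirmed details of delays.tsv.

```python
from datetime import date
from statistics import mean, median

# Made-up records for illustration; not rows from delays.tsv.
# "delay_type" and "delay" mirror the names in the snippet above;
# "date" is a hypothetical field holding the publication date.
records = [
    {"delay_type": "Acceptance", "date": date(2005, 3, 1), "delay": 300},
    {"delay_type": "Acceptance", "date": date(2015, 2, 1), "delay": 90},
    {"delay_type": "Acceptance", "date": date(2015, 6, 1), "delay": 110},
    {"delay_type": "Publication", "date": date(2005, 3, 1), "delay": 80},
    {"delay_type": "Publication", "date": date(2015, 2, 1), "delay": 20},
    {"delay_type": "Publication", "date": date(2015, 6, 1), "delay": 30},
]


def summarize(rows, since=None):
    """Return {delay_type: (mean, median)} for rows dated on or after `since`."""
    grouped = {}
    for row in rows:
        if since is not None and row["date"] < since:
            continue  # skip articles older than the cutoff
        grouped.setdefault(row["delay_type"], []).append(row["delay"])
    return {kind: (mean(vals), median(vals)) for kind, vals in grouped.items()}


all_time = summarize(records)
recent = summarize(records, since=date(2014, 1, 1))  # cutoff is an arbitrary example
print(all_time)
print(recent)
```

In this toy data the single old article pulls the all-time acceptance mean well above the recent one, which is the effect the caveat above warns about.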

Is there some way to download this data (clicking on the link only displays it on the screen)?

Right-click the "Raw" button and select "Save link as" (or a similar option). Also, if you paste the URL from the code snippet above into your browser, it will probably download the file.

StuartCT commented 6 years ago

thanks.

But I am not sure how the submission to acceptance time can be longer than the submission to publication time (since the latter includes the former)?

dhimmel commented 6 years ago

But I am not sure how the submission to acceptance time can be longer than the submission to publication time (since the latter includes the former)?

My bad. 43.1 days refers to the mean time from acceptance to publication. Corrected in my comment above.

StuartCT commented 6 years ago

ah. got you. Thanks so much for this. Very useful to have this data.

StuartCT commented 6 years ago

Hi. Managed to download journal-summaries.tsv and make an Excel file. I am a bit puzzled by some of the data relating to our journals, though.

For example, our biggest journal Proc. Biol. Sci. only has 4 papers in your dataset. And the mean acceptance delay (48.5) doesn't agree with the table in your blog post here which gives 133.5.

Also, Proc. Math. Phys. Eng. Sci. has a different number of articles in the acceptance and publication fields. How is that?

dhimmel commented 6 years ago

your blog post here which gives 133.5

The blog post "Publication delays at PLOS and 3,475 other journals" includes papers from a shorter timespan. Proc Biol Sci only has two articles with data, as per that table. journal-summaries.tsv is more recent, which could be why it has slightly more articles. In any case, given the extremely low proportion of articles with time information for this journal, these averages should not be considered meaningful.

For example, our biggest journal Proc. Biol. Sci. only has 4 papers in your dataset.

Publishers deposit the timestamp information to PubMed, which is optional. It may be worth checking with your tech team, to see if they can more comprehensively submit history timestamps to PubMed. It should even be possible to retroactively update the timestamps for existing PubMed records. I don't think publishers are under any obligation to submit detailed PubMed metadata, but it definitely helps the community by enabling analyses like these!

Also, Proc. Math. Phys. Eng. Sci. has a different number of articles in the acceptance and publication fields. How is that?

There are three relevant timestamps: submission (receipt), acceptance, and publication. Articles may have some but not all of these timestamps. For example, if an article had an acceptance and a publication timestamp, but not a submission timestamp, we would be unable to calculate an "acceptance delay" but could still calculate the "publication delay" (as per the implementation in dhimmel/delays).
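To make the counting concrete, here is a minimal sketch of how missing timestamps lead to different article counts for the two delay types. It is not the dhimmel/delays implementation. The field names received, accepted, and published and all dates are made up for illustration, and the delay definitions follow the thread (submission to acceptance, and acceptance to publication).

```python
from datetime import date

# Made-up article histories; None marks a timestamp the publisher did not deposit.
articles = [
    {"received": date(2015, 1, 1), "accepted": date(2015, 4, 1), "published": date(2015, 5, 1)},
    {"received": None, "accepted": date(2015, 2, 1), "published": date(2015, 2, 15)},
    {"received": date(2015, 3, 1), "accepted": None, "published": date(2015, 9, 1)},
]


def days_between(start, end):
    """Return the number of days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


# Acceptance delay needs both the received and accepted dates.
acceptance_delays = [days_between(a["received"], a["accepted"]) for a in articles]
# Publication delay needs both the accepted and published dates.
publication_delays = [days_between(a["accepted"], a["published"]) for a in articles]

# Count how many articles contribute to each delay type.
n_acceptance = sum(d is not None for d in acceptance_delays)
n_publication = sum(d is not None for d in publication_delays)
print(acceptance_delays, n_acceptance)
print(publication_delays, n_publication)
```

Here only one article contributes an acceptance delay while two contribute a publication delay, so a journal's counts for the two fields need not match.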

StuartCT commented 6 years ago

Hi Daniel. I am just looking again at your data and I have two questions: 1) have you done any more on this project since we last discussed? Any new data, analyses? 2) have you any data on receipt to first decision time? We measure this period, rather than receipt to acceptance, as the latter includes delays caused by authors (over which we have little control!). So I think Rec to First decision gives a fairer indicator of the publisher's performance.

dhimmel commented 5 years ago

have you done any more on this project since we last discussed? Any new data, analyses?

Nothing new besides the blog post The history of publishing delays from February 10, 2016.

Possibly of interest is PubMedLagR by @quantixed, which is an R library for retrieving / visualizing publication times (blog post).

have you any data on receipt to first decision time?

I agree that receipt to first decision is an interesting metric... and could more directly assess a journal's review speed compared to receipt to acceptance. Unfortunately, I don't think this data is widely available in PubMed. I'll let you know if I see a resource with this information.

StuartCT commented 5 years ago

Possibly of interest is PubMedLagR by @quantixed, which is an R library for retrieving / visualizing publication times (blog post).

I have no idea how to run this (not being a programmer), but the blog post is interesting.

I agree that receipt to first decision is an interesting metric... and could more directly assess a journal's review speed compared to receipt to acceptance. Unfortunately, I don't think this data is widely available in PubMed. I'll let you know if I see a resource with this information.

Thanks