IQSS / dataverse-installations

code that powers a map of Dataverse installations around the world
https://iqss.github.io/dataverse-installations
Apache License 2.0

launch year for each Dataverse installation #7

Open pdurbin opened 5 years ago

pdurbin commented 5 years ago

When I see graphs like the following from http://slides.com/mercecrosas/dataversecommunity2018#/5 ...

Screen Shot 2019-07-11 at 10 42 05 PM

... I think, "This is a great graph but it would be nice to have the actual launch date of each of the Dataverse installations." We could add a column for this in the database.

pdurbin commented 4 years ago

In IRC today I announced a new spreadsheet I'm calling "Crowdsourced information about Dataverse Installations": https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit?usp=sharing

The first question I asked was this, "please add your launch_year, the year you started with Dataverse (fresh or migration from some other system)".

Here's the conversation: http://irclog.iq.harvard.edu/dataverse/2019-10-02#i_107856

pdurbin commented 4 years ago

On Friday @shlake and I talked about different approaches for getting this data:

@shlake do you have a preference? Am I forgetting any other approach we talked about?

All, are there more approaches we haven't considered? @poikilotherm is starting to hack on the map. 😄

shlake commented 4 years ago

@pdurbin I'm going to pester the list (just once - not keep doing it) AND will send targeted emails to folks we know.

Once we figure out ALL the bits of info we need, then I think we can declare a census to get what we don't have.

pdurbin commented 4 years ago

I just made a couple of pull requests to make it more apparent that some launch years are missing.

Pull request #39 - add launch year to table


Pull request #40 - add script to convert data.json to a TSV file
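The script from that pull request isn't reproduced in this thread, so the following is only a minimal sketch of the idea, not the actual script. It assumes data.json has a top-level `installations` list (as the jq commands in this thread suggest); the function name `installations_to_tsv` and the `name` field are hypothetical and used for illustration only.

```python
import csv
import io


def installations_to_tsv(data, fields=("name", "launch_year")):
    """Render the installations list as TSV text, leaving missing values blank."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # header row
    for inst in data["installations"]:
        # A missing or null field becomes an empty cell.
        writer.writerow(["" if inst.get(f) is None else inst.get(f) for f in fields])
    return out.getvalue()


# Made-up sample data; "name" is an assumed field, not confirmed by this thread.
sample = {"installations": [
    {"name": "Example Repo", "launch_year": 2017},
    {"name": "Another Repo"},
]}
print(installations_to_tsv(sample), end="")
```

Blank cells in the launch_year column make the missing years easy to spot once the TSV is opened in a spreadsheet.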


pdurbin commented 4 years ago

"I'm going to pester the list (just once - not keep doing it) AND will send targeted emails to folks we know."

@shlake do you still want to pester the list once or should I re-pester? 😄 I just made the following diagram, which I was planning to send to the list. Please let me know! It's a screenshot from https://dataverse.org/installations with some angry red added for the years we are missing. 😄

As you know, the fix is to ask everyone to fill in the year in the "crowdsourced" spreadsheet at https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0

Then we run python3 update-data.py and make pull requests.

I'm motivated because yesterday my friend and officemate @erikbuunk started making some killer data visualizations about the Dataverse community (that I can't wait to share!). 🎉

Oh, my other thought is that we can @ mention people here who could track down the data. People like @eugene-barsky @skasberger @umuchlish @Venki18 @dheles @sjaefulafandi @lmaylein @adam3smith @CCMumma @kaitlinnewson @jmjamison and others that aren't top of mind.

Dataverse_Installations_Around_the_World_The_Dataverse_Project_-_Dataverse org_-_2020-02-27_05 52 12

pdurbin commented 4 years ago

This just in from @skasberger for AUSSDA Dataverse - " you once asked me, when we launched. it was the 20th of august 2017."

CCMumma commented 4 years ago

TDR was launched in November 2016 - sorry I missed this one earlier.

shlake commented 4 years ago

@pdurbin just sent an email to the list asking for dates.

pdurbin commented 4 years ago

@shlake THANK YOU! You're the best! 🎉 🎉 🎉

For anyone who missed it, here's a link to that email: https://groups.google.com/d/msg/dataverse-community/jqfxdU3e2FQ/qroo3eMvAAAJ

pdurbin commented 2 years ago

Thank you to @IlariaBelvedere for taking an interest in this issue, creating some sub-issues, and reviving the thread at https://groups.google.com/d/msg/dataverse-community/jqfxdU3e2FQ/qroo3eMvAAAJ !

By the way, here's a one liner for getting the counts per year:

cat data/data.json | jq '.installations[].launch_year' -r | sort | uniq -c

   1 2005
   1 2012
   1 2013
   1 2014
   2 2015
   4 2016
   5 2017
   1 2018
   9 2019
  13 2020
   8 2021
   3 2022
  30 null
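For anyone who prefers Python to jq, the same tally can be sketched roughly like this. It is a minimal sketch, not a script from the repo: it assumes data/data.json has the top-level `installations` list and `launch_year` field that the jq filter above relies on, and the function names `count_launch_years` and `load_counts` are made up for illustration.

```python
import json
from collections import Counter


def count_launch_years(data):
    """Tally installations per launch_year; a missing year is counted under None."""
    return Counter(inst.get("launch_year") for inst in data["installations"])


def load_counts(path="data/data.json"):
    # Assumed layout: {"installations": [{"launch_year": 2019, ...}, ...]}
    with open(path, encoding="utf-8") as f:
        return count_launch_years(json.load(f))


# Tiny made-up sample so the sketch runs without the real file.
sample = {"installations": [{"launch_year": 2019}, {"launch_year": 2019}, {}]}
counts = count_launch_years(sample)

# Known years first in ascending order, then the unknown bucket.
for year, n in sorted(counts.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
    print(n, year if year is not None else "null")
```

Calling `load_counts()` from the root of a checkout should give the per-year tally for the real file, provided the layout assumption holds.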

Here's a quick chart using Google Spreadsheets:

chart(1)

adam3smith commented 2 years ago

I don't think the spreadsheet is still editable? I can comment only, which isn't super helpful for spreadsheets. QDR launched its Dataverse catalog in 2018.

pdurbin commented 2 years ago

@adam3smith I get paranoid about just anyone editing that spreadsheet. Please click "request access" and I'd be happy to let you in!


Update: I added QDR in c5473ac

pdurbin commented 2 years ago

In 42471d2 I just added a simple ASCII chart of installations per year at https://iqss.github.io/dataverse-installations/year.html

It looks like this:

Dataverse Installations by Year
2005 * 1
2012 * 1
2013 * 1
2014 * 1
2015 ** 2
2016 **** 4
2017 ***** 5
2018 * 1
2019 ********* 9
2020 ************* 13
2021 ******** 8
2022 *** 3
???? ****************************** 30

Below the chart I'm listing the installations that we don't have a launch year for. Note that some of these have GitHub Issues already:

Installations with Unknown Launch Year
Botswana Harvard Data
CIMMYT Research Data
Data INRAe
Data Suds
DataSpace@HKUST
DR-NTU (Data)
Fudan University
Göttingen Research Online
Harvard Dataverse
HeiDATA
IBICT
ICRISAT
ICWSM
Ifsttar Dataverse
International Potato Center
Johns Hopkins University
LIPI Dataverse
Maine Dataverse Network
MELDATA
NIE Data Repository
Peking University
Pontificia Universidad Católica del Perú
QDR Main Collection
Repositorio de Datos de Investigación Universidad del Rosario
Repositório de Dados de Pesquisa da UFABC
Repositório de Dados de Pesquisa do ILEEL
UCLA Dataverse
University of Manitoba Dataverse
Università degli Studi di Milano
UWI
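
A chart and list in this style can be produced with a few lines of Python. The following is a rough sketch under stated assumptions, not the code behind year.html: the `launch_year` field comes from the jq commands in this thread, while the `name` field, both function names, and the sample entries are hypothetical.

```python
from collections import Counter


def render_year_chart(installations, title="Dataverse Installations by Year"):
    """Return chart lines like '2019 ** 2', with unknown years last as '????'."""
    counts = Counter(inst.get("launch_year") for inst in installations)
    lines = [title]
    for year in sorted(y for y in counts if y is not None):
        lines.append(f"{year} {'*' * counts[year]} {counts[year]}")
    if counts.get(None):
        lines.append(f"???? {'*' * counts[None]} {counts[None]}")
    return lines


def unknown_launch_year(installations):
    """Return the names of installations without a launch year, sorted ignoring case."""
    names = [inst["name"] for inst in installations if inst.get("launch_year") is None]
    return sorted(names, key=str.casefold)


# Made-up sample entries; "name" is an assumed field.
sample = [
    {"name": "B Repo", "launch_year": 2019},
    {"name": "A Repo", "launch_year": 2019},
    {"name": "C Repo", "launch_year": 2005},
    {"name": "D Repo", "launch_year": None},
]
print("\n".join(render_year_chart(sample)))
print("Installations with Unknown Launch Year")
print("\n".join(unknown_launch_year(sample)))
```

Keeping the unknown bucket at the bottom as `????` keeps the known years in chronological order while still showing how much data is missing.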

pdurbin commented 2 years ago

Thank you @IlariaBelvedere for all the research! Here's how https://iqss.github.io/dataverse-installations/year.html looks now!

Dataverse Installations by Year
2005 * 1
2008 * 1
2012 * 1
2013 * 1
2014 *** 3
2015 *** 3
2016 **** 4
2017 ******** 8
2018 ****** 6
2019 ******************** 20
2020 ************* 13
2021 ******** 8
2022 *** 3
???? ******* 7

Installations with Unknown Launch Year
CIMMYT Research Data
DataSpace@HKUST
Fudan University
Göttingen Research Online
ICRISAT
Johns Hopkins University
Maine Dataverse Network

Here's the most recent research:

Thanks again, @IlariaBelvedere!

pdurbin commented 2 years ago

Ok! I just did some research and added launch years for the last few installations. Here's how https://iqss.github.io/dataverse-installations/year.html looks now:

Dataverse Installations by Year
2005 * 1
2008 * 1
2012 ** 2
2013 ** 2
2014 **** 4
2015 *** 3
2016 ******* 7
2017 ******** 8
2018 ****** 6
2019 ********************* 21
2020 ************* 13
2021 ******** 8
2022 *** 3

2019 was a big year for us! 😄

I guess next we should think about what the definition of done is for this issue. 😄

pdurbin commented 2 years ago

Here's a quick graph from Google Spreadsheets showing the cumulative number or running total of installations.

Screen Shot 2022-06-04 at 9 53 14 PM

The formula is from https://www.statology.org/google-sheets-cumulative-percentage/

This is how I get the year and count:

cat data/data.json | jq '.installations[].launch_year' -r | sort | uniq -c | awk '{print $1, $2}' | tr " " "\t"
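
The running total itself is just a cumulative sum, so it can also be computed outside the spreadsheet. Here is a minimal Python sketch (not the Google Sheets formula) that uses the per-year counts from the chart above; the function name `running_totals` is made up for illustration.

```python
from itertools import accumulate


def running_totals(counts_by_year):
    """Turn {year: new installations} into [(year, cumulative total)], oldest first."""
    years = sorted(counts_by_year)
    totals = accumulate(counts_by_year[y] for y in years)
    return list(zip(years, totals))


# Per-year counts copied from the "Dataverse Installations by Year" chart above.
by_year = {2005: 1, 2008: 1, 2012: 2, 2013: 2, 2014: 4, 2015: 3, 2016: 7,
           2017: 8, 2018: 6, 2019: 21, 2020: 13, 2021: 8, 2022: 3}

# Print tab-separated year and running total, ready to paste into a spreadsheet.
for year, total in running_totals(by_year):
    print(f"{year}\t{total}")
```

With these counts the last row is 2022 with a running total of 79 installations.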

IlariaBelvedere commented 2 years ago

Hello, thank you for the graph! Could I use it in my thesis (with attribution)?

pdurbin commented 2 years ago

@IlariaBelvedere absolutely! Please go ahead. And thanks again for helping so much!

I actually added a second ASCII art graph (over time) last night to https://iqss.github.io/dataverse-installations/year.html

Here's how they look as of this writing:

Dataverse Installations by Year
2005 * 1
2008 * 1
2012 ** 2
2013 ** 2
2014 **** 4
2015 *** 3
2016 ******* 7
2017 ******** 8
2018 ****** 6
2019 ********************* 21
2020 ************* 13
2021 ******** 8
2022 *** 3

Dataverse Installations Over Time
2005 * 1
2008 ** 2
2012 **** 4
2013 ****** 6
2014 ********** 10
2015 ************* 13
2016 ******************** 20
2017 **************************** 28
2018 ********************************** 34
2019 ******************************************************* 55
2020 ******************************************************************** 68
2021 **************************************************************************** 76
2022 ******************************************************************************* 79

IlariaBelvedere commented 2 years ago

Thank you very much for the graphs, it is satisfying to look at the statistics over the years :) And thank you for thanking me ahahha, I am happy if I can help :) I think there are other things that could be done, maybe. 1) A few descriptions of the installations could be filled in, to complete the picture XD 2) Some of the installations may no longer be active; is there a way to find out about the most recent updates by communicating with the institutions themselves?

pdurbin commented 2 years ago

"1) A few descriptions of the installations could be filled in, to complete the picture XD"

@IlariaBelvedere Yes, I also noticed that some descriptions are missing. If you feel like creating new issues and reaching out to the installation contacts, please go ahead.

"2) Some of the installations may no longer be active; is there a way to find out about the most recent updates by communicating with the institutions themselves?"

Probably. The way to communicate with an installation is to look at contact_email in the spreadsheet mentioned in the README. Here's a direct link: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0

pdurbin commented 2 years ago

In 650911f I put together some charts at https://iqss.github.io/dataverse-installations/charts.html

Here's how they look:

by-year-and-over-time

That charts.html page isn't linked from anywhere. I'm not sure if it should be a standalone page or not.

https://iqss.github.io/dataverse-installations/ has the map and descriptions (screenshot below). Maybe for now we could add a link under the map (and before the descriptions) to the new charts.html page. Thoughts?

Screen Shot 2022-06-06 at 5 22 25 PM

IlariaBelvedere commented 2 years ago

Thank you @pdurbin! I am going to try to contact them and I will let you know about the results. :)

IlariaBelvedere commented 2 years ago

@pdurbin I just noticed that only six descriptions are missing, if I am not wrong, so I think they can be found on the official sites and filled in that way. What do you think? About the contact emails, the one for Abacus needs to be updated because it is not working; however, the library's email addresses are listed here: https://ask.library.ubc.ca/.

pdurbin commented 2 years ago

@IlariaBelvedere before we start filling in missing descriptions, can you please create a new issue for this? Yes, copying a reasonable description from an official site sounds fine.

A new issue for the Abacus email too, please! 😄

pdurbin commented 2 years ago

I announced the new charts at https://groups.google.com/g/dataverse-community/c/jqfxdU3e2FQ/m/cgRYbH4nAQAJ and https://dataversecommunity.slack.com/archives/C5V66TV6Y/p1654551361864179

I think the last thing to do for this issue is link to https://iqss.github.io/dataverse-installations/charts.html from somewhere. But where?