pdurbin opened this issue 5 years ago
In IRC today I announced a new spreadsheet I'm calling "Crowdsourced information about Dataverse Installations": https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit?usp=sharing
The first question I asked was this: "Please add your launch_year, the year you started with Dataverse (fresh or migration from some other system)".
Here's the conversation: http://irclog.iq.harvard.edu/dataverse/2019-10-02#i_107856
On Friday @shlake and I talked about different approaches for getting this data:
@shlake do you have a preference? Am I forgetting any other approach we talked about?
All, are there more approaches we haven't considered? @poikilotherm is starting to hack on the map. 😄
@pdurbin I'm going to pester the list (just once - not keep doing it) AND will send targeted emails to folks we know.
Once we figure out ALL the bits of info we need, then I think we can declare a census to get what we don't have.
I just made a couple of pull requests to help make it more apparent that some launch years are missing.
@shlake do you still want to pester the list once or should I re-pester? 😄 I just made the following diagram, which I was planning to send to the list. Please let me know! It's a screenshot from https://dataverse.org/installations with some angry red added for the years we are missing. 😄
As you know, the fix is to ask everyone to fill in the year in the crowdsourced spreadsheet at https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0
Then we run python3 update-data.py and make pull requests.
I'm motivated because yesterday my friend and officemate @erikbuunk started making some killer data visualizations about the Dataverse community (that I can't wait to share!). 🎉
Oh, my other thought is that we can @mention people here who could track down the data: people like @eugene-barsky @skasberger @umuchlish @Venki18 @dheles @sjaefulafandi @lmaylein @adam3smith @CCMumma @kaitlinnewson @jmjamison and others who aren't top of mind.
This just in from @skasberger for AUSSDA Dataverse: "You once asked me when we launched. It was the 20th of August 2017."
TDR was launched in November 2016 - sorry I missed this one earlier.
@pdurbin just sent an email to the list asking for dates.
@shlake THANK YOU! You're the best! 🎉 🎉 🎉
For anyone who missed it, here's a link to that email: https://groups.google.com/d/msg/dataverse-community/jqfxdU3e2FQ/qroo3eMvAAAJ
Thank you to @IlariaBelvedere for taking an interest in this issue, creating some sub-issues, and reviving the thread at https://groups.google.com/d/msg/dataverse-community/jqfxdU3e2FQ/qroo3eMvAAAJ !
By the way, here's a one liner for getting the counts per year:
cat data/data.json | jq '.installations[].launch_year' -r | sort | uniq -c
1 2005
1 2012
1 2013
1 2014
2 2015
4 2016
5 2017
1 2018
9 2019
13 2020
8 2021
3 2022
30 null
Here's a quick chart using Google Spreadsheets:
I don't think the spreadsheet is still editable? I can only comment, which isn't super helpful for spreadsheets. QDR launched its Dataverse catalog in 2018.
@adam3smith I get paranoid about just anyone editing that spreadsheet. Please click "request access" and I'd be happy to let you in!
Update: I added QDR in c5473ac
In 42471d2 I just added a simple ASCII chart of installations per year at https://iqss.github.io/dataverse-installations/year.html
It looks like this:
Dataverse Installations by Year
2005 * 1
2012 * 1
2013 * 1
2014 * 1
2015 ** 2
2016 **** 4
2017 ***** 5
2018 * 1
2019 ********* 9
2020 ************* 13
2021 ******** 8
2022 *** 3
???? ****************************** 30
Below the chart I'm listing the installations that we don't have a launch year for. Note that some of these have GitHub Issues already:
Installations with Unknown Launch Year
Botswana Harvard Data
CIMMYT Research Data
Data INRAe
Data Suds
DataSpace@HKUST
DR-NTU (Data)
Fudan University
Göttingen Research Online
Harvard Dataverse
HeiDATA
IBICT
ICRISAT
ICWSM
Ifsttar Dataverse
International Potato Center
Johns Hopkins University
LIPI Dataverse
Maine Dataverse Network
MELDATA
NIE Data Repository
Peking University
Pontificia Universidad Católica del Perú
QDR Main Collection
Repositorio de Datos de Investigación Universidad del Rosario
Repositório de Dados de Pesquisa da UFABC
Repositório de Dados de Pesquisa do ILEEL
UCLA Dataverse
University of Manitoba Dataverse
Università degli Studi di Milano
UWI
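The generator code behind year.html isn't shown in this thread. As a rough sketch, assuming only the jq | sort | uniq -c pipeline posted earlier, a chart in the same shape (year, one star per installation, count, with unknown years shown as ????) can be drawn with awk. The function name ascii_chart is hypothetical and not part of the repo.

```shell
# Hypothetical helper (not from the repo): reads "count year" lines, i.e. the
# output of `uniq -c`, on stdin and prints "year stars count".
ascii_chart() {
  awk '{
    year = ($2 == "null") ? "????" : $2   # unknown launch years shown as ????
    bar = ""
    for (i = 0; i < $1; i++) bar = bar "*"   # one star per installation
    print year, bar, $1
  }'
}

# Usage against a checkout of the repo (requires jq):
# jq -r '.installations[].launch_year' data/data.json | sort | uniq -c | ascii_chart
```

Because sort orders the years as strings, null lands after the four-digit years, so the ???? row ends up last, as on the page.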
Thank you @IlariaBelvedere for all the research! Here's how https://iqss.github.io/dataverse-installations/year.html looks now!
Dataverse Installations by Year
2005 * 1
2008 * 1
2012 * 1
2013 * 1
2014 *** 3
2015 *** 3
2016 **** 4
2017 ******** 8
2018 ****** 6
2019 ******************** 20
2020 ************* 13
2021 ******** 8
2022 *** 3
???? ******* 7
Installations with Unknown Launch Year
CIMMYT Research Data
DataSpace@HKUST
Fudan University
Göttingen Research Online
ICRISAT
Johns Hopkins University
Maine Dataverse Network
Here's the most recent research:
Thanks again, @IlariaBelvedere!
Ok! I just did some research and added launch years for the last few installations. Here's how https://iqss.github.io/dataverse-installations/year.html looks now:
Dataverse Installations by Year
2005 * 1
2008 * 1
2012 ** 2
2013 ** 2
2014 **** 4
2015 *** 3
2016 ******* 7
2017 ******** 8
2018 ****** 6
2019 ********************* 21
2020 ************* 13
2021 ******** 8
2022 *** 3
2019 was a big year for us! 😄
I guess next we should think about what the definition of done is for this issue. 😄
Here's a quick graph from Google Spreadsheets showing the cumulative number (running total) of installations.
The formula is from https://www.statology.org/google-sheets-cumulative-percentage/
This is how I get the year and count:
cat data/data.json | jq '.installations[].launch_year' -r | sort | uniq -c | awk '{print $1, $2}' | tr " " "\t"
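If you'd rather skip the spreadsheet formula, the running total can also be computed directly in awk. This is a minimal sketch, assuming the same sorted uniq -c output as above; the running_total name is hypothetical, and rows with a null launch year are dropped because they can't be placed on the timeline.

```shell
# Hypothetical helper (not from the repo): reads "count year" lines from
# `sort | uniq -c` on stdin and prints tab-separated year, count, running total.
# Assumes the input is already sorted by year.
running_total() {
  awk -v OFS='\t' '
    $2 == "null" { next }   # skip installations with no launch year
    { total += $1; print $2, $1, total }
  '
}

# Usage (requires jq):
# jq -r '.installations[].launch_year' data/data.json | sort | uniq -c | running_total
```

The tab-separated output can still be pasted into Google Spreadsheets for charting, with the cumulative column already filled in.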
Hello, thank you for the graph! Could I use it in my thesis (with attribution)?
@IlariaBelvedere absolutely! Please go ahead. And thanks again for helping so much!
I actually added a second ASCII art graph (over time) last night to https://iqss.github.io/dataverse-installations/year.html
Here's how they look as of this writing:
Dataverse Installations by Year
2005 * 1
2008 * 1
2012 ** 2
2013 ** 2
2014 **** 4
2015 *** 3
2016 ******* 7
2017 ******** 8
2018 ****** 6
2019 ********************* 21
2020 ************* 13
2021 ******** 8
2022 *** 3
Dataverse Installations Over Time
2005 * 1
2008 ** 2
2012 **** 4
2013 ****** 6
2014 ********** 10
2015 ************* 13
2016 ******************** 20
2017 **************************** 28
2018 ********************************** 34
2019 ******************************************************* 55
2020 ******************************************************************** 68
2021 **************************************************************************** 76
2022 ******************************************************************************* 79
Thank you very much for the graphs; it is satisfying to look at the statistics over the years :) And thank you for thanking me ahahha, I am happy if I can help :) I think there may be other things that could be done:
1) A few descriptions of the installations could be filled in, to complete the picture XD
2) Some of the installations may no longer be active. Is there a way to find out about the most recent updates by communicating with the institutions themselves?
1) A few descriptions of the installations could be filled in, to complete the picture XD
@IlariaBelvedere Yes, I also noticed that some descriptions are missing. If you feel like creating new issues and reaching out to the installation contacts, please go ahead.
2) Some of the installations may no longer be active. Is there a way to find out about the most recent updates by communicating with the institutions themselves?
Probably. The way to communicate with an installation is to look at contact_email in the spreadsheet mentioned in the README. Here's a direct link: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0
In 650911f I put together some charts at https://iqss.github.io/dataverse-installations/charts.html
Here's how they look:
That charts.html page isn't linked from anywhere. I'm not sure if it should be a standalone page or not.
https://iqss.github.io/dataverse-installations/ has the map and descriptions (screenshot below). Maybe for now we could add a link to the new charts.html page under the map (and before the descriptions). Thoughts?
Thank you @pdurbin! I am going to try to contact them and I will let you know about the results. :)
@pdurbin I just noticed that only six descriptions are missing, if I am not wrong, so I think they can be found on the official sites and filled in that way. What do you think? As for the contact emails, I think the one for Abacus needs to be updated because it is not working; however, the library's email addresses are listed here: https://ask.library.ubc.ca/.
@IlariaBelvedere before we start filling in missing descriptions, can you please create a new issue for this? Yes, copying a reasonable description from an official site sounds fine.
A new issue for the Abacus email too, please! 😄
I announced the new charts at https://groups.google.com/g/dataverse-community/c/jqfxdU3e2FQ/m/cgRYbH4nAQAJ and https://dataversecommunity.slack.com/archives/C5V66TV6Y/p1654551361864179
I think the last thing to do for this issue is link to https://iqss.github.io/dataverse-installations/charts.html from somewhere. But where?
When I see graphs like the following from http://slides.com/mercecrosas/dataversecommunity2018#/5 ...
... I think, "This is a great graph, but it would be nice to have the actual launch date of each of the Dataverse installations. We could add a column for this in the database."