Closed ldodds closed 5 months ago
Hi @ldodds,
This repo is now mainly a public archive and is no longer in use, since we have moved to a different system to review DPG applications. Our main place for discussions is the DPG Standard repo.
This is a very interesting perspective, and you have highlighted some of the challenges around the open data category that we are sorting through as we rethink how the DPG Standard applies to data. Until now we have allowed projects to be classified under multiple categories, which in some cases has made it difficult for end users to understand whether the documentation refers to one category or another. We are soon to change this, and the classification will likely change for some projects, such as Project AEDES.
Addressing some of the comments: Crosscut, Dicra, and Doptor Open Data provide both software functionality and services related to data AND access to data.
There's a caveat: all this information is provided by the product owners themselves and, although it is verified to some degree by the DPGA, the owners remain responsible for any claims or misrepresentations.
But happy to connect and chat more about this!
Hi @ricardomiron
Thanks for following up on my comments. I'll take a look at the DPG Standard repo. I'm really interested in how the DPG registry evolves and in assessing datasets against standards & guidelines. Happy to chat further; you can reach me at leigh dot dodds at gmail.
Firstly, my apologies: I've clearly made mistakes in my review. I will update my blog post with corrections.
Dicra, Doptor and Crosscut do clearly make data available. At least part of my mistake was focusing too much on the GitHub repos associated with each project. My assumption was that those repos would hold both the code and the datasets associated with those public goods, and that the websites were public deployments of that code.
That's obviously not the case:
It would be helpful to be able to distinguish more clearly between software that may be used to host and serve datasets and the DPGs that might be provided using that software.
My question about the CrossCut licensing arises because the ODbL has some provisions which allow partial extracts to be freely licensed, but derived data and larger extracts trigger the share-alike provision. We explored a similar use case to CrossCut in a project I led a few years ago. I understand it's not your role to police this though!
In the case of Open Terms Archive, I went to their website, clicked on the "Datasets" link in the navigation bar and then checked the "Download dataset" link for each of them. These all refer to ODbL and not ODC-BY.
Confusingly, the "main collection" you linked to is in a repo called contrib-versions. The licence file available from your link does refer to ODC-By, but from the website you're taken to the latest release page for the same dataset, which says ODbL. As a user that's how I'd expect to discover the datasets.
I hope the feedback is useful and apologies again for the mistakes.
Apologies if this is not the correct place to report issues with entries in the registry. If so, please point me in the right direction.
I've recently taken a look at the Data category of the DPG Registry to understand what datasets have been classified as DPGs.
I think there are some misclassified entries and at least one licensing mistake. There are some notes in my blog post linked above, but:
Three of the data entries are, I believe, actually software rather than data. These are Crosscut, Dicra, and Doptor Open Data. They all provide some way to organise, access or work with third-party datasets but they don’t provide any original datasets.
The ability for Crosscut to licence data as CC0 may also merit some review as all of their source datasets require attribution and one (OSM) requires a share-alike licence.
The Open Terms Archive submission indicates that the license for the data is ODC-BY. I believe this is a mistake: on reviewing each of the downloads, I found that the licenses are ODbL.
Project AEDES is classified as an AI Model, Open Content and Open Data. However, again, it doesn't seem to provide any original data; it is a project for building a predictive model.
Hope this feedback is useful!