Closed sat01a closed 1 year ago
@adam-collins has picked up this issue and will work on it next week. (cc: @TaniaGLaity)
@sat01a this is a tricky change, as currently all new list records add a metadata entry in Collectory when they are created. I think we should really pause and question this requirement, as it is a lot of work and it goes against the "open data ethos" of the ALA. All metadata is currently "open" in the ALA, and many other similar orgs have a deliberate policy of making the metadata open but controlling access to the actual data. Bioplatforms Australia, for example, does this despite having extremely sensitive sequence data in its repository; see https://data.bioplatforms.com/dataset/bpa-tsi-pacbio-hifi-357732-da064156 for an example.
I find it hard to understand how the existence of a list could be sensitive. Is this just excessive caution or a provable risk that is too great to tolerate? The obvious non-technical solution is to carefully name the list (the only thing exposed in Collectory) - there is no reason why "codes" or vague naming couldn't be the solution to this problem.
Based on the lists governance framework, lists created by users will generally be private by default, and the majority of these aren't useful to anyone other than the creator at a point in time, nor do they have much in the way of metadata associated with them. Do we want to publish metadata for thousands of random lists that people create that aren't useful? I would suggest that this would not really be a good look for the ALA and its credibility. WRT the biosecurity list - might be a question for @turley85 or @erinroger
... publish metadata for thousands of random lists that people create that aren't useful
That is a separate issue, which is getting some thought and input for the "new specieslist" implementation, as you know.
My comment is about retrofitting this to the existing lists and Collectory apps, which are old, fragile and hard to work with. It seems like a lot of work and risk (of bugs) for a problem that might be solved in a non-technical way, so I was simply flagging this as another option.
<opinionated rant>
FYI, my take is "yes, we do" but with the caveat that random lists should be an opt-in part of the UI and not shown by default. Lee was very vocal on this issue (-> all metadata and data should be public) and although I didn't agree with everything he pushed for, I do agree that making metadata public is almost a "requirement" for a publicly-funded resource like the ALA.
Lee's take was that you can't predict or put a price on the benefit of some seemingly useless list that turns out to be of interest to one researcher at some point in the future. It could lead to collaboration or trigger some question that gets further research. Once you close the door to this, it's locked up forever. My take is you have to have a really, really good reason to not keep (meta)data public, and that this is a fundamental requirement for "science" to happen. So I was simply asking: do we have a really, really good reason, or just a "good" reason?
I don't see how having more metadata on random users' lists could hurt ALA's credibility. It could be better presented in terms of UI and UX (poor UX can hurt us), but the fundamental "fact" that we have it and expose it can only be seen as good (IMO). It shows people are using our services, uploading data and interacting with the ALA. Lack of such interactions could be argued as hurting ALA's credibility, as it indicates we're irrelevant and not being used. Let's fix the UX but not throw out the baby with the bath water.
</opinionated rant>
Thank you, @nickdos and @TaniaGLaity, for sharing your interesting perspectives on this requirement. I appreciate your insights. As we move forward, it's important to prioritize the fundamental principle of open data while also ensuring a balance with other critical considerations such as data quality. To explore this further, I will schedule a meeting sometime this week or next week to discuss and finalize the way forward on this. (cc: @sughics )
Part of this investigation should involve the spatial portal (SP) list creation, which is where a lot of these problems occur - we discussed this yesterday on Slack: https://atlaslivingaustralia.slack.com/archives/CCT9J1GUU/p1681192972155039
The SP allows users to create lists but only allows them to select from lists which are public, so eventually we get a lot of "1.csv" lists being created publicly. Also, the SP doesn't have good metadata capture which encourages poor data habits.
I agree with Lee that we can't necessarily judge whether a list is good or bad, but I don't care about a list if the metadata that comes with it is ordinary - i.e. "1.csv" with no accompanying information - which I only see as useful in the rarest of circumstances. Where we use the lists tool operationally like this for users, I would prefer that we just hide those lists. Where we explicitly mean to share lists and make them available - i.e. conservation, sensitive, "lee's 2014 list of endangered birds", "nuck's australian species" ... - then our UI should encourage good metadata capture and make them easily discoverable with nice UX. Put DOIs on them, etc. Public and private lists can handle both of these scenarios well.
Thanks @peggynewman . Noted
Included with release 4.1.0
Reported by @TaniaGLaity : I've added an item to your issues register (pinned to this channel). It has come to our attention that all metadata for all lists is being published and is also being harvested to the ARDC RDC repository. Could we please not publish the metadata for private lists? There are a few reasons for this, including that the existence of some lists is in itself sensitive, e.g. some biosecurity ones. Also, there is a proliferation of private lists which contain only one species and very limited metadata.