gbif / ipt

GBIF Integrated Publishing Toolkit (IPT)
https://www.gbif.org/ipt
Apache License 2.0

Align citation with GBIF.org #1360

Closed timrobertson100 closed 1 year ago

timrobertson100 commented 7 years ago

The IPT citation does not reflect what the GBIF.org system does. This is highly confusing, and GBIF's publishing tool needs to be fully consistent and intuitive.

I am not sure which is incorrect.

timrobertson100 commented 7 years ago

The corresponding GBIF registry issue is https://github.com/gbif/registry/issues/4

dschigel commented 7 years ago

Should the user still have the opportunity to give a free-text citation in the IPT, plus a warning that this won't be displayed at GBIF.org, or should this opportunity be closed completely?

peterdesmet commented 7 years ago

I agree that both should be aligned! Regarding allowing the user to add a free text: that's something to avoid if it's never going to end up on GBIF (with or without warning)!

I think it's better to limit the user's freedom, but still allow them to provide some things to be included in an automatically generated citation.

Here's what we do

  1. Start with an automatic citation
  2. Include DOI as citation identifier
  3. If we have a data paper: turn automatic citation off and add reference to the data paper
  4. Remove version number (because with the automatic citation turned off, it will just get obsolete)... there's not much we can do about the year

Now, if we were able to provide the DOI of the dataset and the DOI of the data paper, and the citation were built from there, we would be happy.

Other remarks

Example IPT vs GBIF citation

Differences in bold.

Citation on IPT: http://data.inbo.be/ipt/resource?r=dagvlinders-inbo-occurrences

Maes D, Brosens D, Beck O, Van Dyck H, Desmet P, Vlinderwerkgroep Natuurpunt, all butterfly recorders (2016): Vlinderdatabank - Butterflies in Flanders and the Brussels Capital Region, Belgium. Research Institute for Nature and Forest (INBO). Dataset/Occurrence. http://doi.org/10.15468/njgbmh Data paper: http://doi.org/10.3897/zookeys.585.8019

Citation on GBIF: https://www.gbif.org/dataset/7888f666-f59e-4534-8478-3a10a3bfee45

Maes D, Brosens D, Beck O, Van Dyck H, Desmet P (2017). Vlinderdatabank - Butterflies in Flanders and the Brussels Capital Region, Belgium. Version 1.4. Research Institute for Nature and Forest (INBO). Occurrence Dataset https://doi.org/10.15468/njgbmh accessed via GBIF.org on 2017-09-15.

dschigel commented 6 years ago

As discussed with, but not yet checked by @ahahn-gbif:

In the IPT, next to the custom citation field, please add, somehow visibly, the following warning: " Note that the citation for GBIF.org will be automatically generated based on your metadata fields, such as dataset.authors, dataset.pubDate, dataset.title, dataset.version, organization.title, dataset.type, dataset.doi, while your free text citation will be ignored by GBIF.org. By editing these metadata fields you may modify the appearance of the citation at GBIF.org. This custom (free text) citation can, however, be used for citing the data for direct access through IPT. Wondering why? Welcome to FAQ https://www.gbif.org/faq?q=citation "
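A minimal sketch of how a citation could be assembled from the metadata fields named in the warning above, following the pattern of the GBIF.org example quoted earlier in this thread. The `DatasetMetadata` class and `gbif_style_citation` function are hypothetical names used only for illustration; this is not the actual GBIF.org or IPT implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetMetadata:
    # Hypothetical container; fields mirror the list in the warning text
    # (dataset.authors, dataset.pubDate, dataset.title, ...).
    authors: list[str]
    pub_date: date
    title: str
    version: str
    organization_title: str
    dataset_type: str
    doi: str


def gbif_style_citation(meta: DatasetMetadata, accessed: date) -> str:
    """Build a citation string in the pattern of the GBIF.org example in this thread."""
    authors = ", ".join(meta.authors)
    return (
        f"{authors} ({meta.pub_date.year}). {meta.title}. "
        f"Version {meta.version}. {meta.organization_title}. "
        f"{meta.dataset_type} https://doi.org/{meta.doi} "
        f"accessed via GBIF.org on {accessed.isoformat()}."
    )
```

Because every part of the string comes from a metadata field, editing those fields is the only way to change the result, which is the point the proposed warning makes.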

camiplata commented 3 years ago

I see this is a long-standing issue. I would like to know if there are any plans to address it in the coming release (if there is one coming soon).

@sibcolombia

dschigel commented 3 years ago

@ahahn-gbif as we just discussed this again lately, I think we should add some closing remarks and close this, do you agree?

camiplata commented 3 years ago

@dschigel @ahahn-gbif that would be great as this disparity between IPT and GBIF creates a lot of noise between data publishers.

ahahn-gbif commented 3 years ago

@dschigel We will still need to pick this issue up in future IPT development work, but may want to move it to a different repository. There is no scheduled development work or release plan for the IPT at this point, the future IPT work being in early scoping stages. At this point, the IPT citation editing page contains the following warning against free-text citations:

[image: screenshot of the warning shown on the IPT citation editing page]

I assume we are only talking about the auto-generated citations from here on. I agree that their generation logic should be aligned between gbif.org and the IPT, so that an IPT data administrator has a reasonably clear idea of how the citation will later show on gbif.org. However, for several of the reasons @peterdesmet states above, both citations will never be fully identical, especially in the details pointing back at the source - citing an IPT endpoint is not referencing a dataset version fully identical to the one accessible through gbif.org (it may not even be published in a GBIF context), and both citations will continue to differ.

Maybe it is important to communicate this latter point: accessing the dataset through the IPT endpoint will give access to the data as configured at source, potentially including unindexed extensions, different preferred taxon names, geodetic organization, etc. It will, on the other hand, not contain components added or annotated during the ingestion processes of GBIF et al.: alignment with a core taxonomic structure for unified search access, standardization of certain data fields, annotation of potential issues, data interpretation to mediate detected issues.

dschigel commented 3 years ago

We came to this in the Humboldt core topic, and I realize we can possibly solve it by focusing on DOI and by stressing that it's the target context that dictates citation format.

I come from two assumptions:

If you agree with these statements, we actually don't need a recommended citation at all. For DOI-based data citation to work, we don't need a publisher-preferred citation either (but we can keep it in the IPT resource as it seems to be important, and yes, we can stress that it is the IPT resource which is then cited, not the GBIF view). Instead of the current Citation footer, every GBIF page can have a section along the CitetheDOI lines with approximately the following text, see below. In the IPT view we need to have much softer wording, e.g. "should" -> "example". The sentence on the publisher-recommended view would only be shown if the publisher-recommended citation is not null. I think the suggestion below would capture the key wishes expressed above. Styling and English will need to be fixed.

How to cite

GBIF mediated data is free for all, but is not free from obligations. Every GBIF user, according to data user agreement, is requested to cite the DOI for downloads or DOI for datasets when referring to such data. To format your references, please follow guidelines for authors and styling advice of your use case, e.g. journal, but do not omit the DOI. In the absence of clear formatting guidelines, you may also take into account citation recommendations provided by the data publisher, or follow GBIF's recommendation:

Karlsholt O, Pedersen J, Hansen (deceased) M, Schigel D, Braak K (2016). Insects from light trap (1992–2009), rooftop Zoological Museum, Copenhagen. Version 1.4. Natural History Museum of Denmark. Sampling event dataset https://doi.org/10.15468/xabmiz accessed via GBIF.org on 2021-03-17.

abubelinha commented 2 years ago

We sometimes tried to use test IPTs + test registry for checking how different sections of metadata would finally look once published (links and other html stuff, which authors will appear in citation and in which order, and things like that). Authors always prefer to see a test version of the final product.

The problem -at least several times we tried- is the slowness of the test-portal in reflecting those changes after test IPT publications (even for metadata-only datasets). That's a bottleneck. Do you know if there is a previous issue where I can comment about this? If not, which is the most appropriate repository for opening it? (IPT, registry, portal-feedback, ...)

Thanks a lot in advance @abubelinha (@dgasl also interested in this)

mike-podolskiy90 commented 2 years ago

@abubelinha Thank you for the questions. This is a portal thing I think.

ptyk commented 1 year ago

Hello everyone. I found the discussion after realizing the issue of author names and order. My case is a set of checklist datasets that I am going to publish on behalf of a large group of authors as a series of chapters. I originated the idea and now I prepared a standard metadata description, which will be applied to all of the checklists (and modified in some cases). So this is my main input in terms of the content. I cannot be treated as the author of the dataset. But:

If I delete myself from the metadata authorship it will also be wrong. I think the IPT should provide a way to control it, using a simple checkbox near the metadata author part (i.e. "tick for inclusion in the author string" or so).

What do you think?

peterdesmet commented 1 year ago

I agree. I think only the resource creators should be included as authors (in the order provided), as is done by the IPT.

ahahn-gbif commented 1 year ago

I am a bit reluctant about that direction. The original proposal was to have the IPT follow the logic of GBIF.org. The inclusion of metadata authors in the citation string had been discussed and decided in favor of, because metadata can often contribute substantially to the quality and usability of a dataset, and are not necessarily provided by the curators of the datasets themselves. Also, on a more procedural level, this would change citations for quite a number of datasets in GBIF.org without prior consultation or even information, which does not sound quite right.

I would rather propose to

ahahn-gbif commented 1 year ago

I am sorry - I overlooked that it is not possible in the IPT to not declare a metadata author, which makes this more tricky. So if I understand correctly, the situation we would like to reach is one where

At present

questions to check into:

MattBlissett commented 1 year ago

Remember we are generating EML here, so we don't have complete flexibility.

The metadata authorship becomes dataset/metadataProvider which is optional in EML, only dataset/creator is required. I don't think it's possible to include a metadataProvider in EML but somehow mark it to be excluded from a citation.

See https://eml.ecoinformatics.org/schema/, specifically https://eml.ecoinformatics.org/schema/eml_xsd.html#eml_dataset and https://eml.ecoinformatics.org/schema/eml-resource_xsd.html#ResourceGroup_metadataProvider
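A minimal sketch of reading `creator` and `metadataProvider` parties from an EML document with Python's standard library. The sample document is hypothetical and trimmed (it is not schema-valid EML), and the `read_parties` helper and the "surname plus initial" name format are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EML; a real document carries many more elements.
SAMPLE_EML = """
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Example dataset</title>
    <creator>
      <individualName><givenName>Ada</givenName><surName>Example</surName></individualName>
    </creator>
    <metadataProvider>
      <individualName><givenName>Bo</givenName><surName>Sample</surName></individualName>
    </metadataProvider>
  </dataset>
</eml:eml>
"""


def party_names(dataset, tag):
    """Collect 'Surname Initial' strings for every party element with this tag."""
    names = []
    for party in dataset.findall(tag):
        given = party.findtext("individualName/givenName", default="")
        surname = party.findtext("individualName/surName", default="")
        names.append(f"{surname} {given[:1]}".strip())
    return names


def read_parties(eml_text):
    """Return creators and metadata providers; the latter list may be empty."""
    dataset = ET.fromstring(eml_text.strip()).find("dataset")
    return {
        "creator": party_names(dataset, "creator"),
        "metadataProvider": party_names(dataset, "metadataProvider"),
    }


if __name__ == "__main__":
    print(read_parties(SAMPLE_EML))
```

The two lists are the only distinction such a reader can draw from the document: the party elements themselves carry no "exclude from citation" marker, which matches the constraint described above.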

mdoering commented 1 year ago

I never understood the reasons for including the metadata author in the generated citation. I would strongly consider removing that. If that person wants/needs to be cited, I think they should also become a proper author/creator of the resource. Offering an option to include/exclude the metadata author would only add to the complexity, I think.

It might also be worth mentioning that in ChecklistBank we have decided to follow yet another approach. The citation string really is mixing citation information with citation styles. There are thousands of styles out there, and journals pick and require different citation styles that authors have to follow. Isn't it much better to use a structured citation like BibTeX or CSL-JSON that users can format according to the style they need? It would free us from discussing some of the citation details, and GBIF.org could pick its preferred style for formatting. An IPT installation could instead select the style of its choice, but could always preview the citation in the GBIF style when publishing. Note also that EML 2.2 has added support for structured citations. More information on the CLB implementation can be found in https://github.com/CatalogueOfLife/backend/issues/989

Removing the metadata author from the citation would align better with CLB/COL.
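A minimal sketch of the structured-citation idea: the citation data is kept as a CSL-JSON-style item, and a separate function renders it in one particular style. The helper names `csl_item` and `render_author_year` are hypothetical, and the rendering is a simplified illustration, not the ChecklistBank or GBIF.org formatter.

```python
import json


def csl_item(title, authors, year, publisher, version, doi):
    """Build a CSL-JSON-style dict; authors is a list of (family, given) pairs."""
    return {
        "id": doi,
        "type": "dataset",
        "title": title,
        "author": [{"family": family, "given": given} for family, given in authors],
        "issued": {"date-parts": [[year]]},
        "publisher": publisher,
        "version": version,
        "DOI": doi,
    }


def render_author_year(item):
    """Render the item in one simple author-year style (illustrative only)."""
    names = ", ".join(f"{a['family']} {a['given']}" for a in item["author"])
    year = item["issued"]["date-parts"][0][0]
    return (
        f"{names} ({year}). {item['title']}. Version {item['version']}. "
        f"{item['publisher']}. https://doi.org/{item['DOI']}"
    )


if __name__ == "__main__":
    item = csl_item(
        "Example dataset", [("Example", "A")], 2021, "Example Publisher", "1.0", "10.1234/example"
    )
    print(json.dumps(item, indent=2))
    print(render_author_year(item))
```

Keeping the data and the style apart means a different renderer can be swapped in without touching the stored citation.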

mike-podolskiy90 commented 1 year ago

@mdoering Removing metadata authors would affect thousands of citations at GBIF.org

I think we should make metadata providers an optional section in the IPT basic metadata since it's optional in EML

mdoering commented 1 year ago

Yes. On the other hand it would mean that I am forced to not say who authored the metadata just to remove me from the citation.

ahahn-gbif commented 1 year ago

Yes. We do have to consider the situation today, however. Silently changing thousands of citations on GBIF.org can have major fallout, as nice as the change may be. This is not a quick fix to push through.

albenson-usgs commented 1 year ago

I reviewed a few metadata records in the Environmental Data Initiative (EDI) repository and it does seem to me that the requirement for metadata provider that the IPT has enforced is unusual / not standard practice. In the few EML records I examined from EDI, none of them had metadata provider.

dschigel commented 1 year ago

@albenson-usgs I think this is actually quite revealing about attitudes towards metadata. When the citation-generating formula was rolled out (and it now indeed affects thousands of datasets), the GBIF.org view of the published datasets - which is the citable object in this case - was understood as a product of data creation & reworks -> front authorship and of metadata creation & reworks -> metadata authorship. We have enough trouble with poor metadata across so many infrastructures, so removing the metadata authorship from the GBIF dataset equation will send us back to the metadata-careless stone age through metadata anonymity. I would be very protective of the second bullet here, and I understood @ahahn-gbif would be, too? Authorship is not only credit - it is responsibility, and this fully applies to metadata authorship. Please note that a dataset can be cited (at its endpoint location) differently from the GBIF.org displayed instance.

ptyk commented 1 year ago

@dschigel I fully understand the need to maintain and increase the quality of metadata in GBIF datasets, but the statement you mention ("Name(s) of the dataset’s metadata author(s) [to be included to the author string], if one is registered, but only if also an originating author is named") does not necessarily close the topic. We may imagine the following scenario:

So if you decide to add this option to the IPT, it will not affect the existing datasets, and no one should complain. But ones who care will have the option. And it may also be appreciated by many curators of the older datasets.
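A minimal sketch of the rule quoted above combined with the proposed opt-out, assuming the option defaults to today's behaviour so that existing datasets are unaffected. The function name, the flag name, and the de-duplication of repeated names are assumptions for illustration, not the GBIF.org or IPT logic.

```python
def citation_author_names(originators, metadata_authors, include_metadata_authors=True):
    """Return the names for the author string, originators first."""
    names = list(originators)
    # Quoted rule: metadata authors are added only if an originating author is named.
    if include_metadata_authors and originators:
        for name in metadata_authors:
            # Assumption: a person who is already listed is not repeated.
            if name not in names:
                names.append(name)
    return names


if __name__ == "__main__":
    print(citation_author_names(["Author A"], ["Editor B"]))
    print(citation_author_names(["Author A"], ["Editor B"], include_metadata_authors=False))
```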

peterdesmet commented 1 year ago

Metadata editing is indeed important and should be acknowledged. So is data collection, managing, etc. This is why we (INBO) include all these people as creators of a dataset, so they are included in the (IPT) citation. The metadata editor field is superfluous for us, because we already include those people as creators.

I’d rather have one list of people (contributors), that are all included as EML creators and who are all included as authors (cf. GBIF citation). That way names don’t have to be repeated too.

The IPT could still offer to indicate roles for those contributors (e.g. contact). That can be expressed in EML by listing those people under a specific property for that role (cf. current implementation), but in addition to them being listed as creator. It also provides a way to migrate info in the IPT: make metadata editors, creators and contacts all contributors and remove duplicates.

To acknowledge people (cf. acknowledgement in paper) that should not be included in the citation, use additional parties.

albenson-usgs commented 1 year ago

@dschigel I believe there is an assumption here that requiring someone to identify themselves as the metadata provider makes them 1) create better metadata and 2) feel more responsibility for the dataset. I am dubious that either of those things are true. The issue is not whether or not it should be an option to include metadata provider, it's whether it should be required.

"I’d rather have one list of people (contributors), that are all included as EML creators and who are all included as authors (cf. GBIF citation). That way names don’t have to be repeated too."

While I agree that having to repeat author information up to three times (contact, creator, metadata provider) is quite tedious (you can copy from resource contact but only for the first contact), I don't agree with having only one list and it's only the contributors. I know for some of the projects I help share data it is nice to know who processed the data to Darwin Core but those people don't want to be listed as authors (and it would make the data originators frustrated to see that person's name in the citation).

It does seem to me that we need a more flexible way for IPT data managers to select and decide the authorship and order of authors in the citation for the IPT and on GBIF.org.

mdoering commented 1 year ago

... interestingly we have followed in ChecklistBank DataCite and CSL to list contributors with an optional note that can express how they contributed, but explicitly excluded them from being cited as authors. I really like the traditional way of separating authors, editors, the publisher (included in the citation string) and a flexible list of other contributors that are not part of the citation string. This way you can control who is part of the citation string, but still attribute others. I know some people prefer to cite each and everyone equally, but I don't think we should require such practices but instead leave this to the dataset publisher.

peterdesmet commented 1 year ago

Maybe it would be good then to have one list where all people are only listed once, but can be assigned multiple roles. Someone with author role is included in the citation.
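A minimal sketch of that single-list model, assuming each person appears once with a set of roles and that only the author role feeds the citation. The `Contributor` class and the role labels ("author", "contact", "metadata provider") are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    name: str
    # Hypothetical role labels; a person may hold several at once.
    roles: set[str] = field(default_factory=set)


def citation_authors(contributors):
    """Names of contributors holding the author role, in the order provided."""
    return [c.name for c in contributors if "author" in c.roles]


if __name__ == "__main__":
    people = [
        Contributor("Author A", {"author", "contact"}),
        Contributor("Processor B", {"metadata provider"}),
        Contributor("Author C", {"author", "metadata provider"}),
    ]
    print(citation_authors(people))
```

Under this model, someone who only processed the data or wrote the metadata stays attributed in the list without appearing in the citation, unless they are also given the author role.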

mike-podolskiy90 commented 1 year ago

Further discussion here #1917