EDIorg / ECC

ECC = EML Congruence Checker

check to encourage better data set titles? #12

Open srearl opened 6 years ago

srearl commented 6 years ago

This is more in the realm of training and awareness, but the titles of some data sets, even from seasoned IMs, can be just appalling. I wonder if there is any sort of check, even if it never gets beyond 'info', that would encourage better data set titles.

mobb commented 6 years ago

We have a title-length check, which is a warn. It looks for a title that is > 5 words; "length" was as complex a criterion as we could reasonably handle (there is a similar check for the abstract, also a warn). A more "semantic" check would have to intuit anticipated terms from a title (e.g., dates, organism names, place names, measurements), which I think will be beyond what is reasonable for quite a while.

I agree that this is more of a training/awareness issue, and the BP doc also talks about what makes a good title and abstract. The simplest way forward would be to ramp up evangelizing, and the fields that go into a citation might be a good place to start, as they are the most visible. IMO what would elevate the whole community is some sort of dataset-review step, either as peer review or as a copy editor with a checklist. Peer review is just that (another data manager), but a copy editor is more likely to be associated with the repository.
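For readers who have not seen the title-length check mentioned above, here is a minimal sketch of what a word-count check with a warn status could look like. This is not the ECC's actual implementation: the function name, the returned fields, and the reading of "> 5 words" as a strict threshold are all assumptions made for illustration.

```python
def check_title_length(title: str, min_words: int = 5) -> dict:
    """Hypothetical word-count check; not the actual ECC code.

    Assumes "> 5 words" means the title passes only when it has
    strictly more than `min_words` whitespace-separated words.
    """
    word_count = len(title.split())
    # "valid" and "warn" are illustrative status labels.
    status = "valid" if word_count > min_words else "warn"
    return {
        "check": "titleLength",  # hypothetical check name
        "status": status,
        "found": word_count,
        "expected": f"more than {min_words} words",
    }


if __name__ == "__main__":
    print(check_title_length("Stream chemistry"))
    print(check_title_length("Daily air temperature at three desert sites, 2010-2015"))
```

A check like this only measures length, so a long but uninformative title would still pass, which is the limitation the thread is discussing.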

srearl commented 6 years ago

Thanks @mobb. I could envision a check, for example, that evaluates the title against the geography of place names, or implements some aspects of text- or sentiment-analysis. But, yeah, something along those lines is likely beyond the scope at this point and could be wishful thinking (though I hope not).

In the meantime, would love to chat more about how to ramp up evangelizing, and your ideas of 'review' - possible ASM fodder?
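As a rough illustration of the "info"-level idea discussed in this thread, the sketch below flags titles that contain no four-digit year and no known place name. Everything in it is an assumption: the function name, the message wording, and especially the tiny hard-coded `PLACE_NAMES` set, which stands in for a real gazetteer lookup that the thread does not specify.

```python
import re

# Hypothetical stand-in for a real gazetteer; a usable check would
# need a far larger, curated list of place names.
PLACE_NAMES = {"phoenix", "arizona", "sonoran desert"}

# Matches a standalone four-digit year in the 1900s or 2000s.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def title_hints(title: str, place_names=PLACE_NAMES) -> list:
    """Return info-level hints for a title (illustrative only)."""
    lowered = title.lower()
    hints = []
    if not YEAR_PATTERN.search(title):
        hints.append("info: title contains no four-digit year")
    # Crude substring match; it will miss variants and may over-match.
    if not any(place in lowered for place in place_names):
        hints.append("info: title contains no known place name")
    return hints


if __name__ == "__main__":
    print(title_hints("Bird survey data"))
    print(title_hints("Bird surveys in Phoenix, Arizona, 2000-2015"))
```

Keeping the result at 'info' matters here, because a title can be perfectly good without a year or a place name in it.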