Meaningful Ids for indicators - Githubissues

RESQUE-Framework / website

The Research Quality Evaluation Scheme

https://resque-framework.github.io/website/

MIT License


Meaningful Ids for indicators #32

Closed alpkaanaksu closed 9 months ago

alpkaanaksu commented 10 months ago

Criteria:

Easy to use in R and JS -> [a-zA-Z0-9_]+
Easy to read
Long enough but no unnecessary information: Fields that can only exist once per object (regardless of whether an expansion pack is used) do not have a prefix. e.g. DOI, Title, Year.
There is a convention for follow-up questions regarding identifiers (DOI/URL) and for explanations if 'Not applicable' is selected.
IDs should contain information about dependencies
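
The first criterion can be checked mechanically. Below is a minimal sketch in Python, assuming the pattern is meant to match the whole id; the helper name `is_valid_id` is hypothetical and not part of the RESQUE code base.

```python
import re

# Pattern taken from the first criterion above.
ID_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_id(indicator_id: str) -> bool:
    """Return True if the id consists only of letters, digits, and underscores.

    Hypothetical helper: fullmatch() requires the entire string to match.
    """
    return ID_PATTERN.fullmatch(indicator_id) is not None


print(is_valid_id("P_Data_Open_Identifier"))  # True
print(is_valid_id("P-Data Open"))             # False
```

Note that the pattern alone would also accept an id that starts with a digit, which is neither a syntactic name in R nor a valid identifier in JS; all ids proposed here start with a letter.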

Old	New
P3	Title
P4	Year
P5	DOI
p_top_paper	P_TopPaper
P6	P_TypeMethod
P6_other	P_TypeMethod_Other
P7	P_TypePublication
P7_other	P_TypePublication_Other
P8	P_Suitable
P8_explanation	P_Suitable_OptOut
P9_info	P_CRediT_Info
P9a	P_CRediT_Conceptualization
…	…
P9n	P_CRediT_WritingReviewEditing
P10	P_Data
P11	P_Data_Open
P11_explanation	P_Data_Open_NotApplicable
P11_extra	P_Data_Open_Identifier
P12	P_Data_Open_AccessLevel
P12_explanation	P_Data_Open_ZK2Explanation
P13	P_Data_Open_FAIR
P14	P_IndependentVerification
P14_explanation	P_IndependentVerification_NotApplicable
P14_extra	P_IndependentVerification_Identifier
P15	P_ReproducibleScripts
P15_explanation	P_ReproducibleScripts_NotApplicable
P15_extra	P_ReproducibleScripts_Identifier
P16	P_ReproducibleScripts_FAIR
P17	P_OpenMaterials
P17_explanation	P_OpenMaterials_NotApplicable
P17_extra	P_OpenMaterials_Identifier
P18	P_Preregistration
P18_explanation	P_Preregistration_NotApplicable
P18_extra	P_Preregistration_Identifier
P19	P_Preregistration_Content
P20	P_FormalModeling
P20_explanation	P_FormalModeling_NotApplicable
P21	P_PreregisteredReplication
P21_explanation	P_PreregisteredReplication_NotApplicable
P21_extra	P_PreregisteredReplication_Identifier
P22	P_PowerConsiderations
P23	P_OpenScienceBadges
P24	P_SampleSize
P25	P_Merit

alpkaanaksu commented 10 months ago

Maybe use 'NA' instead of 'NotApplicable' and 'Id' instead of 'Identifier'? I am not sure if we need/want those abbreviations. (I am not a fan of abbreviations in general, I usually try to avoid them, especially when writing texts. But it might be okay to have them in ids)

ChatGPT says 'Identifier' is a common name for URL and DOI :)

The common name for both URL and DOI in the context of digital resources is "identifier". Both of these are unique identifiers used to locate and access specific resources on the internet.

nicebread commented 10 months ago

"NA" is ambiguous, as it means both "not available" (which implies 0 points and no reduction of the max points) and "not applicable".

I changed all "NotApplicable" to "NAExplanation" (violating my own reasoning above ...).

"ID" is fine.

minimal changes:

Old	New
P3	Title
P4	Year
P5	DOI
p_top_paper	P_TopPaper
P6	P_TypeMethod
P6_other	P_TypeMethod_Other
P7	P_TypePublication
P7_other	P_TypePublication_Other
P8	P_Suitable
P8_explanation	P_Suitable_Explanation
P9_info	P_CRediT_Info
P9a	P_CRediT_Conceptualization
…	…
P9n	P_CRediT_WritingReviewEditing
P10	P_Data
P11	P_Data_Open
P11_explanation	P_Data_Open_NAExplanation
P11_extra	P_Data_Open_Identifier
P12	P_Data_Open_AccessLevel
P12_explanation	P_Data_Open_ZK2Explanation
P13	P_Data_Open_FAIR
P14	P_IndependentVerification
P14_explanation	P_IndependentVerification_NAExplanation
P14_extra	P_IndependentVerification_Identifier
P15	P_ReproducibleScripts
P15_explanation	P_ReproducibleScripts_NAExplanation
P15_extra	P_ReproducibleScripts_Identifier
P16	P_ReproducibleScripts_FAIR
P17	P_OpenMaterials
P17_explanation	P_OpenMaterials_NAExplanation
P17_extra	P_OpenMaterials_Identifier
P18	P_Preregistration
P18_explanation	P_Preregistration_NAExplanation
P18_extra	P_Preregistration_Identifier
P19	P_Preregistration_Content
P20	P_FormalModeling
P20_explanation	P_FormalModeling_NAExplanation
P21	P_PreregisteredReplication
P21_explanation	P_PreregisteredReplication_NAExplanation
P21_extra	P_PreregisteredReplication_Identifier
P22	P_PowerConsiderations
P23	P_OpenScienceBadges
P24	P_SampleSize
P25	P_Merit
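
A table like the one above could drive a migration of previously stored answers. The sketch below is only an illustration: it assumes answers are stored as a flat mapping from id to value, which this thread does not confirm, and `ID_MAP` and `rename_ids` are hypothetical names. Only a few rows of the table are included.

```python
# Subset of the mapping from the table above (old id -> new id).
ID_MAP = {
    "P3": "Title",
    "P4": "Year",
    "P5": "DOI",
    "P10": "P_Data",
    "P11": "P_Data_Open",
    "P11_explanation": "P_Data_Open_NAExplanation",
    "P11_extra": "P_Data_Open_Identifier",
}


def rename_ids(record: dict) -> dict:
    """Return a copy of record with old ids replaced by new ones.

    Keys that are not in ID_MAP are kept unchanged. The flat record shape
    is an assumption made for this example.
    """
    return {ID_MAP.get(key, key): value for key, value in record.items()}


old_record = {"P3": "Some title", "P11": "yes", "P11_extra": "https://example.org/data"}
print(rename_ids(old_record))
```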

nicebread commented 10 months ago

Just a thought (if you haven't implemented them yet): Maybe add "has" and "is" to appropriate indicators? E.g. P10 / P_Data could be `P_has_Data`.

alpkaanaksu commented 10 months ago

This is how I think about it: with '_', you go into a kind of subfield. 'P' is the root for publication indicators. 'P_Data' is the parent node for all other data-related items and is the entry point to the data subfield, 'P_Data_Open' is the parent node for all open data items, and so on.
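
To illustrate what the hierarchical reading of '_' makes possible, here is a small Python sketch. The function names `parent_id` and `descendants` are hypothetical, and the code assumes that every '_' in an id marks a step into a subfield.

```python
def parent_id(indicator_id: str):
    """Return the parent id, or None if the id has no '_' (e.g. 'P' or 'Title')."""
    head, sep, _ = indicator_id.rpartition("_")
    return head if sep else None


def descendants(root: str, ids: list) -> list:
    """Return all ids that lie below root in the hierarchy."""
    prefix = root + "_"
    return [i for i in ids if i.startswith(prefix)]


ids = ["Title", "P_Data", "P_Data_Open", "P_Data_Open_FAIR", "P_Suitable"]
print(parent_id("P_Data_Open"))    # P_Data
print(descendants("P_Data", ids))  # ['P_Data_Open', 'P_Data_Open_FAIR']
```

With ids like these, selecting every open data item is a plain prefix match on 'P_Data_Open'.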

This '_has'/'_is' suffix implies that we have two new main categories for indicators, according to which 'P_is_TypePublication' and 'P_is_Suitable' are somehow related. They are both 'is' attributes, this is a similarity, but I don't know if this similarity is meaningful. I personally don't see any reason to give 'P_TypePublication' and 'P_Suitable' a common parent node.

Do you have some use cases for this in mind?

nicebread commented 10 months ago

OK, I understand your hierarchical logic. My idea was that the semantics are more intuitive ("P_has_open_data" (1/0) is directly understandable). But that probably implies another structure.

alpkaanaksu commented 10 months ago

We can think about creating ids of that kind: 'p_has_open_data' (you can literally read it as 'publication has open data'), 'p_is_preregistered_replication', 'p_open_data_access_level'. We lose the hierarchical structure, but the ids are easier to read.

Hierarchical ids give us more information about the indicators; maybe this is more important than readability? Which one is more important to you?

nicebread commented 10 months ago

I think I'd prefer the readable style. Could you add a third column to the table where we compare them? (Not necessarily all indicators, just for the first 15 or so to get an idea).

alpkaanaksu commented 10 months ago

Since '_' has no special meaning in ids with no hierarchical information, we can just use normal snake_case instead of the weird mix we had.

Old	New (hierarchical)	New (readable)
P3	Title	title
P4	Year	year
P5	DOI	doi
p_top_paper	P_TopPaper	p_is_top_paper
P6	P_TypeMethod	p_method
P6_other	P_TypeMethod_Other	p_method_other
P7	P_TypePublication	p_type
P7_other	P_TypePublication_Other	p_type_other
P8	P_Suitable	p_is_suitable
P8_explanation	P_Suitable_Explanation	p_is_suitable_explanation
P9_info	P_CRediT_Info	p_credit_info
P9a	P_CRediT_Conceptualization	p_credit_conceptualization
...
P10	P_Data	p_has_data
P11	P_Data_Open	p_has_open_data
P11_explanation	P_Data_Open_NAExplanation	p_has_open_data_na_explanation
P11_extra	P_Data_Open_Identifier	p_open_data_identifier
P12	P_Data_Open_AccessLevel	p_open_data_access_level
P12_explanation	P_Data_Open_AccessLevel_ZK2Explanation	p_open_data_access_level_zk2_explanation
P13	P_Data_Open_FAIR	p_is_open_data_fair
P14	P_IndependentVerification	p_has_independent_verification

nicebread commented 10 months ago

After some discussion, we decided to stay with the "hierarchical" style (has some practical advantages, although at the cost of being slightly less intuitive)

alpkaanaksu commented 10 months ago

Replaced all Ids in indicator definitions and scoring. 3febc979dda55ae5f0c76ec160f51dee1fff0e18

We should test everything before closing this issue. This change can break a lot of things.
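
One cheap check in addition to manual testing would be to scan files for leftover old-style ids. The following Python sketch is an assumption-laden illustration: the pattern is derived only from the old ids listed in the tables above (P plus a number, an optional lowercase letter, and the suffixes `_other`, `_explanation`, `_extra`, `_info`, plus `p_top_paper`), and `find_old_ids` is a hypothetical helper. It can produce false positives wherever a string such as "P3" appears for other reasons.

```python
import re

# Matches old-style ids such as P11, P9a, P11_extra, and p_top_paper.
OLD_ID = re.compile(
    r"\b(?:P\d+[a-n]?(?:_(?:other|explanation|extra|info))?|p_top_paper)\b"
)


def find_old_ids(text: str) -> list:
    """Return the sorted, de-duplicated old-style ids found in text."""
    return sorted(set(OLD_ID.findall(text)))


sample = 'if (answers["P11"] === "yes") { show("P11_extra"); } // uses P_Data_Open'
print(find_old_ids(sample))  # ['P11', 'P11_extra']
```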

nicebread commented 10 months ago

My tests found no bug so far ...

alpkaanaksu commented 9 months ago

I think it is safe to assume that there are no bugs related to the new Ids. I think we would have found them by now.