HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Rename entity reference_file to mapping_reference_file #388

Closed lauraclarke closed 6 years ago

lauraclarke commented 6 years ago

Original request

  1. **Which schema needs to be changed?** Change to reference_file.json
  2. **Which field in that schema needs to be changed?** Name of the schema
  3. **What should the new field name be?** Change to mapping_reference_file.json
  4. **Why is the change requested?** Clarification: reference_file reads like it might be the PDF of an associated publication, among other things.

Rename entity reference_file to mapping_reference_file

Change type: major

Impact factor: high - change to type entity name used in a number of components

Consequences:

  1. Breaking change for all downstream components, in particular analysis pipelines, which are the primary users of this entity
  2. Not all reference files are mapping reference files (e.g. some are gene annotation files) so alternative types of reference files such as annotation references would require their own file type, causing schema bloat.
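
To illustrate the first consequence, here is a minimal sketch of a downstream consumer that selects entities by their type name. The `entity_type` key, the file names, and the `select_reference_files` helper are hypothetical and are not taken from the actual HCA document layout or pipeline code.

```python
# Hypothetical metadata documents; the "entity_type" key and the file names
# are illustrative assumptions, not the real HCA document layout.
def select_reference_files(documents, entity_name="reference_file"):
    """Return the documents whose entity type matches entity_name."""
    return [doc for doc in documents if doc.get("entity_type") == entity_name]


def rename_entity(documents, old_name, new_name):
    """Return copies of the documents with old_name replaced by new_name."""
    return [
        {**doc, "entity_type": new_name} if doc.get("entity_type") == old_name else dict(doc)
        for doc in documents
    ]


before_rename = [
    {"entity_type": "reference_file", "file_name": "genome.fa"},
    {"entity_type": "sequence_file", "file_name": "reads.fastq"},
]
after_rename = rename_entity(before_rename, "reference_file", "mapping_reference_file")

# A consumer written against the old name finds the file before the rename,
# but silently finds nothing afterwards until its code is updated.
found_before = select_reference_files(before_rename)
found_after = select_reference_files(after_rename)
found_after_updated = select_reference_files(after_rename, "mapping_reference_file")
```

Under these assumptions, every consumer that hard-codes the entity name has to be updated in step with the schema, which is what makes the rename a major change.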

lauraclarke commented 6 years ago

I do not recommend that this change be accepted. As described above, not all reference files are mapping files. Documentation and help text can make it clear what this field is for. There are already many publication-specific fields in the schema, such as publications and supplementary_files in project.

The context of this field along with its description makes its purpose very clear.

JimKent commented 6 years ago

It would be nice if in the spirit of an actual collaboration you accepted one of our changes now and again....

The gene annotations that I'm aware of at least are used for mapping. Could you give me an example of a use of a gene annotation that does not involve mapping?

daniwelter commented 6 years ago

Tagging @jishuxu @samanehsan @kbergin for comment as all reference_* elements were defined primarily by green box.

lauraclarke commented 6 years ago

You are correct that, at the moment, this is set up to contain genome assemblies and gene annotation, and that mapping is a reasonable label for the specific use case it currently serves. I think assemblies are clearly mapping; the gene annotation file is less clear-cut.

By relabelling this file to mapping, you are assuming it will never be needed for reference variants for association experiments, reference images for imaging experiments, reference spectra or molecules for proteomics or metabolomics experiments, or reference cell type identities if the annotation of cell identities ever moves within the DCP.

That feels like a short-sighted view. It will mean that when we do get a different class of reference file, we will have to not only extend this schema but also either change its name again or create another schema for the new type of reference file. This sounds like it will create more work rather than clarify things for everyone.

JimKent commented 6 years ago

It would mean a schema change (of a non-breaking sort) to add in a new field for a new type of reference file. On the other hand this would also give us the opportunity to clarify what that new type of reference is.....

daniwelter commented 6 years ago

Agreed, it is much easier to add a new (optional) field to an existing type entity than to create an entirely new entity.

daniwelter commented 6 years ago

In fact, reference_file already has a reference_type field, which is an enum, so it will only be a patch update to add additional values to this to capture other types of reference file.
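
A minimal sketch of why extending the enum is backward compatible, assuming a heavily simplified reference_file schema fragment. The enum values shown and the helper functions are hypothetical and are not taken from the actual schema.

```python
import copy

# Hypothetical, simplified fragment of a reference_file schema.
# The enum values are illustrative assumptions, not the real HCA list.
REFERENCE_FILE_SCHEMA = {
    "name": "reference_file",
    "properties": {
        "reference_type": {
            "type": "string",
            "enum": ["genome_assembly", "gene_annotation"],
        }
    },
}


def is_valid_reference_type(schema, document):
    """Return True if the document's reference_type is one of the enum values."""
    allowed = schema["properties"]["reference_type"]["enum"]
    return document.get("reference_type") in allowed


def with_extra_reference_types(schema, new_values):
    """Return a copy of the schema whose enum also accepts new_values."""
    extended = copy.deepcopy(schema)
    enum = extended["properties"]["reference_type"]["enum"]
    for value in new_values:
        if value not in enum:
            enum.append(value)
    return extended


existing_doc = {"reference_type": "genome_assembly"}
new_doc = {"reference_type": "reference_variants"}

extended_schema = with_extra_reference_types(REFERENCE_FILE_SCHEMA, ["reference_variants"])

# Documents that were valid before the extension stay valid afterwards,
# and the new kind of reference file becomes valid as well.
old_doc_still_valid = is_valid_reference_type(extended_schema, existing_doc)
new_doc_now_valid = is_valid_reference_type(extended_schema, new_doc)
```

Because the entity name and the existing enum values are untouched, nothing that already validates stops validating, which is why this counts as a patch rather than a major change.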

JimKent commented 6 years ago

Ok, reference_file with a reference_type subfield is good enough for me.
