airr-community / common-repo-wg

AIRR Community Common Repository Working Group
Apache License 2.0

Merge AIRR 2024 meeting ratification changes #34

Closed bcorrie closed 4 months ago

bcorrie commented 6 months ago

I don't see anything significant to change in this document; it still stands up fairly well. My main questions would be:

Unique digital object identifiers;
A minimal set of searchable metadata;
Physical availability through a cloud-based Commons provider;
Clear access rules and controls; and
An entry (with metadata) in one or more indices.

This list comes from a fairly obscure NIH document as far as I can tell: https://datascience.nih.gov/sites/default/files/CommonsOverviewFrameWorkandCurrentPilots281015_508.pdf

This isn't really citable (at least that I can find), and in today's context I am not sure this is the list we want. In particular, the statement "Physical availability through a cloud-based Commons provider" seems overly specific. Is there something more recent? Something that is citable? I can't even tell when this was produced...

schristley commented 6 months ago

Maybe the NIH Strategic Plan for Data Science is better.

bcorrie commented 6 months ago

I wonder about this statement: "AIRR sequences for which epitopes are known should be deposited in the Immune Epitope Database (IEDB)."

Should the Common Repository Working Group be so strict in recommending one and only one repository? We don't do this for sequences (e.g. say that they should go in SRA and not ENA), recognizing that there are several under the INSDC umbrella.

This may be what we prefer, but is it realistic for this to be so specific? Playing devil's advocate a bit here, but...

bcorrie commented 6 months ago

Does anyone else find it ironic that NIH doesn't use PIDs so that its published public documents can be cited??? 8-)

lgcowell commented 6 months ago

I think the difference is the umbrella. My understanding is that SRA and ENA synchronize/mirror one another, so once you submit to one, your data are in both. That is not (currently) the case for the specificity databases.

IEDB is recognized by NIH. I don’t think the others are, but I might be wrong. Not sure about in Europe.

schristley commented 6 months ago

I wonder about this statement: "AIRR sequences for which epitopes are known should be deposited in the Immune Epitope Database (IEDB)."

Should the Common Repository Working Group be so strict in recommending one and only one repository? We don't do this for sequences (e.g. say that they should go in SRA and not ENA), recognizing that there are several under the INSDC umbrella.

This may be what we prefer, but is it realistic for this to be so specific? Playing devil's advocate a bit here, but...

If you are doing NIH-funded research, then yes because they are now requiring data management plans, and IEDB is an NIH-supported data repository for that domain. If you aren't doing NIH-funded research, then I suppose it falls to just a recommendation.

bcorrie commented 6 months ago

Maybe the NIH Strategic Plan for Data Science is better.

I don't see anything in that about compliance like the wording we have now. I would be tempted to take out the digital object compliance sentence and list, and just leave in the FAIR principles part. I would argue that this covers the point we are trying to make: data should be FAIR. Without a citable list of compliance criteria for FAIR data, I suggest we remove this list.

bcorrie commented 6 months ago

If you are doing NIH-funded research, then yes because they are now requiring data management plans, and IEDB is an NIH-supported data repository for that domain. If you aren't doing NIH-funded research, then I suppose it falls to just a recommendation.

Yes, and if you were doing research in the EU and they had a similar repository, they would need to publish the data there and not in IEDB. I would suggest something like the following wording:

"AIRR sequences for which epitopes are known should be deposited in recognized international resources for such data, such as the Immune Epitope Database (IEDB)."

bcorrie commented 6 months ago

Assuming we come up with a process for "accreditation" for repository compliance, which will be used to control which repositories are added to the ADC registry, we probably want to remove this:

Next Step: The AIRR Community will need to work with repositories to establish an accreditation system for compliance with AIRR-C standards.

And add a recommendation to the Repository section.

bcorrie commented 6 months ago

I am also going to suggest we change the numbering system for our recommendations, so we have Sections and Subsections.

Section 1: Statement of Principles - AIRR Data Sharing

We can then have:

Recommendation 1.1: Facilitate deposit, access, and use of data

In this way we can add Recommendations to a Section without having to renumber the recommendations that follow outside of that section.

Let me know if there are any objections.

lgcowell commented 6 months ago

Agreed.

bussec commented 6 months ago

My 2 cents:

bcorrie commented 6 months ago

I believe all of the above are done, with the exception of drafting a recommendation around:

Next Step: The AIRR Community will need to work with repositories to establish an accreditation system for compliance with AIRR-C standards.

bcorrie commented 5 months ago

Any other comments on this? If not, my intent will be to merge it after CRWG tomorrow assuming the group approves.

bussec commented 5 months ago

Looks good to me!