airr-knowledge / issues

Issues and project management for the AKC

Generate/document list of 'fields' for each repository #3

Closed schristley closed 8 months ago

schristley commented 9 months ago

For each repository, create a spreadsheet of its fields and put it in the Google Drive.

bcorrie commented 9 months ago

I have added the ADC fields into the Google Drive. These are generated automatically from the AIRR Spec (https://github.com/airr-community/airr-standards/blob/receptor-fixes/specs/airr-schema.yaml) using the iReceptor airr-flatten utility: https://github.com/sfu-ireceptor/sandbox/tree/master/airr-spec-flatten

Important columns:

The Ontologies and various curies for the AIRR Standard are defined within the AIRR Spec itself, starting here: https://github.com/airr-community/airr-standards/blob/c88eff5800e4efd560faac70d55fbcbf6bbe1371/specs/airr-schema.yaml#L16C1-L16C1

bcorrie commented 9 months ago

This file does not contain the AIRR Spec objects for Germline, although the code could be easily changed to generate a similar file for those objects.
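To illustrate the kind of change involved, here is a minimal Python sketch of flattening schema objects into a field list. It is not the airr-flatten utility itself. The `SCHEMA` dictionary, the `flatten_schema` and `rows_to_tsv` helpers, and the output column names are all assumptions made for illustration, with a small inline dictionary standing in for the parsed airr-schema.yaml.

```python
import csv
import io

# Tiny stand-in for a parsed schema. The real AIRR schema is far larger;
# the descriptions below are illustrative, not copied from the spec.
SCHEMA = {
    "Study": {
        "properties": {
            "study_id": {"type": "string", "description": "Unique study ID"},
            "study_title": {"type": "string", "description": "Study title"},
        }
    },
    "Subject": {
        "properties": {
            "subject_id": {"type": "string", "description": "Subject ID"},
        }
    },
}

COLUMNS = ["object", "field", "type", "description"]


def flatten_schema(schema, objects):
    """Return one row per (object, field) for the requested objects."""
    rows = []
    for obj in objects:
        for name, spec in schema[obj]["properties"].items():
            rows.append({
                "object": obj,
                "field": name,
                "type": spec.get("type", ""),
                "description": spec.get("description", ""),
            })
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Changing the object list is all it takes to cover other objects.
rows = flatten_schema(SCHEMA, ["Study", "Subject"])
tsv = rows_to_tsv(rows)
```

In this sketch, producing a file for a different set of objects only means passing a different object list to `flatten_schema`, provided those objects exist in the parsed schema.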

williamdlees commented 9 months ago

I've created VDJbase_schema and OGRDB_schema folders in the Drive. In each of these you will find:

Some notes:

bpeters42 commented 9 months ago

We identified all the information captured in the IEDB beyond receptors. Most of this is captured for both B cells and T cells, but some is T cell only:

https://docs.google.com/spreadsheets/d/1kMmANqAhg2ujURdRnSZBV-R0KO5Rxv7h

I was not sure whether this is the right spot to upload it.

I suspect that much of the spreadsheet is self-explanatory, and another chunk is not, which will be easier to explain on a call.

schristley commented 9 months ago

I've started a spreadsheet with initial field alignment, which, now that I've gone through it, looks remarkably like Randi's spreadsheet.

schristley commented 9 months ago

@krishnaroskin @KevinABurns137 If you could, please provide a spreadsheet of fields currently in IRAD.

williamdlees commented 9 months ago

Hi Scott,

I noticed that few of the AIRR schema fields listed in the sheet are mapped across to VDJbase. We do support them.

I’ve uploaded vdjbase_airr_schema_defs.xlsx to the VDJbase folder. I created this when implementing support for them. It lists each field, and what table and fieldname it is mapped to in VDJbase. Column E (‘existing VDJbase attribute’) lists the state before implementation. The implementation is complete now.

Hope this helps.

William

On Wednesday, October 4, 2023 at 9:31 PM, Scott Christley wrote:

I've started a spreadsheet with initial field alignment: https://docs.google.com/spreadsheets/d/19k-CDGVS0BHmsAmSYE4ZR1T3eEj8NLWtWbWr6D8cbPo/edit?usp=sharing

bpeters42 commented 9 months ago

I assume that Scott meant this as a starting point, and that we will fill in the missing connections. There are plenty (maybe all) of fields in ADC that have a mapping in the IEDB.

I am hoping we can use our next call to go through the entirety of data, and at least agree on a concept level what should be part of the common data format. I am optimistic that we might be able to get down to the field level as well...

Bjoern Peters, La Jolla Institute for Immunology

schristley commented 9 months ago

William Lees wrote:

Hi Scott, I noticed that few of the AIRR schema fields listed in the sheet are mapped across to VDJbase. We do support them. I’ve uploaded vdjbase_airr_schema_defs.xlsx to the VDJbase folder. I created this when implementing support for them. It lists each field, and what table and fieldname it is mapped to in VDJbase. Column E (‘existing VDJbase attribute’) lists the state before implementation. The implementation is complete now. Hope this helps. William

Hi William, I noticed that VDJbase seems to have almost exactly the AIRR Repertoire, so instead of listing out all of the fields, I just wrote a shorthand notation. Lines 6, 7, and 8 essentially mean that all the study, subject, and sample fields are shared.

schristley commented 9 months ago

Bjoern Peters wrote:

I assume that Scott meant this as a starting point, and that we will fill in the missing connections. There are plenty (maybe all) of fields in ADC that have a mapping in the IEDB.

Actually, I would be surprised if many ADC fields had a mapping to IEDB. I expect the opposite: the large majority of fields are unique to each repository. For ADC, much of Repertoire is the AIRR-seq protocol and much of Rearrangements is the sequence annotations, neither of which I'd expect to map to a field in IEDB.

The spreadsheet is missing the mapping for Rearrangements, but I think that's only a few fields too.

I am hoping we can use our next call to go through the entirety of data, and at least agree on a concept level what should be part of the common data format. I am optimistic that we might be able to get down to the field level as well... - Bjoern

Yes, I agree. My sense is that it will come down to a few common concepts, such as study/publication, subject/host, sample, sample processing/assay, data processing, chain, and receptor, while each repository will also have a set of unique ones. The Object sheet starts listing some, and I looked for ontology terms that will ground them. I've found many in OBI, which is good. It's a bit early in our process to connect them into a knowledge graph, but that's how we'll eventually want to formalize these concepts/objects for the AK.
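As a rough illustration of splitting concepts into common and repository-specific ones, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ALIGNMENT` table, the concept labels, and the field names are hypothetical placeholders and do not reflect the actual contents of the spreadsheets.

```python
# Hypothetical alignment table: repository -> {concept: [field names]}.
# All concept labels and field names are placeholders for illustration,
# not the actual contents of any repository's spreadsheet.
ALIGNMENT = {
    "ADC": {
        "study": ["study_id"],
        "subject": ["subject_id"],
        "sample": ["sample_id"],
        "sequence annotation": ["v_call"],
    },
    "IEDB": {
        "study": ["reference_id"],
        "subject": ["host_id"],
        "epitope": ["epitope_id"],
    },
    "VDJbase": {
        "study": ["study_id"],
        "subject": ["subject_id"],
        "sample": ["sample_id"],
        "genotype": ["genotype_id"],
    },
}


def shared_concepts(alignment):
    """Concepts that every repository maps at least one field to."""
    concept_sets = [set(concepts) for concepts in alignment.values()]
    return set.intersection(*concept_sets)


def unique_concepts(alignment):
    """For each repository, the concepts that no other repository has."""
    result = {}
    for repo, concepts in alignment.items():
        others = set()
        for other_repo, other_concepts in alignment.items():
            if other_repo != repo:
                others.update(other_concepts)
        result[repo] = set(concepts) - others
    return result


shared = shared_concepts(ALIGNMENT)
unique = unique_concepts(ALIGNMENT)
```

With this placeholder data, a concept held by only some repositories (here, "sample") lands in neither set, which is the kind of case that would need discussion on a call.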

krishnaroskin commented 9 months ago

Scott Christley wrote:

@krishnaroskin @KevinABurns137 If you could, please provide a spreadsheet of fields currently in IRAD.

I've uploaded the list of IRAD fields to the Google Drive:

https://docs.google.com/spreadsheets/d/1tDIrgzoHpJWVLXw21VAu_E928bk7E4Xx/edit?usp=drive_link&ouid=104383729183383279152&rtpof=true&sd=true

This is still a work in progress, as we're still systematizing our curation processes. All feedback is welcome.

schristley commented 8 months ago

complete