Closed schristley closed 8 months ago
I have added the ADC fields into the Google Drive. These are generated automatically from the AIRR Spec (https://github.com/airr-community/airr-standards/blob/receptor-fixes/specs/airr-schema.yaml) using the iReceptor airr-flatten utility: https://github.com/sfu-ireceptor/sandbox/tree/master/airr-spec-flatten
Important columns:
The Ontologies and various curies for the AIRR Standard are defined within the AIRR Spec itself, starting here: https://github.com/airr-community/airr-standards/blob/c88eff5800e4efd560faac70d55fbcbf6bbe1371/specs/airr-schema.yaml#L16C1-L16C1
This file does not contain the AIRR Spec objects for Germline, although the code could be easily changed to generate a similar file for those objects.
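For anyone who has not looked at the flattening step, here is a minimal sketch of the idea, not the airr-flatten code itself. It assumes the spec has already been parsed into a dictionary shaped as object -> `properties` -> field; the `SCHEMA` sample, the function names, and the output columns are all illustrative assumptions rather than what the utility actually emits.

```python
import csv
import io

# Toy stand-in for a parsed schema file (assumed shape, illustrative content only).
SCHEMA = {
    "Study": {
        "type": "object",
        "properties": {
            "study_id": {"type": "string", "description": "Study identifier"},
            "study_title": {"type": "string", "description": "Study title"},
        },
    },
    "Subject": {
        "type": "object",
        "properties": {
            "subject_id": {"type": "string", "description": "Subject identifier"},
        },
    },
}

COLUMNS = ["object", "field", "type", "description"]


def flatten_schema(schema):
    """Return one row per (object, field) pair found under each object's properties."""
    rows = []
    for obj_name, obj_def in schema.items():
        for field_name, field_def in obj_def.get("properties", {}).items():
            rows.append({
                "object": obj_name,
                "field": field_name,
                "type": field_def.get("type", ""),
                "description": field_def.get("description", ""),
            })
    return rows


def to_tsv(rows):
    """Serialize the flattened rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten_schema(SCHEMA)
print(to_tsv(rows))
```

Extending this to other objects, such as the Germline ones, would mostly be a matter of passing a different set of top-level objects to `flatten_schema`.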
I've created VDJbase_schema and OGRDB_schema folders in the Drive. In each of these you will find:
Some notes:
We identified all the information captured in the IEDB beyond receptors. Most of this is captured for both B cells and T cells, but some is T cell only:
https://docs.google.com/spreadsheets/d/1kMmANqAhg2ujURdRnSZBV-R0KO5Rxv7h
I was not sure whether this is the right spot to upload it.
I suspect that much of the spreadsheet is self-explanatory and another chunk is not, which will be easier to explain on a call.
I've started a spreadsheet with initial field alignment which, now that I've gone through it, looks remarkably like Randi's spreadsheet.
@krishnaroskin @KevinABurns137 If you could, please provide a spreadsheet of fields currently in IRAD.
Hi Scott,
I noticed that few of the AIRR schema fields listed in the sheet are mapped across to VDJbase. We do support them.
I’ve uploaded vdjbase_airr_schema_defs.xlsx to the VDJbase folder. I created this when implementing support for them. It lists each field, and what table and fieldname it is mapped to in VDJbase. Column E (‘existing VDJbase attribute’) lists the state before implementation. The implementation is complete now.
Hope this helps.
William
From: Scott Christley @.> Sent: Wednesday, October 4, 2023 9:31 PM To: airr-knowledge/issues @.> Cc: William Lees @.>; Assign @.> Subject: Re: [airr-knowledge/issues] Generate/document list of 'fields' for each repository (Issue #3)
I've started a spreadsheet with initial field alignment https://docs.google.com/spreadsheets/d/19k-CDGVS0BHmsAmSYE4ZR1T3eEj8NLWtWbWr6D8cbPo/edit?usp=sharing .
I assume that Scott meant this as a starting point, and that we will fill in the missing connections. There are plenty (maybe all) of fields in ADC that have a mapping in the IEDB.
I am hoping we can use our next call to go through the entirety of data, and at least agree on a concept level what should be part of the common data format. I am optimistic that we might be able to get down to the field level as well...
-- Bjoern Peters Professor La Jolla Institute for Immunology 9420 Athena Circle La Jolla, CA 92037, USA Tel: 858/752-6914 Fax: 858/752-6987 http://www.liai.org/pages/faculty-peters
Hi Scott, I noticed that few of the AIRR schema fields listed in the sheet are mapped across to VDJbase. We do support them. I’ve uploaded vdjbase_airr_schema_defs.xlsx to the VDJbase folder. I created this when implementing support for them. It lists each field, and what table and fieldname it is mapped to in VDJbase. Column E (‘existing VDJbase attribute’) lists the state before implementation. The implementation is complete now. Hope this helps. William
Hi William, I noticed that VDJbase seems to have almost exactly the AIRR Repertoire, so instead of listing out all of the fields, I just wrote a short-hand notation. Lines 6, 7, 8, essentially means that all the study, subject and sample fields are shared.
I assume that Scott meant this as a starting point, and that we will fill in the missing connections. There are plenty (maybe all) of fields in ADC that have a mapping in the IEDB.
Actually, I would be surprised if many ADC fields had a mapping to IEDB. I expect it to be the opposite: the large majority of fields are unique within each repository. For ADC, much of Repertoire is the AIRR-seq protocol and much of Rearrangements is the sequence annotations, neither of which I'd expect to have a mapping to a field in IEDB.
The spreadsheet is missing the mapping for Rearrangements, but I think that's only a few fields too.
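One way to settle the "mostly shared" versus "mostly unique" question once the alignment sheet is filled in is to count it directly. The sketch below assumes the sheet can be exported as one row per concept with one column per repository; the rows, the `placeholder_*` field names, and the helper functions are made up for illustration and do not reflect the actual spreadsheet.

```python
# Hypothetical alignment rows: one concept per row, one column per repository.
# None means the repository has no field mapped to that concept.
ALIGNMENT = [
    {"concept": "study", "ADC": "study_id", "IEDB": "placeholder_reference", "VDJbase": "study_id"},
    {"concept": "subject", "ADC": "subject_id", "IEDB": "placeholder_host", "VDJbase": "subject_id"},
    {"concept": "sequencing protocol", "ADC": "placeholder_protocol", "IEDB": None, "VDJbase": None},
    {"concept": "epitope", "ADC": None, "IEDB": "placeholder_epitope", "VDJbase": None},
]
REPOS = ("ADC", "IEDB", "VDJbase")


def shared_concepts(alignment, repos):
    """Concepts that have a mapped field in every repository."""
    return [row["concept"] for row in alignment
            if all(row.get(repo) for repo in repos)]


def unique_concepts(alignment, repo, repos):
    """Concepts mapped in `repo` and in no other repository."""
    others = [r for r in repos if r != repo]
    return [row["concept"] for row in alignment
            if row.get(repo) and not any(row.get(o) for o in others)]


print("shared:", shared_concepts(ALIGNMENT, REPOS))
for repo in REPOS:
    print(repo, "only:", unique_concepts(ALIGNMENT, repo, REPOS))
```

The same two functions would work at the field level if each row described a field rather than a concept.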
I am hoping we can use our next call to go through the entirety of data, and at least agree on a concept level what should be part of the common data format. I am optimistic that we might be able to get down to the field level as well... - Bjoern
Yes, I agree. My sense is that it will come down to a few common concepts, such as study/publication, subject/host, sample, sample processing/assay, data processing, chain and receptor, while each repository will also have a set of unique ones. The Object sheet starts listing some, and I looked for ontology terms that will ground them. I've found many in OBI, which is good. It's a bit early in our process to connect them into a knowledge graph, but that's how we'll eventually want to formalize these concepts/objects for the AK.
@krishnaroskin @KevinABurns137 If you could, please provide a spreadsheet of fields currently in IRAD.
I've uploaded the list of IRAD fields to the Google Drive:
This is still a work in progress, as we're still systematizing our curation processes. All feedback is welcome.
complete
Create a spreadsheet of the fields and put in the Google Drive for each repository