cancerDHC / operations

Deep dive into the Portable Format for Biomedical Data (PFB) #62

Closed: gaurav closed this issue 4 years ago

gaurav commented 4 years ago

We need to do a deep dive into the Portable Format for Biomedical Data (PFB) and figure out a few things:

Brief description of PFB: PFB is an Avro-based serialization format with a specific schema for importing, exporting and evolving biomedical data. A single PFB file carries both metadata and data; the metadata includes the data dictionary, ontology references and the relations between nodes. PFB is part of Gen3, an open-source software platform for developing and operating data commons, where it is primarily used to move large chunks of data and metadata from one supported system or commons to another. It also seems useful for saving point-in-time 'snapshots'. See also this article.
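To make the "metadata and data in one file" idea concrete, here is a minimal sketch in plain Python. It does not read a real Avro file or use any Gen3 tooling; the record layout, the node names (`case`, `sample`) and the helper functions are simplified assumptions for illustration only, loosely modelled on the description above.

```python
# Hypothetical, simplified stand-in for the records in a PFB file:
# one metadata record (data dictionary, ontology references, links between
# node types) followed by data records that point at each other via relations.
pfb_records = [
    {
        "id": None,
        "name": "Metadata",
        "object": {
            "nodes": [
                {"name": "case", "ontology_reference": "", "links": []},
                {
                    "name": "sample",
                    "ontology_reference": "",
                    "links": [{"name": "cases", "dst": "case"}],
                },
            ]
        },
        "relations": [],
    },
    {"id": "case-1", "name": "case", "object": {"submitter_id": "case-1"}, "relations": []},
    {
        "id": "sample-1",
        "name": "sample",
        "object": {"submitter_id": "sample-1"},
        "relations": [{"dst_id": "case-1", "dst_name": "case"}],
    },
]


def split_records(records):
    """Separate the single metadata object from the data records."""
    metadata = None
    data = []
    for record in records:
        if record["name"] == "Metadata":
            metadata = record["object"]
        else:
            data.append(record)
    return metadata, data


def unresolved_relations(data):
    """Return (source id, target id) pairs whose target is not in the file."""
    known = {(record["name"], record["id"]) for record in data}
    return [
        (record["id"], relation["dst_id"])
        for record in data
        for relation in record["relations"]
        if (relation["dst_name"], relation["dst_id"]) not in known
    ]


metadata, data = split_records(pfb_records)
print([node["name"] for node in metadata["nodes"]])
print(unresolved_relations(data))
```

Because the dictionary travels with the data, a consumer can check that every relation resolves without consulting an external schema, which is the property that makes the format attractive for snapshots and transfers.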

Some resources on PFB:

gaurav commented 4 years ago

I've written up a description of PFB in our shared Google Drive and added it to the CCDH Tools Landscape document -- have a look and let me know if you have any outstanding questions! I'll keep looking for other formats that could serve the same purpose as PFB. If there are no follow-up questions in a week or so, I think it'll be time to close this issue.

TomConlin commented 4 years ago

+1.

I worry a bit about the absolute vs. relative clarity of values like:

    "tobacco_smoking_onset_year": 76,
    "tobacco_smoking_quit_year": 63,

gaurav commented 4 years ago

Tom: those terms are coming from caDSR 2228604 and caDSR 2228610 respectively, as per the metadata included in the Avro file. I think users can pick other terms if they prefer?
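As a rough illustration of how the embedded metadata could help a reader interpret such values, here is a small Python sketch. The field names and the two caDSR identifiers come from this thread; the dictionary layout, the `cde_id` key and the `annotate` helper are hypothetical and are not PFB's actual schema.

```python
# Hypothetical per-property metadata, as it might be extracted from the file.
# Only the field names and caDSR identifiers are taken from the thread.
property_metadata = {
    "tobacco_smoking_onset_year": {"ontology_reference": "caDSR", "cde_id": "2228604"},
    "tobacco_smoking_quit_year": {"ontology_reference": "caDSR", "cde_id": "2228610"},
}

record = {"tobacco_smoking_onset_year": 76, "tobacco_smoking_quit_year": 63}


def annotate(record, property_metadata):
    """Pair each value with the term that defines it, or None if undefined."""
    annotated = {}
    for field, value in record.items():
        meta = property_metadata.get(field)
        defined_by = (
            f'{meta["ontology_reference"]} {meta["cde_id"]}' if meta else None
        )
        annotated[field] = {"value": value, "defined_by": defined_by}
    return annotated


for field, info in annotate(record, property_metadata).items():
    print(field, info["value"], info["defined_by"])
```

This does not resolve the absolute vs. relative question by itself, but it at least tells a consumer which definition to look up for each value.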

On Slack, @TomConlin pointed me to OMOP as another potential data exchange format, but I'm not sure whether it includes a serialization format or is just a common data model (in which case the CCDH Harmonized Model would provide a common data model better suited to our needs). Apart from that, I couldn't find any other generic biomedical data exchange formats, so I think we're ready to close this issue unless anybody else has anything to add?

gaurav commented 4 years ago

At our last internal meeting (June 3, 2020), we decided that this level of detail was good enough for now. We can reopen this issue when a need for PFB arises in our project.