david4096 opened this issue 6 years ago (status: Open)
The Helium folks have a f2f meeting so we may not be able to make it (the time is 12pm ET).
cc @scox @balhoff @putmantime @deepakunni3
@david4096 To follow up on Chris' comment, is there a Zoom/Webex URL for the meeting on July 12?
@cmungall would an hour earlier work better? There's a doodle here: https://doodle.com/poll/4mtaeps9kp2pnqzh
@deepakunni3 We currently have a Hangouts link: https://meet.google.com/qce-kmjd-ugc
Send me your email if you didn't get an invite! :) davidcs [at] ucsc . edu
Our meeting is all day, but I think we can keep the original time and have those of us involved in KC7 step out for this call.
A list of the existing DATS files that are available for indexing tests?
Kirk, Team Oxygen is centered around serializing dataset metadata, not individual file metadata.
Adrienne, TOPMed: Inconsistency between datasets makes indexing files difficult.
Alejandra: Shared https://github.com/dcppc/crosscut-metadata/tree/master/dats-json-examples and showed a DATS querying example; will add it here.
Made new JSON-LD contexts.
Nemanja: Working with Charlotte, where they are doing global dataset indexing (a level 1 way of searching). SevenBridges is focused on level 2: using phenotype (and maybe genotype) information to find files. Started working with DATS recently.
Uses a triple store.
Philippe Rocca-Serra (9:30 AM, via chat): Also, from our group (Phosphorus), we'd need to know how much description of the file content would need to be added and how much file introspection would be required. Can we have a sense of which key use cases are currently being considered?
Checksum, checksum algorithm, urls, size
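A minimal sketch of how the four fields listed above could be gathered for one file, in Python. The function name `describe_file`, the dictionary keys, and the choice of SHA-256 as the default are assumptions for illustration; the notes do not say which algorithm or field names the teams settled on.

```python
import hashlib
import os


def describe_file(path, urls, algorithm="sha256", chunk_size=1 << 20):
    """Return checksum, checksum algorithm, urls, and size for one file.

    Hypothetical helper: the key names below are illustrative only and
    are not taken from DATS or from any team's schema.
    """
    digest = hashlib.new(algorithm)
    # Read in chunks so large files do not have to fit in memory.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "checksum": digest.hexdigest(),
        "checksum_algorithm": algorithm,
        "urls": list(urls),
        "size": os.path.getsize(path),  # size in bytes
    }
```

Recording the algorithm next to the checksum lets an indexer compare files from sources that hash differently, instead of guessing from the digest length.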
Jared: The data model lives in a database, and any files loaded go into that database. Two releases a year. Uses MongoDB.
Adrienne: Uses a SQL database that models the phenotype structure; files go into an EAV (entity-attribute-value) table since they're not really harmonized. Wanted a relational model to retain links to the original file.
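To illustrate the EAV idea Adrienne describes, here is a rough sketch using Python's built-in `sqlite3`. The actual schema was not shown in the meeting, so the table name `file_attributes`, its columns, and the helper functions are all assumptions. The point is only that unharmonized columns become rows, and each row keeps a link to the file it came from.

```python
import sqlite3


def build_eav_store():
    """Create an in-memory database with one EAV table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE file_attributes (
            source_file TEXT NOT NULL,  -- link back to the original file
            entity      TEXT NOT NULL,  -- e.g. a subject identifier
            attribute   TEXT NOT NULL,  -- original column name, unharmonized
            value       TEXT
        )
        """
    )
    return conn


def load_rows(conn, source_file, rows, id_column):
    """Flatten each row dict into one EAV record per non-identifier column."""
    for row in rows:
        entity = row[id_column]
        for attribute, value in row.items():
            if attribute == id_column:
                continue
            conn.execute(
                "INSERT INTO file_attributes VALUES (?, ?, ?, ?)",
                (source_file, entity, attribute, str(value)),
            )
    conn.commit()


def attributes_for(conn, entity):
    """Return (source_file, attribute, value) tuples recorded for one entity."""
    cursor = conn.execute(
        "SELECT source_file, attribute, value FROM file_attributes "
        "WHERE entity = ? ORDER BY source_file, attribute",
        (entity,),
    )
    return cursor.fetchall()
```

Because every value carries its `source_file`, two files with different column sets can share one table without a harmonized schema, at the cost of storing all values as text.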
Anup: What is a dataset? How do we properly model?
DATS examples: https://github.com/dcppc/crosscut-metadata/tree/master/dats-json
The ETL pipeline is also available in the same repo: https://github.com/dcppc/crosscut-metadata
More DATS examples are at: https://github.com/datatagsuite/examples
In particular, see: https://github.com/datatagsuite/examples/blob/master/BDbag-AGR-example.json
Documentation about DATS can be found here: https://datatagsuite.github.io/docs/html/dats.html
This issue is meant to capture the meeting notes and conversation that will take place at 9am PT (12pm ET) on July 12, 2018.
https://meet.google.com/qce-kmjd-ugc
Here is a draft agenda, please add links/notes to this thread as you see fit!