Closed by mellybelly 5 years ago
Curation would be done in collaboration with the N-Lighten team.
After the call with CLIC/NCATS, we will prioritize current CLIC EDU warehouse materials for curation.
Do you have a link to their warehouse?
Bill
In reply to Shannon McWeeney (Thursday, May 2, 2019 at 6:47 AM), Re: [data2health/edu-harmonization] Curate initial round of CTSA "gems" as examples to assess metadata/ontology (#2).
@williamhersh see https://clic-ctsa.org/education
Can we get a dump of the metadata?
In reply to Shannon McWeeney (Thursday, May 2, 2019 at 8:32 AM).
CREATE TABLE clic.resource (
    id integer NOT NULL,
    url text,
    title text,
    description text,
    objective text,
    institution text,
    method text,
    frequency text,
    fee text
);
CREATE TABLE clic.competency (
    id integer NOT NULL,
    seqnum integer NOT NULL,
    competency text
);
CREATE TABLE clic.target (
    id integer NOT NULL,
    seqnum integer NOT NULL,
    target text
);
CREATE TABLE clic.tag (
    id integer NOT NULL,
    seqnum integer NOT NULL,
    tag text
);
```
loki=# select competency,count(*) from competency group by 1 order by 2 desc,1;
┌───────────────────────────────────────────┬───────┐
│                competency                 │ count │
├───────────────────────────────────────────┼───────┤
│ Study Design                              │    17 │
│ Regulations & Compliance                  │    15 │
│ Ethics & Safety                           │    14 │
│ Implementation                            │    14 │
│ Statistics and Informatics                │    11 │
│ Team Science                              │     9 │
│ Communication                             │     8 │
│ Leadership                                │     8 │
│ Community Engagement & Cultural Diversity │     7 │
│ Other                                     │     7 │
│ Technology & Innovation                   │     7 │
│ Grant Writing                             │     6 │
└───────────────────────────────────────────┴───────┘
(12 rows)

loki=# select tag,count(*) from tag group by 1 order by 2 desc,1;
┌─────────────────────────────────┬───────┐
│               tag               │ count │
├─────────────────────────────────┼───────┤
│ mentoring                       │     4 │
│ mentor                          │     3 │
│ Education                       │     2 │
│ Human Subjects Protection       │     2 │
│ Workforce Development           │     2 │
│ clinical research professionals │     2 │
│ Bibliometrics                   │     1 │
│ Biostatistics                   │     1 │
│ CME                             │     1 │
│ CNE                             │     1 │
│ DIAMOND                         │     1 │
│ Genetic Testing                 │     1 │
│ Good Clinical Practice (GCP)    │     1 │
│ Informetrics                    │     1 │
│ Leadership                      │     1 │
│ MOOC                            │     1 │
│ Monitor                         │     1 │
│ Monitoring                      │     1 │
│ Project Management              │     1 │
│ Quality Control                 │     1 │
│ Safety                          │     1 │
│ Scientometrics                  │     1 │
│ clinical trial design           │     1 │
│ data analysis                   │     1 │
│ e-Portfolio                     │     1 │
│ mentee                          │     1 │
│ training                        │     1 │
│ translational research          │     1 │
│ trial reporting                 │     1 │
└─────────────────────────────────┴───────┘
(29 rows)

loki=# select target,count(*) from target group by 1 order by 2 desc,1;
┌─────────────────────────────────┬───────┐
│             target              │ count │
├─────────────────────────────────┼───────┤
│ Clinical Research Professionals │    24 │
│ Graduate Students               │    21 │
│ Postdoctoral Scholars           │    21 │
│ Principal Investigators         │    20 │
│ Researchers                     │    19 │
│ Health Care Professionals       │    13 │
│ Undergraduate Students          │    11 │
│ General Public                  │    10 │
│ Community Partners              │     8 │
│ Other                           │     1 │
└─────────────────────────────────┴───────┘
(10 rows)

loki=#
```
Harvester built and harvesting completed. The three value sets above are typically what I've been using to create filter facets in the discovery engine.
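As a rough illustration of how value sets like these can drive filter facets, here is a minimal Python sketch. It loads a few made-up rows into an in-memory SQLite database shaped like the `competency` and `target` tables above (the `clic.` schema prefix is dropped because SQLite has no schemas), then computes facet counts and filters resources by selected values. The sample rows, the `FACETS` whitelist, and the function names `facet_counts` and `filter_resources` are all assumptions for illustration; this is not the discovery engine's actual code.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with made-up rows (not real CLIC data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource (id integer NOT NULL, url text, title text);
        CREATE TABLE competency (id integer NOT NULL, seqnum integer NOT NULL, competency text);
        CREATE TABLE target (id integer NOT NULL, seqnum integer NOT NULL, target text);
    """)
    conn.executemany("INSERT INTO resource VALUES (?, ?, ?)", [
        (1, "https://example.org/a", "Demo course A"),
        (2, "https://example.org/b", "Demo course B"),
        (3, "https://example.org/c", "Demo course C"),
    ])
    conn.executemany("INSERT INTO competency VALUES (?, ?, ?)", [
        (1, 1, "Study Design"), (1, 2, "Team Science"),
        (2, 1, "Study Design"), (3, 1, "Leadership"),
    ])
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", [
        (1, 1, "Graduate Students"), (2, 1, "Researchers"),
        (3, 1, "Graduate Students"),
    ])
    return conn

# Facet name -> (table, column). A fixed whitelist, so identifiers placed
# into SQL text never come from user input.
FACETS = {"competency": ("competency", "competency"), "target": ("target", "target")}

def facet_counts(conn, facet):
    """Return (value, count) pairs, most frequent first, ties sorted by value."""
    table, column = FACETS[facet]
    sql = (f"SELECT {column}, count(*) FROM {table} "
           f"GROUP BY {column} ORDER BY count(*) DESC, {column}")
    return conn.execute(sql).fetchall()

def filter_resources(conn, selections):
    """Return ids of resources that match every selected facet value."""
    sql = "SELECT r.id FROM resource r WHERE 1=1"
    params = []
    for facet, value in selections.items():
        table, column = FACETS[facet]
        sql += (f" AND EXISTS (SELECT 1 FROM {table} f"
                f" WHERE f.id = r.id AND f.{column} = ?)")
        params.append(value)
    return [row[0] for row in conn.execute(sql + " ORDER BY r.id", params)]

conn = build_demo_db()
print(facet_counts(conn, "competency"))
print(filter_resources(conn, {"competency": "Study Design",
                              "target": "Graduate Students"}))
```

Each selected facet adds one `EXISTS` clause, so selections across facets combine with AND. Whether a real interface should instead OR values within a single facet is a design choice this sketch leaves open.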
Any chance to get a file (spreadsheet?) that has the metadata for each record?
Bill
In reply to Dave Eichmann (Friday, May 3, 2019 at 12:42 PM).
Perfect, thanks!
Next question: Can we or should we harvest Erudite?
Bill
In reply to Dave Eichmann (Friday, May 3, 2019 at 12:52 PM):
Here's the main table. clic.xlsx: https://github.com/data2health/edu-harmonization/files/3143372/clic.xlsx
I have 6138 HTML pages for Erudite cached. Just haven't worked on extracting content as yet from the raw HTML.
Sorry, make that 6136 pages...
Oh, yeah. Now I remember - there's a linked data Zenodo release at https://zenodo.org/record/2553478#.XMyfXi-ZN24
referenced off of here: https://bioint.github.io/erudite-training-resource-standard/
Sample creative work record:
bdu-resource:10003417681137327691 a schema:CreativeWork ;
    schema:author bdu-person:Xilin_Chen ;
    schema:genre dseo:artificial_intelligence, dseo:image_data, dseo:video ;
    schema:name "Hierarchical Hybrid Statistic based Video Binary Code and Its Application to Face Retrieval in TV-Series" ;
    schema:provider bdu-organization:Chinese_Academy_of_Sciences, bdu-organization:VideoLectures%2Enet ;
    schema:url "http://videolectures.net/fgconference2015_chen_face_retrieval/" .
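To make the shape of such a record concrete, the following Python sketch maps it onto field names borrowed from the CLIC tables earlier in the thread. The `SAMPLE` dict is a hand transcription of the record above, not the output of an RDF parser. Mapping `schema:provider` to `institution` and `schema:genre` to `tag` is only an assumption about how the two sources might line up, and the helper names are hypothetical.

```python
from urllib.parse import unquote

def local_name(curie):
    """Strip the prefix from a prefixed name such as 'dseo:video'."""
    return curie.split(":", 1)[1]

def humanize(curie):
    """Turn 'dseo:artificial_intelligence' into 'artificial intelligence'."""
    return unquote(local_name(curie)).replace("_", " ")

# Hand-transcribed from the sample creative work record (illustration only).
SAMPLE = {
    "id": "bdu-resource:10003417681137327691",
    "schema:name": ("Hierarchical Hybrid Statistic based Video Binary Code "
                    "and Its Application to Face Retrieval in TV-Series"),
    "schema:url": "http://videolectures.net/fgconference2015_chen_face_retrieval/",
    "schema:author": ["bdu-person:Xilin_Chen"],
    "schema:genre": ["dseo:artificial_intelligence", "dseo:image_data", "dseo:video"],
    "schema:provider": ["bdu-organization:Chinese_Academy_of_Sciences",
                        "bdu-organization:VideoLectures%2Enet"],
}

def to_common_record(rec):
    """Map a schema.org-style record to CLIC-like field names (assumed mapping)."""
    return {
        "title": rec["schema:name"],
        "url": rec["schema:url"],
        "author": [humanize(a) for a in rec.get("schema:author", [])],
        "institution": [humanize(p) for p in rec.get("schema:provider", [])],
        "tag": [humanize(g) for g in rec.get("schema:genre", [])],
    }

record = to_common_record(SAMPLE)
print(record["institution"])
print(record["tag"])
```

A real harvester would more likely read the Zenodo release with an RDF library and resolve the prefixes properly; the dict here only stands in for that step.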
Thanks Dave, have you harvested the pages linked from these sites?
OHSU BD2K modules: https://github.com/OHSUBD2K
ONC Health IT curriculum: https://www.healthit.gov/topic/health-it-resources/health-it-curriculum-resources-educators
Once we have all of these, what are our next steps for “harmonizing” the metadata? Should we set up a call?
Bill
Bill,
Yes on the BD2K modules - they were harvested as part of our initial exploration of relevant GitHub repositories. Content is limited to the repo metadata (including the README), but hits do show up in our search interface on various topics.
No on the ONC material (as yet). The number of artifacts here isn’t huge, but it appears that we would have to hand-craft some form of metadata markup, as it’s all Word docs, PowerPoints and videos.
Both of these sources raise the interesting question of the utility of mining actual artifacts (e.g., slide decks) for indexing text. The full set of sources (including DIAMOND, N-Lighten, ERuDIte) have varying forms of metadata - target audience, competency, etc. - that may or may not match another site’s usage, nomenclature, etc. That’s where much of the harmonization needs to be done.
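One way to picture that harmonization step is a crosswalk from each source's terms to a shared vocabulary, with unmapped terms set aside for a curator. The Python sketch below is an assumption-laden illustration: the `CROSSWALK` entries are hypothetical groupings based on near-duplicate tags in the CLIC counts earlier in the thread (for example `mentor`, `mentoring`, `mentee`), and whether those terms should really be merged is a curation decision, not something this thread settles.

```python
# Hypothetical crosswalk: (field, lower-cased source term) -> shared term.
CROSSWALK = {
    ("tag", "mentor"): "Mentoring",
    ("tag", "mentoring"): "Mentoring",
    ("tag", "mentee"): "Mentoring",
    ("tag", "monitor"): "Monitoring",
    ("tag", "monitoring"): "Monitoring",
    ("target", "clinical research professionals"): "Clinical Research Professionals",
}

def harmonize(field, terms):
    """Return (sorted shared terms, sorted source terms with no crosswalk entry)."""
    mapped, unmapped = set(), set()
    for term in terms:
        cleaned = term.strip()
        key = (field, cleaned.lower())
        if key in CROSSWALK:
            mapped.add(CROSSWALK[key])
        else:
            # Keep the original spelling so a curator can review it.
            unmapped.add(cleaned)
    return sorted(mapped), sorted(unmapped)

shared, todo = harmonize("tag", ["mentor", "Mentoring ", "mentee", "MOOC"])
print(shared)
print(todo)
```

Keying on the field as well as the term lets the same string map differently when it appears as a tag on one site and as a target audience on another.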
In reply to William Hersh (May 6, 2019, at 7:40 AM).
Dave,
The links on the ONC page are to .zip files that have the entire components or the individual units within each one. There is also a link to the teaching guide for each component. Could you not include those without having to drill into the PPTs, Word docs, etc.?
Bill
In reply to Dave Eichmann (Monday, May 6, 2019 at 7:54 AM).
@williamhersh to clarify a bit - except for the main page, everything's either an overview Word doc (middle column on that page) or a pointer to a Zip file (right column on that page). Having the harvester snag the overview doc, or drill into the zips for detail docs, is easy. But we only have one link for a person to land on when clicking on a hit, unless we point them straight at one of the docs, for example. That would give them no context from which to proceed. Hence I'm assuming for this that we would want to craft a metadata record for each resource and a corresponding transitional landing page for a hit in our space giving the user context and path(s) forward to actual artifacts.
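A minimal Python sketch of that idea follows: one metadata record per resource, plus a rendered landing page that gives context and links onward to the actual artifacts. The class names, field names, and `example.org` URLs are all hypothetical placeholders, not the ONC site's structure or the harvester's schema.

```python
from dataclasses import dataclass, field
from html import escape

@dataclass
class Artifact:
    label: str   # e.g. "Overview (Word doc)"
    url: str     # where the artifact can be downloaded

@dataclass
class ResourceRecord:
    title: str
    description: str
    source_page: str                      # page the resource was found on
    artifacts: list = field(default_factory=list)

def render_landing_page(rec):
    """Render a minimal HTML page giving context and paths to the artifacts."""
    items = "\n".join(
        f'    <li><a href="{escape(a.url, quote=True)}">{escape(a.label)}</a></li>'
        for a in rec.artifacts
    )
    source = escape(rec.source_page, quote=True)
    return (
        "<html><body>\n"
        f"  <h1>{escape(rec.title)}</h1>\n"
        f"  <p>{escape(rec.description)}</p>\n"
        f'  <p>Source: <a href="{source}">{source}</a></p>\n'
        f"  <ul>\n{items}\n  </ul>\n"
        "</body></html>"
    )

# Placeholder data; the URLs below are not real ONC links.
rec = ResourceRecord(
    title="Example component",
    description="Short hand-written summary of what the component covers.",
    source_page="https://example.org/curriculum",
    artifacts=[
        Artifact("Overview (Word doc)", "https://example.org/component1.docx"),
        Artifact("All units (Zip file)", "https://example.org/component1.zip"),
    ],
)
print(render_landing_page(rec))
```

The search hit would then point at this landing page rather than at any single document, so the user always arrives with context.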
Dave, could we put an unzipped version on GitHub to provide context there? (The materials are public domain.)
In reply to Dave Eichmann (Monday, May 6, 2019 at 8:53 AM).
@eichmann, @williamhersh and @wondermixtape - is my action item to add the resources listed in the clic.xlsx file to N-Lighten?
First draft of the curation of exemplar resources, with ontology and metadata elements, is completed.
Curation of CTSA "gems" from each CTSA will help improve engagement; support requirements analysis for the ontologies, search, and attribution; assist in the development of more refined evaluation metrics; and ultimately make some of the most valuable resources more discoverable.
Steps:
A. Community outreach to assist in landscape analysis
B. Initial exemplar “jewels” requested / socializing the idea of CTSA jewels
C. Assessment of educational resources with respect to metadata, discovery, and ease of re-use