data2health / edu-harmonization

Educational resource and competency harmonization project

Curate initial round of CTSA "gems" as examples to assess metadata/ontology #2

Closed: mellybelly closed this issue 5 years ago

mellybelly commented 5 years ago

Curation of CTSA "gems" from each CTSA will help improve engagement, support requirements analysis for the ontologies, search, and attribution, assist in developing more refined evaluation metrics, and ultimately make some of the most valuable resources more discoverable.

Steps:

A. Community outreach to assist in landscape analysis
B. Request initial exemplar “jewels” and socialize the idea of CTSA jewels
C. Assess educational resources with respect to metadata, discovery, and ease of re-use

nicolevasilevsky commented 5 years ago

Curation would be done in collaboration with the N-Lighten team.

wondermixtape commented 5 years ago

After a call with CLIC/NCATS, we will prioritize the current CLIC EDU warehouse materials for curation.

williamhersh commented 5 years ago

Do you have a link to their warehouse?

Bill

(In reply to Shannon McWeeney, Thursday, May 2, 2019 at 6:47 AM)

wondermixtape commented 5 years ago

@williamhersh see https://clic-ctsa.org/education

williamhersh commented 5 years ago

Can we get a dump of the metadata?

(In reply to Shannon McWeeney, Thursday, May 2, 2019 at 8:32 AM)

eichmann commented 5 years ago

```sql
CREATE TABLE clic.resource (
    id integer NOT NULL,
    url text,
    title text,
    description text,
    objective text,
    institution text,
    method text,
    frequency text,
    fee text
);

CREATE TABLE clic.competency (
    id integer NOT NULL,
    seqnum integer NOT NULL,
    competency text
);

CREATE TABLE clic.target (
    id integer NOT NULL,
    seqnum integer NOT NULL,
    target text
);

CREATE TABLE clic.tag (
    id integer NOT NULL,
    seqnum integer NOT NULL,
    tag text
);
```

eichmann commented 5 years ago

```
loki=# select competency,count(*) from competency group by 1 order by 2 desc,1;
┌───────────────────────────────────────────┬───────┐
│                competency                 │ count │
├───────────────────────────────────────────┼───────┤
│ Study Design                              │    17 │
│ Regulations & Compliance                  │    15 │
│ Ethics & Safety                           │    14 │
│ Implementation                            │    14 │
│ Statistics and Informatics                │    11 │
│ Team Science                              │     9 │
│ Communication                             │     8 │
│ Leadership                                │     8 │
│ Community Engagement & Cultural Diversity │     7 │
│ Other                                     │     7 │
│ Technology & Innovation                   │     7 │
│ Grant Writing                             │     6 │
└───────────────────────────────────────────┴───────┘
(12 rows)

loki=# select tag,count(*) from tag group by 1 order by 2 desc,1;
┌─────────────────────────────────┬───────┐
│               tag               │ count │
├─────────────────────────────────┼───────┤
│ mentoring                       │     4 │
│ mentor                          │     3 │
│ Education                       │     2 │
│ Human Subjects Protection       │     2 │
│ Workforce Development           │     2 │
│ clinical research professionals │     2 │
│ Bibliometrics                   │     1 │
│ Biostatistics                   │     1 │
│ CME                             │     1 │
│ CNE                             │     1 │
│ DIAMOND                         │     1 │
│ Genetic Testing                 │     1 │
│ Good Clinical Practice (GCP)    │     1 │
│ Informetrics                    │     1 │
│ Leadership                      │     1 │
│ MOOC                            │     1 │
│ Monitor                         │     1 │
│ Monitoring                      │     1 │
│ Project Management              │     1 │
│ Quality Control                 │     1 │
│ Safety                          │     1 │
│ Scientometrics                  │     1 │
│ clinical trial design           │     1 │
│ data analysis                   │     1 │
│ e-Portfolio                     │     1 │
│ mentee                          │     1 │
│ training                        │     1 │
│ translational research          │     1 │
│ trial reporting                 │     1 │
└─────────────────────────────────┴───────┘
(29 rows)

loki=# select target,count(*) from target group by 1 order by 2 desc,1;
┌─────────────────────────────────┬───────┐
│             target              │ count │
├─────────────────────────────────┼───────┤
│ Clinical Research Professionals │    24 │
│ Graduate Students               │    21 │
│ Postdoctoral Scholars           │    21 │
│ Principal Investigators         │    20 │
│ Researchers                     │    19 │
│ Health Care Professionals       │    13 │
│ Undergraduate Students          │    11 │
│ General Public                  │    10 │
│ Community Partners              │     8 │
│ Other                           │     1 │
└─────────────────────────────────┴───────┘
(10 rows)

loki=#
```
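Here is a minimal sketch of the facet counts above. It uses SQLite (Python's built-in `sqlite3`) as a stand-in for the Postgres database queried in the psql session. The table layout follows the `clic.*` DDL, trimmed to the columns needed here. The rows, titles, and example.org URLs are made-up toy values, not the harvested CLIC data.

```python
import sqlite3

FACETS = ("competency", "target", "tag")  # each child table is named after its value column


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the CLIC tables, seeded with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS clic")  # lets us keep the clic. prefix
    conn.execute("CREATE TABLE clic.resource (id INTEGER NOT NULL, url TEXT, title TEXT)")
    for facet in FACETS:
        conn.execute(f"CREATE TABLE clic.{facet} "
                     f"(id INTEGER NOT NULL, seqnum INTEGER NOT NULL, {facet} TEXT)")
    conn.executemany("INSERT INTO clic.resource VALUES (?, ?, ?)",
                     [(1, "https://example.org/a", "Course A"),
                      (2, "https://example.org/b", "Course B")])
    # Multi-valued fields live in child rows keyed by (id, seqnum).
    conn.executemany("INSERT INTO clic.competency VALUES (?, ?, ?)",
                     [(1, 1, "Study Design"), (1, 2, "Team Science"), (2, 1, "Study Design")])
    conn.executemany("INSERT INTO clic.target VALUES (?, ?, ?)",
                     [(1, 1, "Graduate Students"), (2, 1, "Researchers")])
    conn.commit()
    return conn


def facet_counts(conn: sqlite3.Connection, facet: str) -> list:
    """Value/count pairs ordered like the psql queries: count descending, then value."""
    if facet not in FACETS:  # identifiers cannot be bound as ? parameters
        raise ValueError(f"unknown facet: {facet}")
    sql = f"SELECT {facet}, count(*) FROM clic.{facet} GROUP BY 1 ORDER BY 2 DESC, 1"
    return conn.execute(sql).fetchall()
```

SQLite's default BINARY collation sorts uppercase before lowercase, so ties come out in the same order as in the tag listing above (e.g. `Workforce Development` before `clinical research professionals`). Other database collations may order ties differently.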

eichmann commented 5 years ago

Harvester built and harvesting completed. The three value sets above are typically what I've been using to create filter facets in the discovery engine.
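One way the three value sets could drive filter facets is sketched below in plain Python. It works over a hypothetical in-memory index that maps each resource id to its facet values, i.e. the child rows grouped by `id`. The combination rule (OR within a facet, AND across facets) is a common faceted-search convention assumed here. It is not necessarily what the discovery engine does.

```python
from typing import Dict, List, Set

# Hypothetical index: resource id -> facet name -> set of values.
Index = Dict[int, Dict[str, Set[str]]]


def filter_by_facets(index: Index, selected: Dict[str, Set[str]]) -> List[int]:
    """Ids of resources matching the selection.

    Assumed rule: values within one facet are OR-ed; different facets are AND-ed.
    Facets with an empty selection impose no constraint.
    """
    hits = []
    for rid, facets in index.items():
        # A non-empty intersection means at least one selected value matches.
        if all(facets.get(name, set()) & wanted
               for name, wanted in selected.items() if wanted):
            hits.append(rid)
    return sorted(hits)


def remaining_counts(index: Index, ids: List[int], facet: str) -> Dict[str, int]:
    """Per-value counts for one facet over the current hits (to refresh the sidebar)."""
    counts: Dict[str, int] = {}
    for rid in ids:
        for value in index[rid].get(facet, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts
```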

williamhersh commented 5 years ago

Any chance to get a file (spreadsheet?) that has the metadata for each record?

Bill

(In reply to Dave Eichmann, Friday, May 3, 2019 at 12:42 PM)

eichmann commented 5 years ago

Here's the main table. clic.xlsx
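If the goal is a spreadsheet with one row per record, the child tables have to be folded back into the resource rows. The sketch below assumes the rows have already been fetched into Python. It writes CSV with the standard `csv` module, which any spreadsheet can open, rather than .xlsx. The `'; '` separator is an arbitrary choice.

```python
import csv
import io
from collections import defaultdict


def flatten(resources, children):
    """One row per resource; each multi-valued facet joined with '; ' in seqnum order.

    resources: list of dicts with at least an 'id' key (columns of clic.resource)
    children:  {facet_name: [(id, seqnum, value), ...]} mirroring the child tables
    """
    grouped = {name: defaultdict(list) for name in children}
    for name, rows in children.items():
        for rid, seqnum, value in sorted(rows):  # sort by id, then seqnum
            grouped[name][rid].append(value)
    out = []
    for res in resources:
        row = dict(res)
        for name in children:
            row[name] = "; ".join(grouped[name].get(res["id"], []))
        out.append(row)
    return out


def to_csv(rows) -> str:
    """Serialize flattened rows; the header comes from the first row's keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```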

williamhersh commented 5 years ago

Perfect, thanks!

Next question: can we, or should we, harvest ERuDIte?

Bill

(In reply to Dave Eichmann, Friday, May 3, 2019 at 12:52 PM)

eichmann commented 5 years ago

I have 6138 HTML pages from ERuDIte cached; I just haven't worked on extracting content from the raw HTML yet.

eichmann commented 5 years ago

Sorry, make that 6136 pages...
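A possible first pass at extracting content from the cached ERuDIte HTML uses only the standard library's `html.parser`. It assumes the useful metadata lives in `<title>` and `<meta name="description">`. That is a guess about the page structure, and richer fields would need page-specific rules.

```python
from html.parser import HTMLParser


class PageMetadata(HTMLParser):
    """Collects <title> text and <meta name="description"> content from one page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; are decoded
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:  # title text may arrive in several chunks
            self.title += data


def extract(html: str) -> dict:
    parser = PageMetadata()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "description": parser.description}
```

To process the cache, you would loop over the saved files (e.g. with `pathlib.Path.glob("*.html")`) and call `extract` on each file's text. The cache location and file naming are not given in this thread.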

eichmann commented 5 years ago

Oh, yeah. Now I remember: there's a linked-data release on Zenodo at https://zenodo.org/record/2553478#.XMyfXi-ZN24

It is referenced from https://bioint.github.io/erudite-training-resource-standard/

eichmann commented 5 years ago

Sample creative work record:

```turtle
bdu-resource:10003417681137327691 a schema:CreativeWork ;
    schema:author bdu-person:Xilin_Chen ;
    schema:genre dseo:artificial_intelligence, dseo:image_data, dseo:video ;
    schema:name "Hierarchical Hybrid Statistic based Video Binary Code and Its Application to Face Retrieval in TV-Series" ;
    schema:provider bdu-organization:Chinese_Academy_of_Sciences, bdu-organization:VideoLectures%2Enet ;
    schema:url "http://videolectures.net/fgconference2015_chen_face_retrieval/" .
```

williamhersh commented 5 years ago

Thanks, Dave. Have you harvested the pages linked from these sites?

OHSU BD2K modules: https://github.com/OHSUBD2K

ONC Health IT curriculum: https://www.healthit.gov/topic/health-it-resources/health-it-curriculum-resources-educators

Once we have all of these, what are our next steps for “harmonizing” the metadata? Should we set up a call?

Bill

eichmann commented 5 years ago

Bill,

Yes on the BD2K modules: they were harvested as part of our initial exploration of relevant GitHub repositories. Content is limited to the repo metadata (including the README), but hits do show up in our search interface on various topics.

No on the ONC material (as yet). The number of artifacts here isn’t huge, but it appears that we would have to hand-craft some form of metadata markup, as it's all Word docs, PowerPoints, and videos.

Both of these sources raise the interesting question of the utility of mining actual artifacts (e.g., slide decks) for indexing text. The full set of sources (including DIAMOND, N-Lighten, and ERuDIte) has varying forms of metadata (target audience, competency, etc.) that may or may not match another site’s usage and nomenclature. That’s where much of the harmonization needs to be done.
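At its simplest, the harmonization described here is a crosswalk from each source's local labels to a shared vocabulary. The sketch below assumes a hand-maintained mapping table. The shared terms are placeholders, not an agreed ontology. The example labels come from the CLIC tag list and the ERuDIte sample record above.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk: (source, normalized local label) -> shared term.
# The shared terms are placeholders, not an agreed vocabulary.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("clic", "mentoring"): "Mentoring",
    ("clic", "mentor"): "Mentoring",
    ("clic", "clinical research professionals"): "Clinical Research Professionals",
    ("erudite", "dseo:artificial_intelligence"): "Artificial Intelligence",
}


def normalize(label: str) -> str:
    """Case- and whitespace-insensitive key, so 'Mentor ' and 'mentor' collide."""
    return " ".join(label.split()).lower()


def harmonize(source: str, labels: List[str]) -> Tuple[List[str], List[str]]:
    """Map local labels to shared terms; return (mapped, unmapped) for curator review."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for label in labels:
        term = CROSSWALK.get((source, normalize(label)))
        if term is None:
            unmapped.append(label)
        elif term not in mapped:  # collapse synonyms such as mentor/mentoring
            mapped.append(term)
    return mapped, unmapped
```

Returning the unmapped labels separately keeps curators in the loop. Anything without a crosswalk entry (e.g. `CME` in the test below) surfaces for review instead of being silently dropped.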

(In reply to William Hersh, May 6, 2019 at 7:40 AM)

williamhersh commented 5 years ago

Dave,

The links on the ONC page are to .zip files that have the entire components or the individual units within each one. There is also a link to the teaching guide for each component. Could you not include those without having to drill into the PPTs, Word docs, etc.?

Bill

(In reply to Dave Eichmann, Monday, May 6, 2019 at 7:54 AM)

eichmann commented 5 years ago

@williamhersh, to clarify a bit: except for the main page, everything's either an overview Word doc (middle column on that page) or a pointer to a Zip file (right column on that page). Having the harvester snag the overview doc, or drill into the zips for detail docs, is easy. But we only have one link for a person to land on when clicking on a hit, unless we point them straight at one of the docs, and that would give them no context from which to proceed. Hence I'm assuming that we would want to craft a metadata record for each resource, plus a corresponding transitional landing page in our space that gives the user context and path(s) forward to the actual artifacts.
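Below is a sketch of the "metadata record plus transitional landing page" idea. It assumes a hypothetical `ResourceRecord` holding a title, a description, a source URL, and (label, URL) pairs for the overview doc and zip files. The field names and example.org links are illustrative only. Text is escaped with the standard `html.escape`.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Tuple


@dataclass
class ResourceRecord:
    """Hypothetical hand-crafted record for one resource (field names are assumptions)."""
    title: str
    description: str
    source_url: str
    artifacts: List[Tuple[str, str]] = field(default_factory=list)  # (label, url)


def landing_page(rec: ResourceRecord) -> str:
    """Transitional page: context first, then paths forward to the actual artifacts."""
    links = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(label)}</a></li>'
        for label, url in rec.artifacts
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(rec.title)}</title></head>\n<body>\n"
        f"  <h1>{escape(rec.title)}</h1>\n"
        f"  <p>{escape(rec.description)}</p>\n"
        f'  <p>Source: <a href="{escape(rec.source_url)}">{escape(rec.source_url)}</a></p>\n'
        f"  <ul>\n{links}\n  </ul>\n</body></html>\n"
    )
```

The search hit would point at this generated page rather than straight at a Word doc or zip. That keeps the single landing link while still exposing every artifact.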

williamhersh commented 5 years ago

Dave, could we put an unzipped version on GitHub, where the repository itself could provide that context? (The materials are in the public domain.)

(In reply to Dave Eichmann, Monday, May 6, 2019 at 8:53 AM)

nicolevasilevsky commented 5 years ago

@eichmann, @williamhersh, and @wondermixtape: is my action item to add the resources listed in the clic.xlsx file to N-Lighten?

wondermixtape commented 5 years ago

First draft completed: curation of exemplar resources with ontology and metadata elements.