cwrc / ontology

CWRC ontology - primary repository

Job tag scrape #312

Closed SusanBrown closed 6 years ago

SusanBrown commented 6 years ago

Need a scrape of all strings within the JOB tag.

@GuelphOntologyTeam and @joelacummings, if you know of a good vocabulary of occupations, ideally one that would have some historical depth, please advise.

DeborahStacey commented 6 years ago

Will have a look for a suitable vocabulary or ontology.


DeborahStacey commented 6 years ago

What about this?

http://www.ilo.org/public/english/bureau/stat/isco/index.htm

This is a standard for describing occupations (ISCO, the ILO's International Standard Classification of Occupations). It might be a bit big, but maybe we could use a subset?

Deb


DeborahStacey commented 6 years ago

Here is a spreadsheet of all [tag name missing] and [tag name missing] tags.

Column 1 shows all job terms that appear only in [tag name missing].

Column 2 shows all job terms that appear only in [tag name missing].

Column 3 shows all job terms that appear in both tags.

Enjoy!
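
A three-column breakdown like the one described above could be computed roughly as follows. This is only a sketch: the tag names `TAG_A` and `TAG_B`, the sample XML, and the helper names `terms` and `three_way` are hypothetical stand-ins, since the actual tag names and the script behind the spreadsheet are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; TAG_A and TAG_B are placeholder tag names.
SAMPLE = """<DOC>
  <TAG_A>weaver</TAG_A>
  <TAG_A>nurse</TAG_A>
  <TAG_B>nurse</TAG_B>
  <TAG_B>editor</TAG_B>
</DOC>"""

def terms(root, tag):
    """Collect the distinct, whitespace-normalized strings inside every <tag>."""
    found = {" ".join("".join(el.itertext()).split()) for el in root.iter(tag)}
    return found - {""}  # ignore empty elements

def three_way(root, tag_a, tag_b):
    """Return (only in tag_a, only in tag_b, in both) as sorted lists."""
    a, b = terms(root, tag_a), terms(root, tag_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)

only_a, only_b, both = three_way(ET.fromstring(SAMPLE), "TAG_A", "TAG_B")
```

Each returned list corresponds to one spreadsheet column; set difference and intersection do the splitting, so a term repeated many times still appears only once.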



joelacummings commented 6 years ago

Ah, I just saw you did this, Deb. I also did a scrape of the job tags, with REG/value merged and an occurrence count, here: https://docs.google.com/spreadsheets/d/1Z3ObkAurITrobHE1bVMhA-mOZP3EIO6m_rD4aXliVP4/edit?usp=sharing Again, I can change the format if needed.

SusanBrown commented 6 years ago

Thanks, both!

That’s a lot of job names, Joel!

Deb, I can’t see your file, but a breakdown in those three ways would be great. Even better would be two sheets, as follows:

1) REG values, one per row, with every string that occurs within elements carrying that REG value placed in separate columns of the same row. Those strings then become the variant terms we will use for extraction when they occur elsewhere without a REG.

2) Strings of JOB elements that do not have REG values. Remove any string that is already a variant term associated with a REG value on the first sheet, even when the particular element it came from has no REG value.

Having both sheets, with counts on the non-REG strings, would be very helpful.
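
The two sheets requested above might be built along these lines. This is a minimal sketch under stated assumptions: it assumes the source is XML in which each `JOB` element may carry a `REG` attribute, and the sample document and the function names `job_strings` and `build_sheets` are hypothetical, not taken from the scripts actually used in this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample; the real documents and their schema are not shown here.
SAMPLE = """<DOC>
  <JOB REG="teacher">schoolmistress</JOB>
  <JOB REG="teacher">governess</JOB>
  <JOB>governess</JOB>
  <JOB>milliner</JOB>
  <JOB>milliner</JOB>
  <JOB REG="writer">novelist</JOB>
</DOC>"""

def job_strings(root):
    """Yield (reg, text) for each JOB element; reg is None when absent."""
    for el in root.iter("JOB"):
        text = " ".join("".join(el.itertext()).split())
        if text:
            yield el.get("REG"), text

def build_sheets(root):
    variants = defaultdict(set)  # sheet 1: REG value -> variant strings
    unregged = Counter()         # strings from JOB elements with no REG
    for reg, text in job_strings(root):
        if reg:
            variants[reg].add(text)
        else:
            unregged[text] += 1
    # Every string already tied to some REG value on sheet 1.
    known = set().union(*variants.values())
    # Sheet 1: one row per REG value, variants in the following columns.
    sheet1 = [[reg, *sorted(v)] for reg, v in sorted(variants.items())]
    # Sheet 2: non-REG strings with counts, minus known variants.
    sheet2 = [(s, n) for s, n in unregged.most_common() if s not in known]
    return sheet1, sheet2

sheet1, sheet2 = build_sheets(ET.fromstring(SAMPLE))
```

In the sample, the bare "governess" is dropped from the second sheet because it already appears as a variant of the REG value "teacher", which leaves only "milliner" with its count.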


joelacummings commented 6 years ago

Ok I'm on it!

joelacummings commented 6 years ago

Updated.