AI-ON / biomedical-retractions


Feature Extraction / Additional Data #2

Open wgmueller1 opened 7 years ago

wgmueller1 commented 7 years ago

We need to identify features of interest and extract them from each article. This may necessitate bringing in additional data. For example, Impact Factor may be useful. I have access to Web of Science and can download the impact factors for each year, but I first need to identify the date of publication for each article.
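A minimal sketch of how the per-year impact factor could be attached once the publication year is known. The table `IMPACT_FACTORS`, the function `impact_factor_for`, and the journal names and values are all hypothetical placeholders, not real Web of Science data; the actual export format may look quite different.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: (normalized journal name, year) -> impact factor.
# The names and numbers are placeholders, not real impact factors.
IMPACT_FACTORS: Dict[Tuple[str, int], float] = {
    ("journal a", 2010): 1.0,
    ("journal a", 2011): 2.0,
    ("journal b", 2010): 3.0,
}


def impact_factor_for(journal: str, pub_year: int) -> Optional[float]:
    """Return the impact factor for the journal in the article's
    publication year, or None when no value is available."""
    key = (journal.strip().lower(), pub_year)
    return IMPACT_FACTORS.get(key)
```

Keying on both journal and year is the reason the publication date matters: the same journal can have a different impact factor each year, so a journal name alone is not enough.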

What other information / features are people interested in?

souravsingh commented 7 years ago

We could include the subject of the research, i.e. the domain in which the research was conducted.

wgmueller1 commented 7 years ago

@souravsingh do you have any ideas for this? We would probably need some additional work here. For the PubMed dataset, we have the journal name, but we don't have categories or keywords (that I know of).

souravsingh commented 7 years ago

I think PubMed records contain a tag called MeSH Major Topic; we could use that.
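A rough sketch of pulling major topics out of a PubMed XML record, assuming the record uses `MeshHeading` / `DescriptorName` / `QualifierName` elements with a `MajorTopicYN` attribute. The sample record and the function name `major_topics` are made up for illustration, and the element names should be checked against the actual dataset.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up record for illustration; the topic names are placeholders.
SAMPLE = """
<PubmedArticle>
  <MedlineCitation>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName MajorTopicYN="Y">Example Topic A</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Example Topic B</DescriptorName>
        <QualifierName MajorTopicYN="Y">example qualifier</QualifierName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Example Topic C</DescriptorName>
      </MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</PubmedArticle>
"""


def major_topics(article_xml: str) -> List[str]:
    """Return descriptor names for headings flagged as a major topic,
    either on the descriptor itself or on one of its qualifiers."""
    root = ET.fromstring(article_xml.strip())
    topics = []
    for heading in root.iter("MeshHeading"):
        descriptor = heading.find("DescriptorName")
        if descriptor is None or descriptor.text is None:
            continue
        flagged = [descriptor] + heading.findall("QualifierName")
        if any(el.get("MajorTopicYN") == "Y" for el in flagged):
            topics.append(descriptor.text)
    return topics
```

Articles without any MeSH headings would simply yield an empty list, so the feature can be treated as optional.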

wgmueller1 commented 7 years ago

I'll work on a feature extractor and include the following (add to the list if you have other ideas).

  1. Journal Id
  2. Journal Impact Factor
  3. Major Topic
  4. Author Id
  5. Author Institutions
  6. Abstract
  7. Full text
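
One possible shape for the extractor's output, sketched as a record with one field per item in the list above. The class name `ArticleFeatures` and all field names and types are assumptions for illustration only; nothing here reflects an actual implementation in the repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleFeatures:
    """Hypothetical per-article feature record; field names are assumptions."""
    journal_id: str
    journal_impact_factor: Optional[float] = None  # None if not yet looked up
    major_topics: List[str] = field(default_factory=list)
    author_ids: List[str] = field(default_factory=list)
    author_institutions: List[str] = field(default_factory=list)
    abstract: Optional[str] = None
    full_text: Optional[str] = None  # may be unavailable for many articles


# Example with placeholder values.
example = ArticleFeatures(journal_id="J1", major_topics=["Example Topic A"])
```

Only the journal identifier is required in this sketch; everything else defaults to empty, so a record can be built before every source has been joined in.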