KnowledgeGarden closed 9 years ago
Hi, which EtherPad are you referring to? Happy to consider extending the scrapers if a significant proportion of journals contain the information you're interested in. In my latest iteration, I have included keywords.
https://etherpad.mozilla.org/sciencelab-2014summersprint-mining-literature My work with PubMed abstracts suggests that keywords are there, as are MeSH names (these documents are essentially hand-tagged by domain experts) and chemical substance names. All of these play key roles in teasing meaning out of documents. My work entails crafting topic maps from text documents, so every available hint is valuable.
Ah, OK. We use etherpads at a lot of events and generally don't use them after the event except for historical archiving.
If PubMed has keywords, MeSH names and chemical substances, I'm happy to scrape them. If you'd like to contribute them to the scrapers, that would be welcome, and I will also include them in future scrapers. However, I would wait until after this weekend before making any pull requests, as I will be pushing updates to the whole set of scrapers.
As soon as I get past a hard disk crash here, I'll generate a gist that shows what the XML looks like for PubMed keywords, MeSH names and substance names. As for contributing a scraper, that idea is on my mind; until now, I was building OpenSherlock without even knowing about this project, so I intend to modify my code to generate compatible scrapes. My question has been answered!
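For readers who have not seen these fields, here is a minimal Python sketch of pulling keywords, MeSH names and substance names out of a PubMed-style XML record. It is not the gist mentioned above and not code from OpenSherlock or the scrapers. The sample record is hand-written placeholder data, the element names (`MeshHeading/DescriptorName`, `Chemical/NameOfSubstance`, `KeywordList/Keyword`) are assumed from the usual PubMed XML layout, and `extract_terms` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written placeholder record, NOT real PubMed data. The element names
# are an assumption based on the usual PubMed XML layout.
SAMPLE = """<PubmedArticle>
  <MedlineCitation>
    <PMID>0</PMID>
    <ChemicalList>
      <Chemical>
        <RegistryNumber>0</RegistryNumber>
        <NameOfSubstance>Example Substance</NameOfSubstance>
      </Chemical>
    </ChemicalList>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Example Heading</DescriptorName>
      </MeshHeading>
    </MeshHeadingList>
    <KeywordList Owner="NOTNLM">
      <Keyword MajorTopicYN="N">example keyword</Keyword>
    </KeywordList>
  </MedlineCitation>
</PubmedArticle>"""


def extract_terms(xml_text):
    """Hypothetical helper: collect the three term lists from one record."""
    root = ET.fromstring(xml_text)

    def texts(path):
        # Gather the stripped text of every element matching the path,
        # skipping elements that are empty.
        return [el.text.strip() for el in root.iterfind(path)
                if el.text and el.text.strip()]

    return {
        "mesh_names": texts(".//MeshHeading/DescriptorName"),
        "substances": texts(".//Chemical/NameOfSubstance"),
        "keywords": texts(".//KeywordList/Keyword"),
    }


if __name__ == "__main__":
    for field, values in extract_terms(SAMPLE).items():
        print(field, values)
```

A record that lacks one of the lists simply yields an empty list for that field, which keeps the output shape stable across journals that do not supply every kind of term.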
I mentioned this on the EtherPad. In writing scrapers for my OpenSherlock project, I include fields for MeSH names, substance names, and keywords as additional metadata. I am considering rewriting my scrapers to be instances of these open standards. What are the chances of extending the journal scrapers?