SciCrunch / scibot

curation workflow automation and coordination
Apache License 2.0

Scibot can't pull pmid/doi on bio-protocol.org #26

Open anp055 opened 6 years ago

anp055 commented 6 years ago

Scibot doesn't seem to pull the page note on bio-protocol.org:

https://bio-protocol.org/e2400#biaoti15399

Side note: Scibot also can't seem to pull the PMID when run directly on PubMed. I figure we won't need to run Scibot on PubMed very often, but it's strange that it can't pull the info when it is right there on the page.

https://www.ncbi.nlm.nih.gov/pubmed/28139828

tgbugs commented 6 years ago

In theory I can pull the DOI out of bio-protocol pages using this pattern:

<div class="float-left doclink" style="margin-right: 100px;">
     <span>DOI:</span>&nbsp;&nbsp;
     <a href="https://doi.org/10.21769/BioProtoc.2400">10.21769/BioProtoc.2400</a>
</div>
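
A minimal sketch of how the DOI could be read out of that markup, using only the Python standard library. This is not scibot's actual code: the class name `DoclinkDOIParser`, the function `extract_doi`, and the assumption that the DOI always sits in an `<a href="https://doi.org/...">` inside a `div` with class `doclink` are all illustrative and based solely on the snippet above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class DoclinkDOIParser(HTMLParser):
    """Collect DOIs from doi.org links inside <div class="... doclink ...">.

    Hypothetical helper, written against the single snippet quoted in this
    thread; the real page layout may differ.
    """

    def __init__(self):
        super().__init__()
        self._depth = 0  # div nesting depth inside a doclink div; 0 = outside
        self.dois = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth:
            if tag == "div":
                self._depth += 1
            elif tag == "a":
                url = urlparse(attrs.get("href") or "")
                # The DOI is the path of the doi.org URL, minus the leading slash.
                if url.netloc == "doi.org" and url.path.strip("/"):
                    self.dois.append(url.path.lstrip("/"))
        elif tag == "div" and "doclink" in (attrs.get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def extract_doi(html):
    """Return the first DOI found in a doclink div, or None if there is none."""
    parser = DoclinkDOIParser()
    parser.feed(html)
    return parser.dois[0] if parser.dois else None


SAMPLE = """
<div class="float-left doclink" style="margin-right: 100px;">
     <span>DOI:</span>&nbsp;&nbsp;
     <a href="https://doi.org/10.21769/BioProtoc.2400">10.21769/BioProtoc.2400</a>
</div>
"""

print(extract_doi(SAMPLE))  # prints 10.21769/BioProtoc.2400
```

Reading the DOI from the `href` rather than from the link text avoids depending on how the visible text is formatted, at the cost of assuming the link always points at `doi.org`.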

For PubMed I could crawl using the pattern below, but it is sort of pointless, because you are already crawling the PubMed ID. I could add a page note, but I actually don't think that crawling PubMed pages is the right thing to do at all. @bandrow, thoughts?

<div class="resc">
<dl class="rprtid">
 <dt>PMID:</dt> <dd>28139828</dd> 
 <dt>DOI:</dt> <dd><a href="//doi.org/10.1002/cne.24179" ref="aid_type=doi" target="_blank">10.1002/cne.24179</a></dd> 
</dl>
</div>
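
Purely to illustrate what the markup above would yield, here is a hedged standard-library sketch that turns the `dt`/`dd` pairs of a `dl` with class `rprtid` into a dictionary. The names `RprtidParser` and `extract_ids` are hypothetical, the structure is assumed from the one snippet above, and nothing here implies that crawling PubMed pages is the recommended approach.

```python
from html.parser import HTMLParser


class RprtidParser(HTMLParser):
    """Map each <dt> label in <dl class="rprtid"> to the text of its <dd>.

    Hypothetical helper based only on the snippet quoted in this thread.
    """

    def __init__(self):
        super().__init__()
        self._in_dl = False
        self._current = None  # "dt" or "dd" while inside one, else None
        self._label = None    # text of the most recent <dt>
        self.ids = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "dl" and "rprtid" in classes:
            self._in_dl = True
        elif self._in_dl and tag in ("dt", "dd"):
            self._current = tag
            if tag == "dt":
                self._label = ""

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._current = None
        elif tag == "dl":
            self._in_dl = False

    def handle_data(self, data):
        if self._current == "dt":
            self._label += data
        elif self._current == "dd" and self._label:
            # "PMID:" -> "PMID"; text nested in <a> still counts as <dd> text.
            key = self._label.strip().rstrip(":")
            self.ids[key] = self.ids.get(key, "") + data.strip()


def extract_ids(html):
    """Return a dict such as {"PMID": ..., "DOI": ...} from the rprtid list."""
    parser = RprtidParser()
    parser.feed(html)
    return parser.ids


SAMPLE = """
<div class="resc">
<dl class="rprtid">
 <dt>PMID:</dt> <dd>28139828</dd>
 <dt>DOI:</dt> <dd><a href="//doi.org/10.1002/cne.24179" ref="aid_type=doi" target="_blank">10.1002/cne.24179</a></dd>
</dl>
</div>
"""

print(extract_ids(SAMPLE))  # prints {'PMID': '28139828', 'DOI': '10.1002/cne.24179'}
```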

bandrow commented 6 years ago

Crawling PubMed is not a good thing to do. The PubMed data already lives in our systems, so you can just query our systems if you don't want to query PubMed itself.

(Sent by email in reply to Tom Gillespie's comment of Fri, Sep 14, 2018, 1:32 PM.)

-- All key biological entities deserve an #RRID! orcid.org/0000-0002-5497-0243