Get the article URL when extracting data

PubPeerFoundation / PublicationDataExtractor

Extract Research Publication data from any external API into a simple common structure

MIT License

Get the article URL when extracting data #16

Open brandonStell opened 5 years ago

brandonStell commented 5 years ago

We could do it based on DOIs using something like this:

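<?php
// Resolve a DOI to the URL named in the resolver's first redirect.
$doi = $argv[1];
$url = 'http://dx.doi.org/' . $doi;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);           // include response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // stop at the first redirect
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the response instead of printing it
$response = curl_exec($ch);
// Stays null when the request fails or no Location header is present.
$location = null;
// HTTP header names are case-insensitive, hence the "i" flag.
if ($response !== false && preg_match('#^Location:\s*(.*)$#mi', $response, $matches)) {
    $location = trim($matches[1]);
}
return $location;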

For things that don't have a DOI (only arXiv?) we can make the URL from the ID:

$arxivID = $argv[1];
$url = 'https://arxiv.org/abs/' . $arxivID;
return $url;
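Putting the two cases together, choosing which URL to start from could look roughly like the sketch below. The function name `startingUrl` and its signature are made up for illustration and are not part of PublicationDataExtractor; it also assumes the identifiers are passed in as plain strings and does no validation or percent-encoding.

```php
<?php
// Hypothetical helper, not part of PublicationDataExtractor.
// Returns a URL to start from: the DOI resolver link when a DOI is known,
// otherwise the arXiv abstract page, otherwise null.
function startingUrl(?string $doi, ?string $arxivId): ?string
{
    $doi = trim((string) $doi);
    $arxivId = trim((string) $arxivId);

    if ($doi !== '') {
        // The DOI is appended as-is; special characters are not encoded here.
        return 'https://doi.org/' . $doi;
    }
    if ($arxivId !== '') {
        return 'https://arxiv.org/abs/' . $arxivId;
    }
    return null;
}
```

The DOI branch only yields the resolver link, so it would still need to be followed to reach the publisher page.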

brandonStell commented 5 years ago

By the way, the code above was taken from here: http://zzz.rezo.net/HowTo-Expand-Short-URLs.html

XavRsl commented 5 years ago

We have this feature already. It is just not exposed through the API yet, but it easily could be.

brandonStell commented 5 years ago

That would be great. However, I think the problem is more complicated than I originally thought. For example, this DOI:
10.1016/j.cell.2019.02.019
should resolve to this URL:
https://www.cell.com/cell/fulltext/S0092-8674(19)30168-0
as it does here:
https://doi.org/10.1016/j.cell.2019.02.019

cURL in my script above does not return the correct link...

I can get the correct link only when I use the Selenium package in Python (presumably because it emulates a real browser).

brandonStell commented 5 years ago

(Also note that the URL returned by the Crossref API is not correct.)

brandonStell commented 5 years ago

I guess we'll probably need an array of links for each DOI, since there seem to be several.
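One way such an array might be built is to follow the HTTP redirects one hop at a time and keep every URL seen along the way. This is only a sketch: `extractLocation`, `fetchHeaders` and `collectLinks` are hypothetical names, the fetcher is injectable so the logic can be tried without network access, and relative `Location` values are not handled. It also only sees HTTP-level redirects, so a hop performed by JavaScript or a meta refresh in the page (which may be why a real browser behaves differently) would not show up.

```php
<?php
// Hypothetical helpers, not part of PublicationDataExtractor.

// Pull the Location header (if any) out of a raw HTTP header block.
function extractLocation(string $rawHeaders): ?string
{
    // "m" anchors ^ at each line start; "i" because header names are case-insensitive.
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Fetch only the response headers of $url, without following redirects.
// Assumption: a HEAD request is enough; some servers may answer HEAD differently from GET.
function fetchHeaders(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    return $response === false ? '' : $response;
}

// Follow redirects hop by hop and return every URL seen, starting with $url.
function collectLinks(string $url, ?callable $fetch = null, int $maxHops = 10): array
{
    $fetch = $fetch ?? 'fetchHeaders';
    $links = [$url];
    for ($hop = 0; $hop < $maxHops; $hop++) {
        $next = extractLocation($fetch($url));
        // Stop when there is no redirect, the target is not an absolute
        // http(s) URL, or the chain loops back to a URL already seen.
        if ($next === null
            || !preg_match('#^https?://#i', $next)
            || in_array($next, $links, true)) {
            break;
        }
        $links[] = $next;
        $url = $next;
    }
    return $links;
}
```

Calling `collectLinks('https://doi.org/' . $doi)` would then return the resolver link followed by whatever HTTP redirects the publisher chain exposes.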