Open petermr opened 5 years ago
A sample of 100 entries https://github.com/gilienv/EssOilDB/blob/master/tables/bibliography/sample.tsv looks like:
title | author | DOI_link | DOI | vol | JOURNAL | profile_c
(Z)-ë_-Ocimene from... | Joseph J. Brophy ... | https://doi.org/10.1080/10412905.1998.9700889 | 10.1080/10412905.1998.9700889 | VOL. 10, 229-233 (Mar/Apr 1998) | Journal of Essential Oil Research | JEHflau1998Lea#JEHmoau1998Lea
1, 8-Cineole-Caryophyllene ... | Danute Mockute, ... | https://doi.org/10.1080/10412905.2004.9698708 | 10.1080/10412905.2004.9698708 | VOL. 16, 236-238 (May/June 2004) | Journal of Essential Oil Research | JETseli2004Aer
1,10-beta-Epoxy-6-oxofura ... royleanus DC. | ... | https://doi.org/10.1080/10412905.2011.9700434 | 10.1080/10412905.2011.9700434 | VOL. 23, 102-104 (Jan/Feb 2011) | Journal of Essential Oil Research | JESrokeutin2011Lea#JESrokeutin2011Ste#JESrokeutin2011Flo#JESrokeutin2011Aer
Looks useful.
ACTION: Need a unique ID for each row, in the format EBib0001234.
ACTION: Remove columns. In the production table (not this one), DOI_link and profile_c will be redundant. The title, authors and journal will be retrieved from Crossref or another authority.
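Both ACTION items can be handled in one awk pass. This is a minimal sketch, assuming the table is tab-separated with one header row and the column order of the sample above (title, author, DOI_link, DOI, vol, JOURNAL, profile_c); the function name and the `id` header are hypothetical. IDs are zero-padded to seven digits to match EBib0001234, which keeps every ID the same width.

```shell
# Hypothetical helper: prefix each data row with an EBib ID and drop
# column 3 (DOI_link) and column 7 (profile_c) from a tab-separated table.
# Assumes the column order of sample.tsv and a single header row.
add_ids_drop_cols() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { print "id", $1, $2, $4, $5, $6; next }   # rewrite the header
        {
            id = sprintf("EBib%07d", NR - 1)               # EBib0000001, EBib0000002, ...
            print id, $1, $2, $4, $5, $6
        }' "$1"
}
```

Usage: `add_ids_drop_cols sample.tsv > sample_ids.tsv`. Fixed-width padding avoids mixed-width IDs such as the EBib00078 and EBib000288 that appear later in this thread.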
It will be useful to resolve the DOIs in EuropePMC to see how many of these articles are Open Access.
I have used the EPMC (Europe PMC) API to retrieve metadata for each of the 1402 bibliographic entries; see https://github.com/gilienv/EssOilDB/tree/master/tables/bibliography/epmc
The script
https://github.com/gilienv/EssOilDB/tree/master/tables/bibliography/epmc/epmcopen.sh
uses curl to retrieve the metadata:
#! /bin/sh
# URLs are quoted: an unquoted & would end the command, run curl in the background and drop &format=xml
sleep 1
curl -k -o 10.1002_ffj.1019.xml "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=10.1002/ffj.1019&format=xml"
sleep 1
curl -k -o 10.1002_ffj.1047.xml "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=10.1002/ffj.1047&format=xml"
sleep 1
curl -k -o 10.1002_ffj.1048.xml "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=10.1002/ffj.1048&format=xml"
...
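Instead of one hand-written curl line per DOI, the same requests can be generated from a list. This is a minimal sketch, assuming a hypothetical text file with one DOI per line. It restricts the search to the `DOI:` field and uses curl's `-G` with `--data-urlencode` so that DOIs containing unusual characters are URL-encoded. The function names are illustrative.

```shell
# Hypothetical helper: map a DOI to the file naming used above ("/" -> "_").
doi_to_filename() {
    printf '%s.xml\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Query Europe PMC for one DOI. -G appends the --data-urlencode values to the
# URL as a query string (URL-encoding the DOI); --fail turns HTTP errors into
# a non-zero exit status.
fetch_epmc() {
    curl -sS --fail -G \
        --data-urlencode "query=DOI:$1" \
        --data-urlencode "format=xml" \
        -o "$(doi_to_filename "$1")" \
        "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
}

# Read DOIs one per line from file $1, strip Windows line endings, skip blank
# lines, and pause 1 s between requests as the script above does.
fetch_all() {
    while IFS= read -r doi || [ -n "$doi" ]; do
        doi=$(printf '%s' "$doi" | tr -d '\r')
        [ -n "$doi" ] || continue
        fetch_epmc "$doi" || printf 'failed: %s\n' "$doi" >&2
        sleep 1
    done < "$1"
}
```

Usage: `fetch_all dois.txt` writes files such as `10.1002_ffj.1019.xml` into the current directory.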
For each paper there is a metadata file *.xml which can be interrogated for the phrase:
<isOpenAccess>Y</isOpenAccess>
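That interrogation can be done with grep. This is a minimal sketch, assuming all the metadata files sit in one directory; the function names are hypothetical. Because a DOI search can return zero or several hits, a match only shows that some result in the file is flagged Open Access; checking `<hitCount>` first would be a sensible refinement.

```shell
# Hypothetical helper: print the metadata files in directory $1 that contain
# the Open Access flag. Errors (e.g. no *.xml files) are silenced.
list_open_access() {
    grep -l '<isOpenAccess>Y</isOpenAccess>' "$1"/*.xml 2>/dev/null
}

# Count those files; tr strips the padding some wc implementations add.
count_open_access() {
    list_open_access "$1" | wc -l | tr -d ' '
}
```

Usage: `count_open_access tables/bibliography/epmc`.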
In V1.0 there are very few Open Access (OA) articles, and we'll download them. In the wider literature, however, there are many that could go into V2.0.
Sir, please go through the bibliography table (https://github.com/gilienv/EssOilDB/blob/master/tables/bibliography/bibliography050819.csv), which now has a unique ID for each row and no DOI_link or profile_c columns.
This is not a bibliography; it is a list of titles, and on its own it is not useful. Where are the DOIs? At this stage we should also retain the rest of the fields in this table: journal, authors, pages, year.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gilienv/EssOilDB/issues/90?email_source=notifications&email_token=AAFTCS3E6YVG7VJMFRMDSF3QC7QS3A5CNFSM4II47EL2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3RC6HQ#issuecomment-518139678, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFTCSZNNGVJO7YXMPLYBWLQC7QS3ANCNFSM4II47ELQ .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
Sir, the following columns are now in the bibliography table.
Thank you, this looks fine. Have you checked for duplicates? And are all the characters Unicode/UTF-8?
Sir, the character encoding is not Unicode/UTF-8, e.g.:
- Analysis of Essential Oils from Wild and Domesticated Plants of Glechoma sardoa Bég.
- (Z)-β-Ocimene from Two Species of Homoranthus (Myrtaceae).
I get
EBib00050,Analysis of Essential Oils from Wild and Domesticated Plants of Glechoma sardoa Bég
EBib0001,(Z)-β-Ocimene
when displayed in TextMate.
This may be a problem with Excel rather than the file itself. By default Excel does not open CSV files as UTF-8; you have to import them explicitly, see e.g. https://www.nextofwindows.com/how-to-display-csv-files-with-unicode-utf-8-encoding-in-excel
The first real error I find is:
EBib00078,Antiaflatoxigenic and antioxidant activity of an essential oil from Ageratum conyzoides L.,"Rajaram P Patil, Mansingraj S Nimbalkar, Umesh U Jadhav, Vishal V Dawkarc and Sanjay P Govindwarc",�10.1002/jsfa.3857,"VOL.90,608–614(2010)",Journal of Sci Food Agri
However, the way to solve this is probably to import the titles and authors from Crossref, or to hand-edit them. DO NOT USE WINDOWS SOFTWARE (Word, Notepad, Excel), as it often defaults to encodings other than UTF-8.
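Both encoding checks can be scripted. This is a minimal sketch, assuming GNU or BSD `iconv` is installed; the function names are hypothetical. `iconv` fails on byte sequences that are not valid UTF-8, while the grep looks for U+FFFD (bytes EF BF BD), the replacement character a decoder inserts when it meets bytes it cannot decode. A file can be perfectly valid UTF-8 and still contain U+FFFD, so the two checks answer different questions.

```shell
# Succeeds only if file $1 is valid UTF-8 (iconv exits non-zero otherwise).
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Print the line numbers of file $1 that contain U+FFFD, i.e. text that was
# already lost in an earlier conversion. LC_ALL=C makes grep match raw bytes.
find_replacement_chars() {
    LC_ALL=C grep -n "$(printf '\357\277\275')" "$1" | cut -d: -f1
}
```

Usage: `is_valid_utf8 bibliography050819.csv && echo valid` and `find_replacement_chars bibliography050819.csv`.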
Can you confirm that there are 1752 entries? Are there any ambiguities?
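Both questions can be checked from the command line. This is a minimal sketch, assuming a CSV with one header row and no embedded newlines inside fields; the function names are hypothetical. Rather than parsing the CSV, it pulls DOI-like strings out with a loose regular expression and lower-cases them, since DOI names are case-insensitive, so it is a heuristic rather than a full check.

```shell
# Count data rows: every line except the header.
count_entries() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Print DOIs that occur more than once in file $1. A DOI is matched loosely as
# "10." + 4-9 digits + "/" + characters up to a comma, quote or whitespace.
duplicate_dois() {
    grep -o '10\.[0-9]\{4,9\}/[^,"[:space:]]*' "$1" |
        tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Usage: `count_entries bibliography050819.csv` and `duplicate_dois bibliography050819.csv`.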
No sir. There is no ambiguity in mapping the DOIs to title, author and journal using the Crossref API.
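The thread does not show how the Crossref mapping was done. This is a minimal sketch of a single lookup against the public Crossref REST API (`https://api.crossref.org/works/{DOI}`), assuming `curl` and `jq` are installed. The User-Agent string and mailto address are placeholders, and DOIs containing characters such as `#` or `<` may need URL-encoding first.

```shell
# Build the Crossref REST API URL for one DOI.
crossref_url() {
    printf 'https://api.crossref.org/works/%s\n' "$1"
}

# Print title, authors ("given family", joined with "; ") and journal for one
# DOI as a tab-separated line. The mailto address is a placeholder.
crossref_record() {
    curl -sS --fail \
        -A 'EssOilDB-bibliography (mailto:you@example.org)' \
        "$(crossref_url "$1")" |
    jq -r '.message | [
        (.title[0] // ""),
        ([.author[]? | "\(.given // "") \(.family // "")"] | join("; ")),
        (.["container-title"][0] // "")
    ] | @tsv'
}
```

Usage: `crossref_record 10.1002/jsfa.3857`. Looking records up by DOI like this is also one way to replace corrupted titles and authors with Crossref's versions.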
Yes, there are 1752 entries.
Thank you. I agree that the bibliography is now close to final. The top priority now is to create a Profile table and link it to the others.
Yes sir.
Sir,
Only two records have a diamond mark (�) at the beginning of their DOI. They can be hand-edited:
EBib00078 | Antiaflatoxigenic and antioxidant activity of an essential oil from Ageratum conyzoides L. | Rajaram P Patil, Mansingraj S Nimbalkar, Umesh U Jadhav, Vishal V Dawkarc and Sanjay P Govindwarc | �10.1002/jsfa.3857 | VOL.90,608–614(2010) | Journal of Sci Food Agri
EBib000288 | Chemical Composition of Artemisia absinthium L. from Greece | A. Basta, O. Tzakou, M. Couladis & M. Pavlović | �10.1080/10412905.2007.9699291 | VOL. 19, 316-318 (July/Aug 2007) | Journal of Essential Oil Research
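If more such records turn up, the DOI field can be normalised mechanically instead of by hand. This is a minimal sketch with hypothetical function names, assuming the damage is only at the start of the value: it drops everything before the first digit (which removes a stray U+FFFD as well as prefixes such as `doi:` or `https://doi.org/`), then checks that the result looks like a DOI.

```shell
# Hypothetical helper: drop everything before the first digit and any trailing
# whitespace. LC_ALL=C makes sed treat the input byte by byte.
clean_doi() {
    printf '%s\n' "$1" | LC_ALL=C sed -e 's/^[^0-9]*//' -e 's/[[:space:]]*$//'
}

# Succeeds if the value starts with "10." and contains a "/".
looks_like_doi() {
    case $1 in
        10.*/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `clean_doi "$value"` for each DOI field, flagging any result for which `looks_like_doi` fails.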
Thank you. Do you know what the problematic characters are? Are they printing or non-printing? My guess is that they are spaces or punctuation. (They may occur in other files.)
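One way to answer the printing/non-printing question is to dump the leading bytes of an affected value. This is a minimal sketch using POSIX `od`, with a hypothetical function name. If the result is `efbfbd`, the character is U+FFFD, the Unicode replacement character. That is a printing placeholder inserted by an earlier conversion that could not decode the original byte, so the original character (space, punctuation or anything else) cannot be recovered from the file itself.

```shell
# Print the first N bytes (default 3) of string $1 as hex without spaces.
# od -N limits the byte count; tr removes od's spacing and newline.
leading_bytes_hex() {
    printf '%s' "$1" | od -An -N "${2:-3}" -tx1 | tr -d ' \n'
}
```

Usage: `leading_bytes_hex "$doi_field"`, where `doi_field` holds the raw DOI value from one of the two records.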
Which table did you create the bibliography from? That could be a basis for the Profile table. Is it on GitHub?
Sir,
These characters were introduced into the DOI field. When I searched for these titles on the web, the articles found do not contain any such character.
e.g.:
J Sci Food Agric. 2010 Mar 15;90(4):608-14. doi: 10.1002/jsfa.3857.
Antiaflatoxigenic and antioxidant activity of an essential oil from Ageratum conyzoides L.
Patil RP, Nimbalkar MS, Jadhav UU, Dawkar VV, Govindwar SP
Chemical Composition of Artemisia absinthium L. from Greece
A. Basta, O. Tzakou, M. Couladis & M. Pavlović
Pages 316-318 | Received 01 Oct 2005, Accepted 01 Feb 2006, Published online: 28 Nov 2011
https://doi.org/10.1080/10412905.2007.9699291
The bibliography information was extracted from the plant info table.
Bibliography table (https://github.com/gilienv/EssOilDB/blob/master/tables/bibliography/bibliographyFinal050819uniqueTitle.csv) with unique records.
Records are made unique based on the title value.
Thank you. This will need linking into the Profile table at some stage.
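Making records unique on the exact title string can miss near-duplicates that differ only in case, spacing or a final full stop. This is a minimal sketch, assuming the titles have been extracted one per line (the function name is hypothetical), that lists titles which collide after normalisation. Where a DOI is available it is usually a safer key than the title, since different papers can share a title.

```shell
# Hypothetical check: read titles (one per line) on stdin and print those
# that collide after lower-casing, collapsing whitespace, trimming and
# dropping a trailing full stop.
duplicate_titles() {
    tr '[:upper:]' '[:lower:]' |
        sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' -e 's/\.$//' |
        sort | uniq -d
}
```

Usage: `cut -f2 titles.tsv | duplicate_titles`, assuming the titles have first been exported to a hypothetical tab-separated file.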
@ambarishK has created and uploaded a table of the bibliography from V1.0. I have moved this to
and I have exported as
NOTE: There are encoding problems, and some of the titles and authors are corrupted. However, I suggest that if we can recover the DOI, we can retrieve the title from Crossref when we need it, and that we should accept Crossref's titles and authors.