bireme / data-governance

This project aims to develop the levels of data governance for Virtual Health Library databases, including ETL, data quality and data visualization processes.

Include a column or a separate CSV with the DOI field, generated during the Tableau Prep flow #201

Closed: renatomurasaki closed this issue 4 years ago

renatomurasaki commented 4 years ago

@dossanluc please check with @falbrito the best way to include the DOI field in the CSV export, so that we can include it in the processing and have it appear in the Lucene XML.

This field is already present in the Tableau Prep flow; it just is not selected in the final step that generates the general CSV.

This field is important for the iAHx export feature in the RIS and CSV formats.

dossanluc commented 4 years ago

A new file (who-covid-19-repo-export2iAHxProcessingOnlyDOI.csv) has been added to the extraction process, containing key/value pairs (CovNum and DOI).
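For anyone consuming this file downstream, here is a minimal sketch of loading the key/value pairs into a lookup table. It assumes the CSV is comma-delimited with a header row naming the columns `CovNum` and `DOI`; the thread does not confirm the exact header or delimiter. The function name `load_doi_map` and the sample rows are hypothetical, for illustration only.

```python
import csv
import io


def load_doi_map(csv_file):
    """Return a {CovNum: DOI} dict from a key/value CSV.

    Assumes a header row with columns named "CovNum" and "DOI"
    (an assumption; adjust to the real export). Rows with an empty
    key or an empty DOI are skipped.
    """
    reader = csv.DictReader(csv_file)
    doi_map = {}
    for row in reader:
        cov_num = (row.get("CovNum") or "").strip()
        doi = (row.get("DOI") or "").strip()
        if cov_num and doi:
            doi_map[cov_num] = doi
    return doi_map


# Illustrative sample only; the real file is
# who-covid-19-repo-export2iAHxProcessingOnlyDOI.csv.
sample = io.StringIO(
    "CovNum,DOI\n"
    "65150,10.1038/d41586-020-01108-y\n"
    "65151,\n"
)
doi_map = load_doi_map(sample)
print(doi_map)
```

With the real export, the same function could be called on `open(path, newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.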

falbrito commented 4 years ago

The file who-covid-19-repo-export2iAHxProcessingOnlyDOI.csv has been included in the processing. The DOI was stored in tag v724 of the ISIS master file, as in the example below.

operacao@serverofi5:/bases/lilG4/cvd.lil $ mx lilacs from=4
mfn=     4 
  2  "65150"
  5  "S"
  6  "as"
  8  "^uhttps://doi.org/10.1038/d41586-020-01108-y"
 10  "Cyranoski, Andrew Silver"
 10  "David"
 12  "China is tightening its grip on coronavirus research"
 30  "Nature"
 64  "2020"
 65  "20200000"
 83  "Some scientists welcome government vetting because it could stop poor-quality COVID-19 papers being published – others fear it is an attempt to control information"
 91  "20200416"
724  "10.1038/d41586-020-01108-y"   <<====== DOI
776  "WHO^i65150"
854  "202000"
855  "0004"
1001  "65150"
1002  "2020"
1008  "16/04/2020"
..
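
To check that tag 724 was actually filled in, a dump like the one above can be parsed with a small script. This is only a sketch: the line format (a numeric tag followed by a quoted value) is inferred from the output shown in this thread, not from the `mx` documentation, and `parse_mx_record` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# A field line looks like:  724  "10.1038/..."  (optionally followed by notes).
FIELD_LINE = re.compile(r'^\s*(\d+)\s+"(.*)"')


def parse_mx_record(text):
    """Map each numeric tag to the list of its occurrences, in order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:  # lines such as "mfn=     4" or ".." are ignored
            fields[int(match.group(1))].append(match.group(2))
    return dict(fields)


# Excerpt of the dump shown in this thread.
dump = '''mfn=     4
  2  "65150"
 10  "Cyranoski, Andrew Silver"
 10  "David"
724  "10.1038/d41586-020-01108-y"   <<====== DOI
'''
record = parse_mx_record(dump)
print(record.get(724))
```

Repeatable tags (such as 10 here) are kept as lists, so a record without a DOI simply has no 724 key.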

The "doi" element was created in the XML for indexing:

  <doc boost="45">
    <field name="id">covidwho-65150</field>
    <field name="bvs">covidwho</field>
    <field name="db">COVIDWHO</field>
    <field name="instance">covidwho</field>
    <field name="collection">04-international_org</field>
    <field name="type">article</field>
    <field name="nivel_tratamento">as</field>
    <field name="ur">https://doi.org/10.1038/d41586-020-01108-y</field>
    <field name="au">Cyranoski, Andrew Silver</field>
    <field name="au">David</field>
    <field name="ti">China is tightening its grip on coronavirus research</field>
    <field name="fo">Nature;2020.</field>
    <field name="ta">Nature</field>
    <field name="dp">2020</field>
    <field name="da">202000</field>
    <field name="ab">Some scientists welcome government vetting because it could stop poor-quality COVID-19 papers being published – others fear it is an attempt to control information</field>
    <field name="no_indexing">1</field>
    <field name="fulltext">1</field>
    <field name="services">SCAD</field>
    <field name="weight">45</field>
    <field name="entry_date">20200416</field>
    <field name="dp_ym">202000</field>
    <field name="nro_month">0004</field>
    <field name="regional_office">others</field>
    <field name="id_pk">65150</field>
    <field name="doi">10.1038/d41586-020-01108-y</field>   <<====== DOI
  </doc>
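
As a rough illustration of the last step, the sketch below appends a `doi` field to a `<doc>` element using Python's standard `xml.etree.ElementTree`. It is not the code used in the actual processing, which the thread does not show; `add_doi_field` is a hypothetical helper, and the input document is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET


def add_doi_field(doc, doi):
    """Append <field name="doi"> to a <doc> element.

    Does nothing if the document already carries a doi field,
    so the function is safe to call more than once.
    """
    for field in doc.findall("field"):
        if field.get("name") == "doi":
            return doc
    new_field = ET.SubElement(doc, "field", {"name": "doi"})
    new_field.text = doi
    return doc


# Shortened version of the example record from this thread.
doc = ET.fromstring(
    '<doc boost="45">'
    '<field name="id">covidwho-65150</field>'
    '<field name="id_pk">65150</field>'
    '</doc>'
)
add_doi_field(doc, "10.1038/d41586-020-01108-y")
xml_out = ET.tostring(doc, encoding="unicode")
print(xml_out)
```

The duplicate check is a design choice for this sketch only: it keeps reprocessing the same record from producing two `doi` fields.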