arademaker / SLattes

Semantic Lattes

DOI and home page's URL #4

Open feromes opened 2 years ago

feromes commented 2 years ago

At CEM/FFLCH (Centro de Estudo da Metrópole) we are working on automating the handling of researchers' CVs. This project is a remarkable tool for that, but we noticed that the DOI and URL of some works are not being imported properly.

We're also going to suggest an update to fix it.

Thanks a lot.

arademaker commented 2 years ago

Thank you for the PR, but:

  1. You left data from your researchers in the commit. It doesn't make sense to add data from particular resumes in this repo.

  2. I prefer to keep the readme as Org Mode and not to move to Markdown.

  3. It seems that XML specification of the Lattes files changed, is that right? Can you confirm that? Can you point me to the current DTD or XML T

feromes commented 2 years ago

Hello,

  1. Sorry, that was a mistake. Only the first commit is relevant; the others are for our own use, to validate the results and for other plans we are considering.
  2. Sure. This is another consequence of my mistake of including all commits in the PR.
  3. I am not sure, but I think it is possible. Tomorrow I'll get back to this and investigate.

I'll come back with a new PR containing only the changes to the lattes2mods.xsl file.

Sorry again for the mess I've made.

feromes commented 2 years ago

Sorry again,

I've now opened another PR, #6, which updates only lattes2mods.xsl.

Answering the question:

It seems that XML specification of the Lattes files changed, is that right? Can you confirm that? Can you point me to the current DTD or XML T

I was not able to confirm whether the specification has changed, but I am linking the current version here: https://memoria.cnpq.br/c/document_library/get_file?uuid=772309c0-fb72-4c6a-8c88-64b0ba46ae5d&groupId=313759

Thanks,

arademaker commented 2 years ago

OK, I found at https://memoria.cnpq.br/web/portal-lattes/extracoes-de-dados that CNPq now makes the definition available as an XML Schema. I downloaded my CV and tested:

With the current DTD in this repo:

% xmllint --dtdvalid LMPLCurriculo.DTD --noout ~/Downloads/curriculo.xml
/Users/ar/Downloads/curriculo.xml:1: element DADOS-GERAIS: validity error : No declaration for attribute ORCID-ID of element DADOS-GERAIS
/Users/ar/Downloads/curriculo.xml:1: element DETALHAMENTO-DA-PATENTE: validity error : Element DETALHAMENTO-DA-PATENTE content does not follow the DTD, expecting (REGISTRO-OU-PATENTE)?, got (REGISTRO-OU-PATENTE HISTORICO-SITUACOES-PATENTE)
/Users/ar/Downloads/curriculo.xml:1: element REGISTRO-OU-PATENTE: validity error : No declaration for attribute NOME-DO-DEPOSITANTE of element REGISTRO-OU-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DESCRICAO-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DATA-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute STATUS-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element DETALHAMENTO-DA-PATENTE: validity error : Element DETALHAMENTO-DA-PATENTE content does not follow the DTD, expecting (REGISTRO-OU-PATENTE)?, got (REGISTRO-OU-PATENTE HISTORICO-SITUACOES-PATENTE)
/Users/ar/Downloads/curriculo.xml:1: element REGISTRO-OU-PATENTE: validity error : No declaration for attribute NOME-DO-DEPOSITANTE of element REGISTRO-OU-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DESCRICAO-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DATA-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute STATUS-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element DADOS-BASICOS-DE-ORIENTACOES-CONCLUIDAS-PARA-MESTRADO: validity error : Value "NAO_INFORMADO" for attribute TIPO of DADOS-BASICOS-DE-ORIENTACOES-CONCLUIDAS-PARA-MESTRADO is not among the enumerated set
Document /Users/ar/Downloads/curriculo.xml does not validate against LMPLCurriculo.DTD
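
Several of the errors above repeat, since the same undeclared elements and attributes occur more than once in the CV. A minimal sketch for collapsing such a log into distinct problems follows. It assumes only the line shape visible in the output above (`<file>:<line>: element <NAME>: validity error : <message>`); the function names are hypothetical and not part of this repo.

```python
import re
from collections import Counter

# One xmllint validity-error line, in the shape seen in the log above:
#   <file>:<line>: element <NAME>: validity error : <message>
ERROR_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): element (?P<element>[\w-]+): "
    r"validity error : (?P<message>.+)$"
)


def summarize_validity_errors(log_text):
    """Count how often each distinct (element, message) pair occurs."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:  # lines such as "Document ... does not validate" are skipped
            counts[(match["element"], match["message"])] += 1
    return counts


def format_summary(counts):
    """Render the counts as sorted 'N  ELEMENT: message' lines."""
    return [
        f"{n:3d}  {element}: {message}"
        for (element, message), n in sorted(counts.items())
    ]
```

To use it, capture the xmllint output in a file (xmllint writes validity errors to stderr, so redirect with `2>`), read the file into a string, and print the lines returned by `format_summary(summarize_validity_errors(text))`. Each distinct line then points at one element or attribute that the old DTD does not declare.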

Using the new XSD downloaded from the above link:

% xmllint --schema CurriculoLattes.xsd --noout ~/Downloads/curriculo.xml
/Users/ar/Downloads/curriculo.xml validates

But note that none of these changes were addressed by your PR. I will make specific comments in #6.

arademaker commented 2 years ago

@feromes can you post here a link to a CV for which you found that the URL and DOI are not captured when it is processed by the lattes2mods.xsl transformation?
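
One way to pick such a CV is to first check which articles in the source XML carry a DOI or URL at all, and then compare that list with the output of the transformation. The sketch below is only an illustration: the element name `DADOS-BASICOS-DO-ARTIGO` and the attribute names `DOI`, `HOME-PAGE-DO-TRABALHO` and `TITULO-DO-ARTIGO` are assumptions about the Lattes XML format and are not taken from this thread, so check them against the current schema. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def articles_with_links(root):
    """Return (title, doi, url) for each article that has a DOI or a URL.

    ASSUMPTION: the element and attribute names used here reflect the
    Lattes XML format as commonly seen; verify them against the schema.
    """
    found = []
    for basics in root.iter("DADOS-BASICOS-DO-ARTIGO"):
        doi = basics.get("DOI", "").strip()
        url = basics.get("HOME-PAGE-DO-TRABALHO", "").strip()
        if doi or url:  # skip articles where both attributes are empty
            title = basics.get("TITULO-DO-ARTIGO", "")
            found.append((title, doi, url))
    return found
```

A typical call would be `articles_with_links(ET.parse("curriculo.xml").getroot())`, where `curriculo.xml` is the downloaded CV. Any entry returned here whose DOI or URL is absent from the MODS output is a candidate example for this issue.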

arademaker commented 2 years ago

@feromes could you also confirm whether the CV you are trying to transform passes validation against the new XSD? See the README for how to run the validation; I have just updated the instructions. It is hard to get answers from CNPq, but it seems the XML files now follow this XSD specification. The site http://lmpl.cnpq.br/lmpl/, which I used to use as a reference, seems to be abandoned.