EDIorg / EMLassemblyline

R package for creating EML metadata
https://ediorg.github.io/EMLassemblyline/
MIT License
28 stars, 13 forks

Clash with accents #73

Closed: earnaud closed this issue 3 years ago

earnaud commented 4 years ago

While templating CatVars on a DP, I ran into a problem with accented characters, which matters for international use of EAL. R reads a string containing an accented character only up to that character and returns just the beginning of the string. For example, "Bouée 1" and "Bouée 2" both become "Bou", which produces duplicate catvar codes.

Using base R's iconv() when reading files might help resolve this.
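
To illustrate the idea, here is a minimal sketch that assumes the file was saved as Latin-1 (an assumption; the actual encoding of the affected files is not stated here). It builds the raw bytes of "Bouée 1" as they would appear in such a file, shows that they are not valid UTF-8, and re-encodes them with iconv().

```r
# Bytes of "Bouée 1" as stored in a Latin-1 file (0xe9 is "é" in Latin-1).
# The Latin-1 source encoding is an assumption made for this sketch.
latin1_bytes <- as.raw(c(0x42, 0x6f, 0x75, 0xe9, 0x65, 0x20, 0x31))
raw_string <- rawToChar(latin1_bytes)

# A lone 0xe9 byte followed by ASCII is not a valid UTF-8 sequence.
is_valid_before <- validUTF8(raw_string)

# Re-encode from the assumed source encoding to UTF-8.
fixed <- iconv(raw_string, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(fixed)
```

If the source encoding is guessed wrongly, iconv() can return NA or mangled characters, so the encoding would need to be known or detected before applying it.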

earnaud commented 4 years ago

This also leads to problems with make_eml():

invalid UTF-8 input in readChar()

clnsmth commented 4 years ago

Thanks for reporting this issue @earnaud. Please email me the data and metadata template so I can test further.

clnsmth commented 4 years ago

A fix is underway on branch fix_73.

earnaud commented 3 years ago

I once again ran into a file containing accented characters at the CatVars step. Has the fix been pushed?

EAL v. 2.18.1

EDIT

Silly me, I posted the EAL version without noticing that it is not even the latest release.

clnsmth commented 3 years ago

It's currently in the development branch, along with the new messaging feature, which needs a little more work. Can you reference the development branch until it is released into master?

earnaud commented 3 years ago

When using template_annotations(), I get the following error:

Error in enc2utf8(data_table) : argument is not a character vector

It seems enc2utf8() is being applied to a data frame (or something similar). One option would be to convert data_table to an array() (which is in fact a multidimensional vector) and then apply enc2utf8().
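
A rough sketch of what I mean, assuming data_table really is a data frame (the table below and its column names are made up for illustration). Instead of going through array(), it applies enc2utf8() only to the character columns, which is a slightly different route to the same result.

```r
# Hypothetical table with accented values; the column names are illustrative only.
data_table <- data.frame(
  site = c("Bou\u00e9e 1", "Bou\u00e9e 2"),
  depth = c(1.5, 3),
  stringsAsFactors = FALSE
)

# enc2utf8() accepts only character vectors, so passing the data frame fails.
msg <- tryCatch(enc2utf8(data_table), error = function(e) conditionMessage(e))

# Convert the character columns one by one and leave the others untouched.
is_chr <- vapply(data_table, is.character, logical(1))
data_table[is_chr] <- lapply(data_table[is_chr], enc2utf8)
```

Going column by column keeps numeric columns numeric, whereas converting the whole table to a character array would coerce every value to a string.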

clnsmth commented 3 years ago

@earnaud I don't see this line of code in the development branch version of template_annotations() (EAL 2.19.0). What version is throwing this error?

earnaud commented 3 years ago

Here is where I found it in earnaud/EMLassemblyline@development:

$ grep --line-number enc2utf8.data R/*
R/template_arguments.R:147:    names(data_tables) <- enc2utf8(data_table)

clnsmth commented 3 years ago

Found it! data_table in the context of enc2utf8(data_table) is the table name as a character string. If you send the table and table attributes template I will look at the issue in more detail.

clnsmth commented 3 years ago

@earnaud I was unable to reproduce this error with the files you sent, but I have added a precautionary step of converting inputs to character strings (e.g. enc2utf8(as.character(data_table))).
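
As a rough illustration of that precaution (not the package's actual code), the sketch below wraps the conversion in a hypothetical helper, to_utf8(), and passes it a factor, which enc2utf8() alone would reject.

```r
# Hypothetical helper mirroring the precautionary step; the name to_utf8 is
# invented for this sketch and is not part of EMLassemblyline.
to_utf8 <- function(x) enc2utf8(as.character(x))

# A factor is not a character vector, so enc2utf8() alone would fail on it.
table_name <- factor("bou\u00e9es.csv")
result <- to_utf8(table_name)
```

Note that as.character() on a whole data frame returns deparsed column strings rather than cell values, so this guard is only meaningful for vector-like inputs such as file names.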

earnaud commented 3 years ago

OK, I do not fully understand it then. Closing this for now.