EDIorg / EMLassemblyline

R package for creating EML metadata
https://ediorg.github.io/EMLassemblyline/
MIT License
28 stars 13 forks

consider handling a visible field separator for metadata_template tables. #64

Open mobb opened 4 years ago

mobb commented 4 years ago

Most file readers let you specify the field separator as an argument, e.g.,

read_file('myfile.txt', delimiter='\t')

It's fine to have a default (\t). But it would be wonderful to be able to edit a simple metadata template with a text editor and actually see the delimiter.

You would have to limit the options. Metadata fields often contain commas, so for safety it's best not to allow the comma initially. The delimiter we often use when exporting text from an RDB is the vertical bar or pipe, "|". It hardly ever shows up in typical text.

read_file('my_catvars.txt', delimiter='|')

would be able to read this file:

attributeName|code|definition
site|and|Andrews Forest
site|arc|Arctic LTER
site|bes|Baltimore Ecosystem Study
...

and it would be easy to edit in a plain text editor, because the delimiter is visible.
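As a rough illustration of the idea (not EMLassemblyline code, which is an R package), here is a minimal Python sketch of a reader with a tab default and an optional pipe delimiter. The helper name `read_delimited` is hypothetical, and the text is held in memory rather than read from `my_catvars.txt` so the sketch stands alone.

```python
import csv
import io

# Same rows as the example above, kept in a string so no file is needed.
PIPE_TEXT = """attributeName|code|definition
site|and|Andrews Forest
site|arc|Arctic LTER
site|bes|Baltimore Ecosystem Study
"""


def read_delimited(text, delimiter="\t"):
    """Hypothetical helper: parse delimited text into a list of dicts.

    The default delimiter is a tab; callers may pass a visible one like "|".
    """
    # QUOTE_NONE: quote characters are ordinary text, so fields are split
    # on the delimiter only. This relies on the delimiter never appearing
    # inside a field, which is the premise of choosing "|".
    reader = csv.DictReader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    return list(reader)


rows = read_delimited(PIPE_TEXT, delimiter="|")
for row in rows:
    print(row["code"], "->", row["definition"])
```

Because nothing is quoted or escaped, the file stays exactly as it looks in a text editor; the trade-off is that a stray "|" inside a field would silently shift the columns.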

twhiteaker commented 4 years ago

While there isn't a single CSV standard, there are some good ones that handle commas. Basically, you put quotes around text fields, and store quotes within text fields as "".

W3C recommendation

Stack Overflow answer with a nice example (set up using pipes!)
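For reference, the quoting convention described above (quotes around text fields, embedded quotes stored as "") is what Python's standard `csv` module does by default. This is only a sketch of the convention, not a proposal for how EMLassemblyline should implement it; the sample definition text is made up for illustration.

```python
import csv
import io

# Made-up sample row: the last field contains both commas and quotes.
row = ["site", "and", 'Andrews Forest, "HJA", Oregon']

# Write with minimal quoting: only fields that need it are quoted,
# and embedded quotes are doubled (doublequote=True is the default).
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(row)
encoded = buf.getvalue()
print(encoded, end="")

# Read it back; the reader undoes the quoting and the doubled quotes.
decoded = next(csv.reader(io.StringIO(encoded), delimiter=","))
```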

mobb commented 4 years ago

We are not talking about "CSVs" here, the suggestion is to allow reading a text file with a defined delimiter. By starting with a delimiter that is easy to keep out of fields ('|') you avoid having to hide the delimiter when it appears in a field, and then trusting that the file-reader-code can handle the mechanism you chose to use for that. So it is a simple first test.

CSVs are common, so it is logical to allow them at some point, provided the delimiter-within-field is hidden. The most commonly used methods are escaping within the field (\,) and quoting the field (text1,"text2,etc",text3). There might be others.
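To make the two methods concrete, here is a small Python sketch that encodes the same row both ways using the standard `csv` module. The helper names `encode` and `decode` are hypothetical, and the backslash as escape character is an assumption taken from the (\,) example above.

```python
import csv
import io

row = ["text1", "text2,etc", "text3"]


def encode(fields, **fmt):
    """Hypothetical helper: write one row to a string with the given csv options."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(fields)
    return buf.getvalue()


def decode(text, **fmt):
    """Hypothetical helper: read the first row back with the same csv options."""
    return next(csv.reader(io.StringIO(text), **fmt))


# Method 1: quote any field that contains the delimiter.
quoting_opts = {"quoting": csv.QUOTE_MINIMAL}
quoted = encode(row, **quoting_opts)

# Method 2: never quote; put an escape character before the delimiter instead.
escaping_opts = {"quoting": csv.QUOTE_NONE, "escapechar": "\\"}
escaped = encode(row, **escaping_opts)

print(quoted, end="")
print(escaped, end="")

# Either form round-trips only if the reader is told which method was used.
from_quoted = decode(quoted, **quoting_opts)
from_escaped = decode(escaped, **escaping_opts)
```

The reader has to be configured with the same method the writer used, which is the "trusting the file-reader code" concern raised earlier in the thread.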

mobb commented 4 years ago

BTW - I have successfully exported EAL metadata_template files from an RDB (Postgres), with tab separators. Those queries/views are not well documented yet.