bcgov / MFIN-Data-Catalogue

The Finance Data Catalogue enables users to discover data holdings at the BC Ministry of Finance and offers information and functionality that benefits consumers of data for business purposes. The product is built using Drupal and adheres to the Government of BC's Core Metadata Standard.

Keep newlines when importing data columns #407

Closed ChristaBull closed 2 months ago

ChristaBull commented 3 months ago

OP timer

https://openplus.monday.com/boards/4092908516/pulses/6379005677


Describe the bug

If a CSV field contains newline characters, they are replaced with spaces when the file is imported.

To reproduce

Steps to reproduce the behaviour:

  1. Go to Catalogue 'Dashboard'
  2. Click on 'Build' for a metadata record
  3. Scroll down to 'Section 6: Data dictionary'
  4. Click on 'Edit'
  5. Click on 'Import/export data columns'
  6. Import CSV with contents below and click 'Upload'
column_allowed_values,column_data_quality,column_description,column_name,column_size,column_transformations,column_type,instance_of,metadata_type,provenance_field_name,provenance_field_number,provenance_field_question,provenance_form_name,provenance_form_number
,"Some content on the first line.

More content on a 3rd line.",A unique system identifier for each Transparency Report submitted or updated by a reporting body or the Land Title and Survey Authority (LTSA),report_key,,,bigint,,,,,,,
  7. Click on 'Import'
  8. Scroll down to 'Section 6: Data dictionary'
  9. Click on 'Edit'
  10. Click 'Edit' next to the 'report_key' column
  11. Scroll down to 'Column data quality' field
  12. See error

Expected behaviour

The 'Column data quality' value should retain the newline characters that are in the CSV file.
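
For reference, a line break inside a quoted CSV field is part of the field value, so a CSV parser is expected to hand it through unchanged. The sketch below checks this with Python's standard `csv` module on a trimmed-down version of the sample above. The three-column header and the `read_rows` helper are assumptions made for illustration; the catalogue itself is a Drupal site and its importer is not shown in this issue.

```python
import csv
import io

# Trimmed-down version of the sample CSV from the steps above
# (only three of the original columns are kept for brevity).
SAMPLE = (
    "column_data_quality,column_name,column_type\n"
    '"Some content on the first line.\n'
    "\n"
    'More content on a 3rd line.",report_key,bigint\n'
)


def read_rows(text):
    """Parse CSV text into a list of dicts, keeping embedded line breaks."""
    # newline="" stops StringIO from translating line endings, so the csv
    # module sees the line breaks inside quoted fields exactly as written.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_rows(SAMPLE)
print(repr(rows[0]["column_data_quality"]))
```

If the parsed value still contains its line breaks at this stage, any loss of those breaks happens later, when the value is stored or rendered.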

Screenshots

What was uploaded: [image]

After upload: [image]

CraigClark commented 3 months ago

This isn't exactly a bug. The text areas use HTML for their input, which allows some limited formatting. So if you import HTML, you output HTML.

This is from a cell in an Excel spreadsheet, in the column column_allowed_values:

<p>this is a line.</p><p>This is a new line.</p><p>Here is a question. If we are going to have html in wysiwyg editor, should the csb output be html? Thay will make moving content around easier, but it will make the exported content less human readable.</p><p>add another paragraph in the excel file.<p><p>This one uses a br <br> for a line break</p>.

When it's imported, it looks like this:

image

This raises another question.

Do you want HTML in text area fields in the data dictionary?

The advantage is that you have some formatting abilities. The disadvantage is that when you import, you either need to go in afterwards and format the text, or your input has to contain HTML. A <br> would work for a line break.

Currently, when you download a data dictionary, text areas include the HTML. Whether that's good or not depends on what you are doing with the spreadsheet. If you are importing into a system that understands HTML, this is good. Otherwise it makes the content less readable for people.

If you want to keep HTML, I think there is a way to strip it from the export, but you would still need at least the <br> on import if you want line breaks.
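
As a rough illustration of what stripping HTML from an export could look like, here is a minimal Python sketch that drops tags but turns `<br>` and closing `</p>` tags into line breaks. The `html_to_plain_text` name and the choice of which tags become line breaks are assumptions for this example; it is not the catalogue's export code, which would live in Drupal.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, mapping <br> and </p> to line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> (and <br/>) marks an explicit line break.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # The end of a paragraph also ends the line.
        if tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain_text(markup):
    """Return the text of an HTML fragment with tags removed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


print(html_to_plain_text("<p>this is a line.</p><p>This is a new line.</p>"))
```

A round trip through a function like this would lose any other formatting (bold, links), which is the trade-off described above.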

@lkmorlan any thoughts on this?

lkmorlan commented 3 months ago

We could write an import filter to convert line breaks to br elements.

NicoledeGreef commented 3 months ago

@lkmorlan - can we do a filter for the import and strip the HTML from the export?

lkmorlan commented 3 months ago

If you don't want HTML, the type of the field should be switched to plain text. Or do you want HTML on the system but not in exports?

NicoledeGreef commented 3 months ago

Let's go with: if there are breaks in the original import, insert <br> tags. Leave as HTML for export.
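
The agreed behaviour can be sketched as a small text transform: every line break in an imported value becomes a `<br>` tag, and any HTML already in the value is left alone. The Python below is only an illustration under that assumption; the function name `newlines_to_br` is hypothetical, and the actual fix in the Drupal codebase is not shown in this thread.

```python
import re

# Matches Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
# \r\n is listed first so it is replaced by a single <br>.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def newlines_to_br(value):
    """Replace each line break in an imported cell value with a <br> tag."""
    return _LINE_BREAK.sub("<br>", value)


cell = "Some content on the first line.\n\nMore content on a 3rd line."
print(newlines_to_br(cell))
```

Because existing markup is passed through untouched, a cell that already contains `<p>` or `<br>` tags and no raw line breaks comes out unchanged, which matches the decision to leave exports as HTML.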

CraigClark commented 2 months ago

Tested and passed on dv14