DSpace / DSpace

(Official) The DSpace digital asset management system that powers your Institutional Repository
https://wiki.lyrasis.org/display/DSDOC8x/
BSD 3-Clause "New" or "Revised" License

[DS-2174] MetadataExport exports empty languages i.e. dc.title[] #5540

Open dspace-bot opened 10 years ago

dspace-bot commented 10 years ago

Imported from JIRA [DS-2174] created by peterdietz

Metadata Export will give you a CSV with headers of metadata keys, and the body of the CSV is the values. If a metadata value happens to have a language that is not null but is empty, i.e. you didn't specify en or en_US, sometimes this export will give you dc.date.submitted[]. Why export an empty language at all, rather than just dc.date.submitted?

Yeah, so there's a bug in the MetadataExport DSpaceCSV.java: it's possible to sometimes get an empty language "[]" because it only checks whether the language is null, not whether it is also empty. (The proper behavior was commented out...)
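A minimal sketch of the check described above, written in Python for brevity and not taken from DSpaceCSV.java. The function name `csv_header_key` and its parameters are hypothetical. The only point is that the language suffix is skipped when the language is null or an empty string, instead of only when it is null.

```python
from typing import Optional


def csv_header_key(schema: str, element: str,
                   qualifier: Optional[str] = None,
                   language: Optional[str] = None) -> str:
    """Build a CSV column header such as dc.title or dc.title[en_US].

    Hypothetical helper for illustration; not the actual DSpace code.
    """
    key = f"{schema}.{element}"
    if qualifier:
        key += f".{qualifier}"
    # Checking only `language is not None` would turn "" into "[]".
    if language is not None and language != "":
        key += f"[{language}]"
    return key
```

With this check, `csv_header_key("dc", "date", "submitted", "")` and `csv_header_key("dc", "date", "submitted", None)` both produce the same header, so the two cases no longer end up in separate columns.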

dspace-bot commented 10 years ago

peterdietz said:

Note: It's not sufficient to just remove empty square brackets from the output. You'll also need to clean up the actual state of metadata languages, so that there is no distinction between a language value of empty string and null. So, wherever language values are being set will need to account for that as well.
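One way to picture "no distinction between empty string and null" is to normalize the language wherever a value is created. The sketch below is in Python, not DSpace's Java, and both `normalize_language` and the `MetadataValue` class are hypothetical names used only for illustration. It assumes null is chosen as the single "no language" state.

```python
from dataclasses import dataclass
from typing import Optional


def normalize_language(language: Optional[str]) -> Optional[str]:
    """Collapse an empty language to None so only one 'no language' state exists."""
    if language is None or language == "":
        return None
    return language


@dataclass
class MetadataValue:
    """Hypothetical stand-in for a stored metadata value."""
    field: str
    value: str
    language: Optional[str] = None

    def __post_init__(self) -> None:
        # Normalize at construction time; code that assigns to
        # `language` later would need to call normalize_language too.
        self.language = normalize_language(self.language)
```

Any import or edit path that writes a language would need the same normalization, otherwise empty strings could still reach storage and reappear in exports.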

dspace-bot commented 7 years ago

alexm said:

As I use CSV-exported metadata regularly, I'd like to have this issue solved. I see there's a declined pull request, but I'm not sure I understand the reason. Is it because MetadataImport should also be fixed, or because all places where metadata is stored with an empty language should be fixed too?

In any case, unless there's some tricky stuff better dealt with by someone more experienced, if you want, I could work on it.

dspace-bot commented 6 years ago

alaw said:

Our institution would also really appreciate a resolution to this issue. We use exported metadata CSVs extensively, and this bug doubles the number of fields in each spreadsheet. Manipulating a spreadsheet of 15 columns is much easier than dealing with one with 28 columns! We would really appreciate a resolution to this.

dspace-bot commented 6 years ago

tdonohue said:

Just a brief update on this ticket: we are still looking for volunteers to work on fixing this bug.

The previous Pull Request (https://github.com/DSpace/DSpace/pull/674) was closed by its creator after further testing showed that it did not solve the problem. So, if anyone is interested in submitting a new Pull Request, we'd welcome volunteers (and can help find testers).

dspace-bot commented 5 years ago

helen.baer said:

Our institution would also appreciate a fix. We're going to be doing some metadata remediation later this year, and having the extra columns as Anne describes will definitely slow us down.

alanorth commented 1 year ago

For what it's worth, I've resorted to normalizing NULL, blank (literal empty string ""), and en to en_US straight in SQL before doing large metadata exports:

BEGIN; -- Start a transaction, just in case we need to ROLLBACK!

-- This only updates `text_lang` for DSpace items, not communities, collections, or
-- any other objects, and only for items that are in the archive, not withdrawn!
UPDATE metadatavalue
SET text_lang='en_US'
WHERE dspace_object_id IN
    (SELECT UUID
     FROM item
     WHERE in_archive
       AND NOT withdrawn)
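  -- Parentheses matter: AND binds tighter than OR, so without them the
  -- 'en' and '' rows would be updated for every object, not just these items.
  AND (text_lang IS NULL
       OR text_lang IN ('en',
                        ''));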

COMMIT; -- Once we're sure we didn't make a mistake!

Note: our repository has metadata values that are legitimately set to French, Vietnamese, Arabic, etc., so I only normalize the blank, null, and "en" ones.