Open dspace-bot opened 10 years ago
peterdietz said:
Note: It's not sufficient to just remove empty square brackets from the output, you'll also need to clean up the actual state of metadata languages, so that there is no distinction between language value of empty string or null. So, wherever values are being set will need to account for that as well.
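To illustrate the point about cleaning up values wherever they are set, here is a minimal sketch of a normalization helper. The class and method names (MetadataLanguage, normalize) are hypothetical and not part of the DSpace API, and treating whitespace-only strings as "no language" is an extra assumption beyond what is described above.

```java
// Hypothetical helper, NOT part of the DSpace API: a sketch only.
final class MetadataLanguage {
    private MetadataLanguage() {}

    /**
     * Collapses null, empty, and whitespace-only language codes to null,
     * so that stored metadata never distinguishes "" from null.
     * (Trimming whitespace is an assumption made for this sketch.)
     */
    static String normalize(String language) {
        if (language == null) {
            return null;
        }
        String trimmed = language.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(normalize(""));        // no language
        System.out.println(normalize(" en_US ")); // trimmed language code
    }
}
```

Calling a helper like this at every point where a language is written would mean the export code only ever has to handle one "no language" representation.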
alexm said:
As I use CSV-exported metadata regularly, I'd like to have this issue solved. I see there's a declined pull request, but I'm not sure I understand the reason. Is it because MetadataImport should also be fixed, or because all places where a metadata value is stored with an empty language should be fixed too?
In any case, unless there's some tricky stuff better dealt with by someone more experienced, if you want, I could work on it.
alaw said:
Our institution would also really appreciate a resolution to this issue. We use exported metadata .csv files extensively, and this bug doubles the number of fields in each spreadsheet. Manipulating a spreadsheet of 15 columns is much easier than dealing with one with 28 columns! We would really appreciate a resolution to this.
tdonohue said:
Just a brief update on this ticket, we are still looking for volunteers to work on fixing this bug.
The previous Pull Request (https://github.com/DSpace/DSpace/pull/674) was rejected/closed (by the creator of the PR) as it was discovered (after further testing) to not solve the problem. So, if anyone is interested in submitting a new Pull Request, we'd welcome volunteers (and can help find testers).
helen.baer said:
Our institution would also appreciate a fix. We're going to be doing some metadata remediation later this year, and having the extra columns as Anne describes will definitely slow us down.
For what it's worth, I've resorted to normalizing NULL, blank (literal empty string ""), and en to en_US straight in SQL before doing large metadata exports:
BEGIN; -- Start a transaction, just in case we need to ROLLBACK!
-- This only updates `text_lang` for DSpace items, not communities, collections, or
-- any other objects, and only for items that are in the archive, not withdrawn!
UPDATE metadatavalue
SET text_lang = 'en_US'
WHERE dspace_object_id IN
    (SELECT UUID
     FROM item
     WHERE in_archive
       AND NOT withdrawn)
  -- The parentheses matter: AND binds tighter than OR, so without them the
  -- 'en' and '' rows of every object (not just archived items) would be updated.
  AND (text_lang IS NULL
       OR text_lang IN ('en', ''));
COMMIT; -- Once we're sure we didn't make a mistake!
Note: our repository has metadata values that are legitimately set to French, Vietnamese, Arabic, etc., so I only do the blank, null, and "en" ones.
Imported from JIRA [DS-2174] created by peterdietz
Metadata Export gives you a CSV with headers of metadata keys, and the body of the CSV is the values. If a metadata value happens to have a language that is not null but is empty, i.e. you didn't specify en or en_US, the export will sometimes give you dc.date.submitted[]. That is an empty language, so why not just export it as dc.date.submitted?
Yeah, so there's a bug in the MetadataExport DSpaceCSV.java: it's possible to sometimes get an empty language "[]" because it only checks whether the language is null, not also whether it is empty. (The proper behavior was commented out...)
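The check described above can be sketched as follows. This is not the actual DSpaceCSV.java code; the class name CsvHeaderSketch and the method headerKey are hypothetical and only illustrate building a column header that omits the brackets when the language is either null or empty.

```java
// Hypothetical sketch, NOT the real DSpaceCSV implementation.
final class CsvHeaderSketch {
    private CsvHeaderSketch() {}

    /**
     * Builds a CSV column header such as "dc.date.submitted" or
     * "dc.date.submitted[en_US]". The language suffix is added only when
     * the language is neither null nor empty, so "[]" is never produced.
     */
    static String headerKey(String schema, String element, String qualifier, String language) {
        StringBuilder key = new StringBuilder(schema).append('.').append(element);
        if (qualifier != null && !qualifier.isEmpty()) {
            key.append('.').append(qualifier);
        }
        // Checking for null alone is what leads to "dc.date.submitted[]".
        if (language != null && !language.isEmpty()) {
            key.append('[').append(language).append(']');
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(headerKey("dc", "date", "submitted", ""));
        System.out.println(headerKey("dc", "date", "submitted", "en_US"));
    }
}
```

With a check like this, values stored with a null language and values stored with an empty language land in the same column instead of two separate ones.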