stevenferey closed this issue 1 year ago
Hello,
I would like to contribute a PR. Would you be interested in one of the proposals mentioned in the description?
Thank you so much, Steven.
Yes, this definitely is a problem that needs to be fixed (in other words, no, the current behavior is not correct).
There was some rationale behind the original decision to use 2 different mime types for ingested and uningested tab-delimited files. As a way to fix this, I would strongly prefer not to touch either of the 2 type definition files above, and instead simply make the redetect API skip ingested files. There shouldn't be any practical case where redetecting the type of an ingested file could be necessary or useful.
In other words, I would fix this by simply adding an if (!dataFileIn.isTabularData()) check to the redetectDatafile() method in Files.java.
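To illustrate the guard suggested above, here is a minimal, self-contained sketch. It is not the actual Files.java code: FileRecord, detectByExtension() and redetect() are hypothetical stand-ins, and only the isTabularData() check mirrors the proposal.

```java
public class RedetectSketch {

    /** Hypothetical stand-in for Dataverse's DataFile; only the fields this sketch needs. */
    static final class FileRecord {
        final String name;
        String contentType;
        final boolean tabularData;

        FileRecord(String name, String contentType, boolean tabularData) {
            this.name = name;
            this.contentType = contentType;
            this.tabularData = tabularData;
        }

        boolean isTabularData() {
            return tabularData;
        }
    }

    /** Hypothetical detector: maps .tab and .tsv to text/tsv, as described in the report. */
    static String detectByExtension(String name) {
        if (name.endsWith(".tab") || name.endsWith(".tsv")) {
            return "text/tsv";
        }
        return "application/octet-stream";
    }

    /** Returns the content type reported for the file after the call. */
    static String redetect(FileRecord file, boolean dryRun) {
        // Proposed guard: only redetect files that were NOT ingested as tabular data.
        if (!file.isTabularData()) {
            String detected = detectByExtension(file.name);
            if (!dryRun) {
                file.contentType = detected;
            }
            return detected;
        }
        // Ingested files keep their existing type untouched.
        return file.contentType;
    }

    public static void main(String[] args) {
        FileRecord ingested = new FileRecord("survey.tab", "text/tab-separated-values", true);
        FileRecord plain = new FileRecord("notes.tsv", "application/octet-stream", false);
        System.out.println(redetect(ingested, false)); // prints text/tab-separated-values
        System.out.println(redetect(plain, false));    // prints text/tsv
    }
}
```

With this shape, neither mime.types nor MimeTypeDetectionByFileExtension.properties needs to change. How the real endpoint should respond for a skipped file (an error or an unchanged type) is left open here.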
Apologies for overlooking this issue when you opened it originally. Thank you for bringing this to our attention and for offering to make a PR.
Issue created by the "entrepot.recherche.data.gouv.fr" team
What steps does it take to reproduce the issue?
Call the Redetect File Type API on an ingested file (with dryRun set to either true or false).
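For reference, the reproduction step could be scripted roughly as follows. This is only a sketch: the endpoint path (/api/files/{id}/redetect) and the X-Dataverse-key header are written as I understand them from the Dataverse native API guide and should be checked against your version, and the server URL, file id and token are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RedetectRequestSketch {

    /** Builds (but does not send) a POST request to the assumed redetect endpoint. */
    static HttpRequest buildRedetectRequest(String serverUrl, long fileId, boolean dryRun, String apiToken) {
        // Assumed endpoint shape: POST {server}/api/files/{id}/redetect?dryRun={true|false}
        URI uri = URI.create(serverUrl + "/api/files/" + fileId + "/redetect?dryRun=" + dryRun);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken) // assumed API token header
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; replace with a real installation, file id and token.
        HttpRequest request = buildRedetectRequest(
                "https://dataverse.example.edu", 24L, true, "placeholder-token");
        System.out.println(request.method() + " " + request.uri());
        // To send it:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

With dryRun=true the call should only report the old and new content types without saving the change.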
Which page(s) does it occur on?
API resource
What happens?
In Dataverse, ingesting a tabular file produces a .tab file with the MIME type text/tab-separated-values.
Running the Redetect File Type API on that .tab file changes its MIME type to text/tsv, as declared in the mime.types file:
https://github.com/IQSS/dataverse/blob/1a797171cdb73741b5da4a683f38697558349b5c/src/main/java/META-INF/mime.types#L9-L10
API return:
{"status":"OK","data":{"dryRun":true,"oldContentType":"text/tab-separated-values","newContentType":"text/tsv"}}
Is this the right behavior?
To whom does it occur (all users, curators, superusers)?
all users
What did you expect to happen?
I think there can be two solutions:
First:
In the mime.types file, edit the entry
by
Second:
In the mime.types file, edit the entry
by
And add the mimetype in the MimeTypeDetectionByFileExtension.properties file:
tab=text/tab-separated-values
If you want to keep the tab-separated-values mimetype for ingested files, it's better not to be able to change it with MimeTypeDetectionByFileExtension.properties
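As a rough illustration of the second proposal, an extension-to-type lookup driven by a properties entry such as tab=text/tab-separated-values might behave like the sketch below. The class and method names are hypothetical, and this is not how Dataverse actually loads MimeTypeDetectionByFileExtension.properties; it only shows the mapping idea.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

public class ExtensionLookupSketch {

    /** Parses properties-format text (extension=mimetype per line). */
    static Properties loadMappings(String propertiesText) {
        Properties mappings = new Properties();
        try {
            mappings.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mappings;
    }

    /** Returns the mapped MIME type for the file's extension, or null if there is none. */
    static String lookup(Properties mappings, String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension to look up
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return mappings.getProperty(extension);
    }

    public static void main(String[] args) {
        Properties mappings = loadMappings("tab=text/tab-separated-values\n");
        System.out.println(lookup(mappings, "survey.tab")); // prints text/tab-separated-values
        System.out.println(lookup(mappings, "notes.txt"));  // prints null
    }
}
```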
Which version of Dataverse are you using?
5.12.1
Any related open or closed issues to this bug report?