MPDL / dataverse

Open source research data repository software
http://dataverse.org

Error in Redetect File Type API #40

Open helkv opened 2 years ago


Calling the Redetect File Type API (https://guides.dataverse.org/en/latest/api/native-api.html#redetect-file-type) throws multiple exceptions if the file's original type is text/plain and S3 is used as file storage. As a result, the file type is detected incorrectly and the file is removed from the search index.

Equivalent issue in IQSS/dataverse: https://github.com/IQSS/dataverse/issues/7527, with the related pull request https://github.com/IQSS/dataverse/pull/7631

Server: All instances
Date of Test: 06.07.2022
Browser: -
Version: among others, Dataverse v5.10.1 / v5.11
User: -

Preconditions:

Actions:

  1. Call the Redetect File Type API (POST): {{base_url}}/api/files/64/redetect?dryRun=false
  2. Result:
    • The API call returns an incorrect file type
    • The file is missing from the Solr index
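For reproducing step 1 outside a REST client, the request can be built with the JDK's `java.net.http` classes. This is a minimal sketch, assuming Java 11+; the server URL, the token value and the class name `RedetectRequest` are placeholders, while the path, the `dryRun` parameter and the `X-Dataverse-key` header follow the Dataverse native API guide. The issue used `dryRun=false`, which saves the (wrong) detected type; `dryRun=true` only reports what the new type would be.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Sketch: builds the POST request for the Redetect File Type API.
 * Server URL and API token are placeholders supplied by the caller.
 */
public class RedetectRequest {

    // Builds POST {serverUrl}/api/files/{fileId}/redetect?dryRun={dryRun}
    static HttpRequest build(String serverUrl, long fileId, String apiToken, boolean dryRun) {
        URI uri = URI.create(serverUrl + "/api/files/" + fileId + "/redetect?dryRun=" + dryRun);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken) // token of a user allowed to edit the dataset
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // dryRun=true: report the detected type without saving it
        HttpRequest req = build("https://demo.example.org", 64, "xxxxxxxx", true);
        System.out.println(req.method() + " " + req.uri());
        // Sending would use java.net.http.HttpClient, e.g.:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Building the request separately from sending it makes it easy to check the URL and header before touching a production instance.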

Root cause:

  1. When S3 is used, the file type is determined on a temporary copy of the file (.tmp)
  2. For a .tmp file whose original type is text/plain, FileUtil.determineFileTypeByExtension() returns null
  3. JPA/EJB exceptions are thrown because a file's type must not be null
  4. Indexing the file then fails as a consequence of these errors (NullPointerException while indexing the dataset)
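The failure chain above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `EXTENSION_TYPES`, `detectByExtension` and `redetect` are illustrative stand-ins, not Dataverse's `FileUtil` code, whose real detection logic is more involved. It shows why an extension lookup on a `.tmp` name yields null, and one possible guard that falls back to the original file label and finally to the stored type, so that null never reaches persistence or indexing. The actual fix in IQSS/dataverse is in the linked pull request and may work differently.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of the failure mode and one possible guard.
 * Not Dataverse's FileUtil implementation.
 */
public class RedetectSketch {

    // Tiny illustrative extension table; "tmp" is deliberately absent.
    private static final Map<String, String> EXTENSION_TYPES = Map.of(
            "txt", "text/plain",
            "csv", "text/csv",
            "json", "application/json");

    // Returns the MIME type for the file name's extension, or null if unknown.
    static String detectByExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSION_TYPES.get(ext);
    }

    // Guarded redetection: try the temp copy, then the original file label,
    // and finally keep the currently stored type instead of returning null.
    static String redetect(String tempFileName, String originalLabel, String currentType) {
        String detected = detectByExtension(tempFileName); // null for "*.tmp"
        if (detected == null) {
            detected = detectByExtension(originalLabel);
        }
        return detected != null ? detected : currentType;
    }

    public static void main(String[] args) {
        System.out.println(detectByExtension("s3-download-123.tmp")); // prints "null"
        System.out.println(redetect("s3-download-123.tmp", "notes.txt", "text/plain")); // prints "text/plain"
    }
}
```

Falling back to the stored type keeps the `DataFile` valid for JPA and for indexing even when detection fails, at the cost of not correcting a type that was wrong before.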

server.log with the related errors (attached): Server_Log_with_Redetect_File_Type_Errors.log