EBISPOT / gwas-sumstats-service

Summary statistics service

Sporadic Incorrect File Extension Detection #317

Open karatugo opened 5 months ago

Description: The get_ext method intermittently returns the wrong file extension when called during the get_submitted_files process. The issue appears to stem from the method's reliance on magic.Magic file type descriptions, which are not always accurate or specific enough to map to a single extension.

Steps to Reproduce:

  1. Submit a file through get_submitted_files.
  2. The file is passed to the get_ext method.
  3. Observe that get_ext sometimes returns or appends an incorrect extension derived from the file's content description.

Expected Behavior: The get_ext method should consistently and accurately determine the correct file extension based on the file's actual type and content.

Actual Behavior: The method occasionally assigns incorrect file extensions, particularly when the magic.Magic description is too generic or misinterprets the file’s content (e.g., misidentifying text files as gzip due to content encoding).

Possible Solution: Combine MIME type detection with the extension in the submitted filename instead of relying on the description string alone. Adjusting the magic configuration to yield more precise results, or parsing the filename extension directly, could also mitigate this issue.
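
A minimal sketch of the combined approach, not the service's actual get_ext. To stay dependency-free it checks leading-byte signatures directly rather than calling magic.Magic, so it is a stand-in for MIME detection and not a use of the library. The names resolve_ext, sniff_compression and SIGNATURES are hypothetical, and the signature table is an assumed subset covering only gzip and bzip2.

```python
import gzip
from pathlib import Path

# Assumed subset of compression signatures (leading bytes -> extension).
# Zip is left out on purpose: zip-based formats such as .xlsx share its signature.
SIGNATURES = {
    b"\x1f\x8b": ".gz",   # gzip magic number
    b"BZh": ".bz2",       # bzip2 magic number
}


def sniff_compression(head: bytes) -> str:
    """Return the compression extension implied by the leading bytes, or ''."""
    for signature, ext in SIGNATURES.items():
        if head.startswith(signature):
            return ext
    return ""


def resolve_ext(filename: str, head: bytes) -> str:
    """Hypothetical replacement logic: the filename supplies the data format,
    the file content decides whether a compression suffix applies."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    compression_exts = set(SIGNATURES.values())
    # Ignore compression suffixes claimed by the name; content is authoritative.
    inner = [s for s in suffixes if s not in compression_exts]
    base = inner[-1] if inner else ""
    return base + sniff_compression(head)


if __name__ == "__main__":
    payload = gzip.compress(b"chromosome\tposition\n")
    print(resolve_ext("study.tsv", payload))
    print(resolve_ext("study.tsv.gz", b"chromosome\tposition\n"))
```

The design choice here is that the two sources answer different questions: the filename is trusted only for the data format, and the bytes are trusted only for compression, so a plain text file is never labelled gzip because of a generic description.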

Additional Information: This error does not occur for all files but has been noted sporadically across different file types, complicating the debugging process. A review of how file descriptions are parsed and used in extension determination might be necessary.
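
Because the failures are sporadic, it may help to record every case where the chosen extension disagrees with the submitted filename. The sketch below is an assumption about how such an audit could look: audit_extension and the "ext_audit" logger name are hypothetical, and the description argument stands for whatever string magic.Magic returned for the file.

```python
import logging
from pathlib import Path

logger = logging.getLogger("ext_audit")  # hypothetical logger name


def audit_extension(filename: str, description: str, chosen_ext: str) -> bool:
    """Return True when chosen_ext matches the end of the filename's suffixes.

    Mismatches are logged together with the description string so that
    sporadic cases can be reviewed later.
    """
    declared = "".join(Path(filename).suffixes).lower()
    agrees = bool(chosen_ext) and declared.endswith(chosen_ext.lower())
    if not agrees:
        logger.warning(
            "extension mismatch: file=%s declared=%s chosen=%s description=%r",
            filename, declared, chosen_ext, description,
        )
    return agrees
```

Calling this right after the extension is determined gives a log of filename, description and result for each disagreement, which is the data needed to review how descriptions are being parsed.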