TripalCultivate / TripalCultivate-Phenotypes

Provides generic support for large scale phenotypic data and traits with importers, content pages and visualizations.
GNU General Public License v3.0

Fixes failed test: pdf as tsv #64

Closed reynoldtan closed 10 months ago

reynoldtan commented 11 months ago

**Issue #59 - PDF as tsv file**

Motivation

Data file validation is not triggered when the uploaded file is a PDF that has been given a tsv extension.

What does this PR do?

Please describe each thing this PR does. For example, a PR may 1) solve a specific bug, 2) create an automated test to ensure it doesn't return.

  1. Add a check that inspects the file content for PDF signatures and triggers the correct validation error.
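
A minimal sketch of what a content-based PDF signature check could look like. It is written in Python purely for illustration and is not the code in this PR. The function names and the 1024-byte probe window are assumptions; the only fact relied on is that PDF files carry a `%PDF-` marker at or near the start of the file.

```python
PDF_SIGNATURE = b"%PDF-"


def has_pdf_signature(head: bytes) -> bool:
    """Return True if the PDF marker appears in the given leading bytes."""
    return PDF_SIGNATURE in head


def file_looks_like_pdf(path: str, probe_bytes: int = 1024) -> bool:
    """Read only the start of the file (hypothetical 1024-byte window)
    and look for the PDF marker, regardless of the file extension."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    return has_pdf_signature(head)
```

With a check like this, a file named `sample.tsv` whose first bytes are `%PDF-1.7` would be flagged even though its extension is valid. As noted in the discussion, this only catches PDFs and not other mislabelled formats.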

Testing

Below is a tsv file created from a pdf file by replacing the pdf extension with tsv: sample.txt

Upload this file through the importer's file field and run validation. It should trigger an incorrect file format/extension error.

reynoldtan commented 11 months ago

The fix for this issue is very specific to PDF files. The check should instead inspect the file content for the expected structure (values separated by tabs or commas) and infer the validity of the file from that. I will revise the rule for this one.

carolyncaron commented 11 months ago

We already discussed this at our latest meeting, but to summarize it here: we think this approach of checking for a disguised pdf is a bit too specific. A broader approach would be to validate that the file is tab-delimited by parsing the first line. Ideally, we would go so far as to implement a method in the Tripal Importer class that can do checks for tab-delimited text and the number of columns, since this functionality is needed by most (maybe all) importers written for Tripal.
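
The broader check described above could be sketched roughly as follows. This is a Python illustration, not Tripal's PHP API; `check_tab_delimited`, `check_file_header` and the `expected_columns` parameter are hypothetical names, and the sketch assumes every valid file has at least two columns.

```python
def check_tab_delimited(first_line: str, expected_columns: int) -> list:
    """Return a list of error messages; an empty list means the line passed."""
    line = first_line.rstrip("\r\n")
    if "\t" not in line:
        # Assumes at least two columns, so a valid line always has a tab.
        return ["no tab characters found in the header line"]
    columns = line.split("\t")
    if len(columns) != expected_columns:
        return [f"expected {expected_columns} columns, found {len(columns)}"]
    return []


def check_file_header(path: str, expected_columns: int) -> list:
    """Validate only the first line of the file at the given path."""
    # errors="replace" keeps binary content (e.g. a disguised PDF)
    # from raising a decode error before the check can run.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline()
    return check_tab_delimited(first_line, expected_columns)
```

A disguised PDF fails this check because its first line (`%PDF-...`) contains no tabs, and so would any other non-tabular file, which is what makes this approach less specific than a PDF signature test.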