enasequence / sequencetools

Webin sequence validation API.
Apache License 2.0

Date validator #19

Closed sachalau closed 1 year ago

sachalau commented 7 years ago

Hello,

The date validator fails for me on every input I have tried (24-Nov-2016, 08-Aug-2017, 2017, and Aug-2017) with the latest version, 1.1.173. My workaround was to leave the date empty in:

RL Submitted () to the INSDC.

Could you tell me what the rules and the format are for a date to pass validation?

Thanks,

kethireddy commented 7 years ago

Hi,

Could you try the RL line in the following format:

E.g.: RL Submitted (02-MAY-2014) to the INSDC.

Sequence formats are given here (with example flat files) :

http://www.ebi.ac.uk/ena/submit/sequence-format

Regards, Kethi
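For anyone checking their own files against the suggested format, here is a minimal sketch of a pre-check on the RL line. It assumes the shape shown in the example above (two-digit day, upper-case English month abbreviation, four-digit year); the class name `RlLineCheck` and the method `submissionDate` are hypothetical and are not part of the validator. Whether the validator itself also accepts mixed-case months is not stated in this thread, so the sketch deliberately accepts only the upper-case form.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of embl-api-validator.
class RlLineCheck {

    // Assumed format, inferred from the example "RL Submitted (02-MAY-2014) to the INSDC."
    private static final Pattern RL_SUBMITTED = Pattern.compile(
            "RL\\s+Submitted \\((\\d{2}-(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-\\d{4})\\) to the INSDC\\.");

    // Returns the date text if the line has the assumed shape, otherwise empty.
    static Optional<String> submissionDate(String line) {
        Matcher m = RL_SUBMITTED.matcher(line.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[02-MAY-2014]
        System.out.println(submissionDate("RL   Submitted (02-MAY-2014) to the INSDC."));
        // prints Optional.empty (the date is missing)
        System.out.println(submissionDate("RL   Submitted () to the INSDC."));
    }
}
```

This only checks the textual shape of the line; it does not check that the day exists in the given month, and passing it does not guarantee that the validator will accept the file.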

sachalau commented 7 years ago

No, still not working.

For instance, even the example flat files given in your link fail the validator :

http://www.ebi.ac.uk/ena/data/view/AACH01000026%26display%3Dtext

It fails even when I put in dates in the future (I think I read somewhere that they should be in the future, but I'm not sure):

ERROR: Invalid date: 24-MAY-2003 (FF.2) line: 8
ERROR: Invalid date: 16-AUG-2014 (FF.2) line: 8
ERROR: Invalid date: 07-APR-2018 (FF.2) line: 35

Thank you for your fast answer

kethireddy commented 7 years ago

I don't see any error messages when I use the latest version of the validator (https://mvnrepository.com/artifact/uk.ac.ebi.ena.sequence/embl-api-validator/1.1.173) on the given file.

Command : java -cp embl-api-validator-1.1.173.jar uk.ac.ebi.embl.api.validation.EnaValidator AACH01000026.txt

Output :

Messages
WARNING: Sequence contains a stretch of n characters between base 9,497 and 9,497 that is not represented with a "gap" feature (stretches of n greater than 0 gives a warning, greater than 10 gives an error). (SequenceToGapFeatureBasesCheck-1)

FILE SUMMARY
bad.txt - 1 entries, 0 failed entries, 0 errors, 0 fixes, 1 warnings & info

SUMMARY
Fixed Entries:0 Failed Entries:0 Checked Entries:1 Unchanged Entries:1

sachalau commented 7 years ago

Which Java are you using? This command fails for me with both OpenJDK 8 and OpenJDK 9.

kethireddy commented 7 years ago

java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

sachalau commented 7 years ago

I'm using a more recent version of java

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

I have seen that this line seems to cause errors for other users as well (see https://github.com/enasequence/sequencetools/issues/18).

Are you going to address this problem in future versions of the validator ?

raskoleinonen commented 7 years ago

Thank you. We will test examples from ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt against the parser. Also, we will investigate if we can remove the requirement to have any references (R* lines) present in the flat files.

kethireddy commented 7 years ago

The next version of the validator will address this issue: https://github.com/enasequence/sequencetools/issues/18

abretaud commented 6 years ago

I still have this problem with 1.1.179

kethireddy commented 6 years ago

Hi, could you please provide the error message? Thanks, Kethi

abretaud commented 6 years ago

Sure, I get this error:

ERROR: Invalid date: 10-OCT-2017 line: 26

and in the summary:

ERROR: Invalid date: {0} (13314 occurrences) (FF.2)

The original EMBL file looks like this:

RL Submitted (10-OCT-2017) to the INSDC.

And I run the validator like this:

java -jar ~/embl-api-validator-1.1.179.jar -fix -r input.embl

wna-se commented 1 year ago

This is an old issue but may be related to #132

raskoleinonen commented 1 year ago

We suspect a locale-related problem. A known locale problem affecting dates has been fixed by using DateTimeFormatter (threadsafe) and setting the locale to UK: https://github.com/enasequence/sequencetools/commit/dfbc6cd10083b6fa109bb3aec977ef13d6cec0b5
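To illustrate why a locale could produce the "Invalid date" errors reported above, here is a minimal sketch of locale-dependent month parsing with `DateTimeFormatter`. It is not the code from the linked commit: the class name `RlDateLocaleDemo`, the helper names, and the pattern `dd-MMM-uuuu` are assumptions made for this illustration. The idea is that a month abbreviation such as OCT is only recognised when the formatter's locale uses that abbreviation, so a formatter that silently picks up the machine's default locale can reject a date that parses fine elsewhere.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class, not taken from the validator's source.
class RlDateLocaleDemo {

    // Builds a formatter for dates such as 10-OCT-2017.
    // Month abbreviations are looked up in the given locale.
    static DateTimeFormatter formatterFor(Locale locale) {
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()        // accept OCT as well as Oct
                .appendPattern("dd-MMM-uuuu")  // assumed pattern for this sketch
                .toFormatter(locale);
    }

    // True if the text parses as a date under the given locale.
    static boolean isValid(String text, Locale locale) {
        try {
            LocalDate.parse(text, formatterFor(locale));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String date = "10-OCT-2017";
        // Fixed UK locale: the English abbreviation is recognised.
        System.out.println("UK: " + isValid(date, Locale.UK));           // UK: true
        // German locale: October is abbreviated differently, so OCT is rejected.
        System.out.println("GERMANY: " + isValid(date, Locale.GERMANY)); // GERMANY: false
        // Default locale: the result depends on the machine running the code.
        System.out.println("default " + Locale.getDefault() + ": "
                + isValid(date, Locale.getDefault()));
    }
}
```

Pinning the locale when the formatter is built, rather than relying on the JVM default, makes the result independent of the user's system settings, which would be consistent with the same file passing on one machine and failing on another.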