enasequence / webin-cli

Webin command line submission program.
Apache License 2.0

ERROR: master file validation failed for master.dat #37

Open sejmodha opened 3 years ago

sejmodha commented 3 years ago

Hi There,

I am trying to submit TPA metagenomic assemblies using webin-cli-3.4.0.jar. Most of my files validate just fine, but for a small proportion of them I get the following uninformative error, and I am not sure how to resolve it.

2021-01-14T11:49:46 ERROR: Submission validation failed because of a user error. Please check validation reports for further information: /../ENA_submission/genome/UC_ERR262953/validate
uk.ac.ebi.ena.webin.cli.WebinCliException: Submission validation failed because of a user error. Please check validation reports for further information: /../ENA_submission/genome/UC_ERR262953/validate
    at uk.ac.ebi.ena.webin.cli.WebinCliException.validationError(WebinCliException.java:72)
    at uk.ac.ebi.ena.webin.cli.WebinCli.validate(WebinCli.java:199)
    at uk.ac.ebi.ena.webin.cli.WebinCli.execute(WebinCli.java:171)
    at uk.ac.ebi.ena.webin.cli.WebinCli.__main(WebinCli.java:85)
    at uk.ac.ebi.ena.webin.cli.WebinCli.main(WebinCli.java:66)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
Caused by: uk.ac.ebi.ena.webin.cli.WebinCliException:
    at uk.ac.ebi.ena.webin.cli.WebinCliException.validationError(WebinCliException.java:84)
    at uk.ac.ebi.ena.webin.cli.WebinCliExecutor.validateSubmission(WebinCliExecutor.java:109)
    at uk.ac.ebi.ena.webin.cli.WebinCli.validate(WebinCli.java:187)
    ... 11 common frames omitted

../validate/webin-cli.report has only one line of description related to this error that I am unable to Google or resolve!

ERROR: master file validation failed for master.dat

Any help with it would be greatly appreciated.

raskoleinonen commented 3 years ago

Dear Colleague,

I agree that this is a very unhelpful error. We will need to investigate this and will need to ask you to upload the manifest file and the data files to webin2.ebi.ac.uk using FTPS. Unfortunately, we are presently desperately short of capacity in this area and may not be able to address this immediately.

snathanvj commented 3 years ago

Dear Colleague, We are planning to report specific validation messages instead of a generic message like the one you mentioned above. Could you please upload the files to webin2.ebi.ac.uk using FTPS and let us know, so that we can fix this?

Thanks, Senthil

sejmodha commented 3 years ago

Hi Senthil,

Thanks for getting back to me on this. Could you please advise on how to submit the manifest and fasta files to webin2.ebi.ac.uk?

snathanvj commented 3 years ago

Hi, Thanks for your response. This page has clear instructions for the file upload procedure: https://ena-docs.readthedocs.io/en/latest/submit/fileprep/upload.html

Please let me know if you need any help. In the meantime, if you paste the manifest file content here, I can try to figure out the issue. As it is a master-file-related issue, it could come from the sample or the project.

Thanks, Senthil

sejmodha commented 3 years ago

Hi Senthil,

Thanks for your help. There are 58 samples for which I get the same 'master' file error. Most of these samples are from BioProject PRJEB1775. Please find an example manifest file below.

STUDY   PRJEB41812
SAMPLE  ERS234275
ASSEMBLY_TYPE   primary metagenome
ASSEMBLYNAME    UC_ERR260488
COVERAGE    1
PLATFORM    ILLUMINA
PROGRAM SPAdes genome assembler v3.11.1
MOLECULETYPE    genomic DNA
DESCRIPTION This sample represents unknown contigs derived from the metagenomic sample ERS234275
TPA true
RUN_REF ERR260488
FASTA   ENA_submission/fasta/ERR260488_dark_contigs.fasta.gz
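For anyone checking many manifests like the one above before running validation, here is a minimal Python sketch that reads the field/value lines into a dictionary. The function name `parse_manifest` is hypothetical and not part of Webin-CLI, and the sketch assumes each line is a field name followed by whitespace and then the value, as in the example above.

```python
def parse_manifest(text):
    """Parse 'FIELD value' lines into a dict; values may contain spaces."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so the value stays whole.
        parts = line.split(None, 1)
        key = parts[0].upper()
        value = parts[1].strip() if len(parts) > 1 else ""
        fields[key] = value
    return fields


# The example manifest from this comment, copied as-is.
MANIFEST = """\
STUDY   PRJEB41812
SAMPLE  ERS234275
ASSEMBLY_TYPE   primary metagenome
ASSEMBLYNAME    UC_ERR260488
COVERAGE    1
PLATFORM    ILLUMINA
PROGRAM SPAdes genome assembler v3.11.1
MOLECULETYPE    genomic DNA
DESCRIPTION This sample represents unknown contigs derived from the metagenomic sample ERS234275
TPA true
RUN_REF ERR260488
FASTA   ENA_submission/fasta/ERR260488_dark_contigs.fasta.gz
"""

manifest = parse_manifest(MANIFEST)
print(manifest["STUDY"], manifest["SAMPLE"], manifest["RUN_REF"])
```

Pulling out `SAMPLE` this way makes it easy to list which sample accessions the failing manifests point to.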

Here is a list of samples with corresponding BioProjects that also fail with the same error.

BioProject SRARunID
PRJEB17784 ERR1913234
PRJEB17784 ERR1913290
PRJEB17784 ERR1913317
PRJEB8094 ERR719439
PRJEB10133 ERR981250
PRJEB21446 ERR2014384
PRJEB21446 ERR2014382
PRJEB15257 ERR1600435
PRJEB15257 ERR1600430
PRJEB15257 ERR1600431
PRJEB12831 ERR1681529
PRJEB1775 ERR260489
PRJEB1775 ERR262952
PRJEB1775 ERR260483
PRJEB1775 ERR262940
PRJEB1775 ERR262946
PRJEB1775 ERR260484
PRJEB1775 ERR260493
PRJEB1775 ERR262945
PRJEB1775 ERR260501
PRJEB1775 ERR262948
PRJEB1775 ERR260477
PRJEB1775 ERR260476
PRJEB1775 ERR260487
PRJEB1775 ERR260494
PRJEB1775 ERR262953
PRJEB1775 ERR260488
PRJEB1775 ERR262957
PRJEB1775 ERR262941
PRJEB1775 ERR260504
PRJEB1775 ERR262944
PRJEB1775 ERR260479
PRJEB1775 ERR262938
PRJEB1775 ERR260490
PRJEB1775 ERR262942
PRJEB1775 ERR260492
PRJEB1775 ERR260478
PRJEB1775 ERR262954
PRJEB1775 ERR260482
PRJEB1775 ERR260485
PRJEB1775 ERR260499
PRJEB1775 ERR260481
PRJEB1775 ERR262956
PRJEB1775 ERR262951
PRJEB1775 ERR262958
PRJEB1775 ERR262939
PRJEB1775 ERR260497
PRJEB1775 ERR260498
PRJEB1775 ERR260496
PRJEB1775 ERR262949
PRJEB1775 ERR260502
PRJEB1775 ERR260495
PRJEB1775 ERR260503
PRJEB1775 ERR262947
PRJEB1775 ERR262950
PRJEB1775 ERR260506
PRJEB1775 ERR260491
PRJEB1775 ERR260486

Thanks for your help.

snathanvj commented 3 years ago

Hi, According to the sample, the scientific name is 'Bacteria'. I am not sure it is a valid scientific name, as its rank is superkingdom; I think it should be a species-level scientific name.

[ { "taxId" : "2", "scientificName" : "Bacteria", "commonName" : "eubacteria", "formalName" : "false", "rank" : "superkingdom", "division" : "PRO", "geneticCode" : "11", "submittable" : "false" } ]
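The field that stands out in the record above is "submittable" : "false". As a rough illustration, here is a Python sketch that flags such records. The function name `non_submittable` is hypothetical, and whether Webin-CLI applies exactly this rule during master file validation is an assumption, not something confirmed here.

```python
import json

# The taxonomy record quoted above, copied as-is.
TAXON_JSON = """
[ { "taxId" : "2", "scientificName" : "Bacteria", "commonName" : "eubacteria",
    "formalName" : "false", "rank" : "superkingdom", "division" : "PRO",
    "geneticCode" : "11", "submittable" : "false" } ]
"""


def non_submittable(records):
    """Return the records whose 'submittable' flag is not the string 'true'."""
    return [r for r in records if r.get("submittable", "").lower() != "true"]


records = json.loads(TAXON_JSON)
for r in non_submittable(records):
    print(f'{r["scientificName"]} (taxId {r["taxId"]}, rank {r["rank"]}) is not submittable')
```

Note that the flags in this record are strings rather than JSON booleans, so the comparison is done on the text value.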

sejmodha commented 3 years ago

Hi,

I am a bit confused about the suggestion.

Just to confirm, I am submitting these as TPA; primary metagenome assembly, and cannot change the existing scientific name as I do not own the original sample.

snathanvj commented 3 years ago

Hi, I am not sure about this. It may be a valid value, in which case I will have to make a code change to allow it. I will discuss this with @raskoleinonen and come back to you. Thanks.

sholt6 commented 3 years ago

Hi @sejmodha,

Sam here from the ENA Helpdesk team. The error relating to your sample is more a submission issue than a technical problem with Webin-CLI, error message content notwithstanding. Could you please make a ticket with us and we'll help you get your metagenome assembly submitted: https://www.ebi.ac.uk/ena/browser/support

Thanks! Sam

sejmodha commented 3 years ago

Hi Sam,

Thanks for getting in touch.

I already have a ticket number: [ENA METAGENOME #466589] [ena-metagenome] Third Party Assembly submission where I raised this issue with an ENA colleague. Should I raise a separate ticket with a link to this GitHub discussion?

Thanks for your help.

sholt6 commented 3 years ago

Great, no need to raise another ticket. I'll make sure the person handling ticket #466589 is aware of this thread.

Thanks! Sam

raskoleinonen commented 3 years ago

A new version of Webin-CLI has been deployed with better sample related error reporting: https://github.com/enasequence/webin-cli/releases/tag/v3.7.0