DKMS-LSL / typeloader

ERROR: ENA rejected your files (validation failed) #24

Closed steletvinicius closed 2 years ago

steletvinicius commented 2 years ago

I ran into two issues when trying to communicate with the ENA database:

The first one happened when I tried to create new projects. After filling in the new project form and clicking on "Click to generate" and "Start new project", TypeLoader did not respond with the ENA project number.

Looking for a workaround, I requested a password reset on the ENA website, copied and pasted the new password into TypeLoader, and repeated the project creation. It worked perfectly (I have created three new projects).

However, when I tried to submit the new sequences to the ENA database, I got the following error message:

====================================== ERROR: ENA rejected your files (validation failed):

The complete submission has been rejected.

image

Just to check the password validation, I tried to log in on the website https://www.ebi.ac.uk/ena/submit/webin/ and I was able to access my ENA account.

image

Am I doing something wrong? : -(

bmschoene commented 2 years ago

But you have used TypeLoader successfully before, right?

When did this stop working? Did you change anything on your side between "works" and "doesn't work anymore"?

steletvinicius commented 2 years ago

Yes, it was working perfectly. I recently reinstalled TypeLoader after reinstalling Windows on my machine.

This is my first attempt to submit new sequences since that reinstallation.

bmschoene commented 2 years ago

Huh. I assume you have checked that your ENA credentials are correct in TypeLoader? (Settings => Company)

This sounds like a problem with ENA communication, but it's hard to debug from here.

There will be a new TypeLoader version by the end of next week or in early November. I'll make sure it works for the "fresh installation" use case.

Would it be ok for you to wait for this release, try a fresh TypeLoader install with it and then see whether this fixes the problem, before we try to dig into this?

steletvinicius commented 2 years ago

Yes, I have checked my login credentials by logging in at https://www.ebi.ac.uk/ena/submit/webin/. I have also changed my password again and updated the information in TypeLoader. Problem NOT solved. I have also tried (1) updating the Webin client and (2) reinstalling/updating TypeLoader. Problem NOT solved.

What is not so clear is why I can create new projects but not submit new sequences. If it were a login credentials issue, project creation would also be affected, wouldn't it?

Yes, I can and I will wait for the next release.

steletvinicius commented 2 years ago

Checking if I can create a new project. OK!

image

image

bmschoene commented 2 years ago

Both processes use the same credentials, you're correct. So I don't think it's a problem of your ENA credentials.

But both processes happen through different channels - the project creation is a direct call from TypeLoader to the ENA server, whereas the sequence submission happens through a java tool called webin-cli that is provided by ENA and built into TypeLoader. Basically, TypeLoader prepares all the data and then just calls webin-cli on it, and webin-cli validates the data and does the communication with the ENA server if validation passes.

Apparently, in your case, webin-cli is unhappy with some of the data TypeLoader prepared for it ("validation failed"). There is probably more information in the files that webin-cli generates, and I'd ask for those to dig into your issue. But since the new TypeLoader release will contain a new major version of webin-cli, I think it makes more sense to just wait for that and see what happens then.

Sorry for all the hassle!

steletvinicius commented 2 years ago

No reason to be sorry. Absolutely. I really appreciate your efforts to make such a valuable tool available to the HLA community.

I hope you are right and the update solves this problem. As I said, I tried to replace the webin-cli executable by downloading the most recent version (https://github.com/enasequence/webin-cli/releases/tag/v4.2.1) and putting it in the folder C:\Program Files (x86)\TypeLoader\ENA_Webin_CLI (not sure if this is enough...), and the problem persists.

image

Is there any chance ENA has made changes to the validation criteria that are not yet implemented in my TypeLoader version?

bmschoene commented 2 years ago

Yes, they have made some changes, and they look more profound than I anticipated.

The Webin CLI version that should work with your TypeLoader is 3.1.0, and it should continue to work until end of October. It does at our location. (You have found the correct location in your TypeLoader. If you delete the 4.1.2 version and save the 3.1.0 version there, do you still get the same error?)

The new webin CLI version (or any 4.x version) is what's required starting November 1st. This will require adjustments on TypeLoader to make sure the generated files pass validation, because they have indeed changed some criteria. So webin CLI 4.1.2 cannot work with your TypeLoader version.

However, I have stumbled into unexpected difficulties with this new webin CLI version and currently cannot get it to work. (see #25). I have opened a ticket at ENA, but since they currently have official reply times of 3 weeks, it will probably be a while until I can get the necessary help to get the new TypeLoader version running. :(

steletvinicius commented 2 years ago

"The Webin CLI version that should work with your TypeLoader is 3.1.0, and it should continue to work until end of October. It does at our location. (You have found the correct location in your TypeLoader. If you delete the 4.1.2 version and save the 3.1.0 version there, do you still get the same error?)"

Yes, my first attempts before opening this issue were run with the "native" Webin CLI that comes with the TypeLoader installation.

Today, I tried to submit the sequences again, changing only the Webin CLI version (3.1.0; 3.2.1; 3.2.2; 3.3.3; 3.4.0; 3.4.1; 3.5.0; 3.7.0), unfortunately with NO success.

I have no experience with manual submissions to ENA. Before TypeLoader, I did them on the GenBank BankIt platform. Is there a way to MANUALLY submit the files generated by TypeLoader to ENA using some web platform?

bmschoene commented 2 years ago

I don't know. As far as I know, they now only accept submissions via Webin CLI, but I haven't looked. Sorry.

I'm still digging into the problem with the new webin CLI and may have an idea how to solve it. So there still may be a new TypeLoader in early November.

bmschoene commented 2 years ago

The new release is out! https://github.com/DKMS-LSL/typeloader/releases/tag/V2.11.0

Please let me know whether this solves your problem, yes?

Sorry for the long wait.

steletvinicius commented 2 years ago

I have been waiting for it. After updating, I tried to submit the file to ENA and got a new message BEFORE clicking on submit:

Screenshot_3

And a new error message AFTER clicking on submit:

Screenshot_4

It looks like some encoding error... :-(

I have tried to delete this allele, reimport it and submit. Nothing changed. The error persists. Is it related to something I am supposed to update on my PC?

(https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte)

bmschoene commented 2 years ago

The new "this will go to the prod server" warning message is intentional. It's an additional safeguard to make sure users don't accidentally submit stuff meant for the test server to the productive system.

The encoding error I haven't seen before. Can you send me a logfile, please, so I can investigate at which point it is thrown? (Options => Download logfile)

Ideally, from a fresh TypeLoader session where you do nothing but try to submit one of these alleles to ENA.

Even more helpful would be if you could additionally send the ENA submission files from that submission. To get these, open the Project View of the project you were trying to submit, then click "Download files" in the upper right area. Under "Choose files" you should get something like this:

image

I need the bottom-most _flatfile.txt.gz and _manifest.txt file (because these always belong to the latest submission attempt).

(This whole thing might prove hard to debug from here, as it's possibly a language specific problem. But I'll do my best!)

steletvinicius commented 2 years ago

First of all, thank you for your efforts and sorry for giving you such a hard time.

I hope these are the files you requested and please let me know if I can help to find the solution on some other way.

SteletVinicius_TypeLoader_20211108_124529.log SteletVinicius_PRJEB36065_20211108124648_manifest_failed.txt SteletVinicius_PRJEB36065_20211108124648_flatfile_failed.txt.gz

bmschoene commented 2 years ago

Seems like there are 2 issues at work here. The UnicodeError has nothing to do with ENA submission. I can fix that with ease.

However, your ENA validation still failed. But on my computer, your files validate just fine. 🤔

Can you please add the webin-cli.report file from the same directory? This should hopefully have more details on why the validation failed.

bmschoene commented 2 years ago

(ENA seems to have problems right now, so I cannot really test the new changes. They had announced a maintenance window for tomorrow, but they seem to have started early...)

steletvinicius commented 2 years ago

Here is the file you requested. webin-cli.zip

I believe it is not going to help because the last modification date is 02/09/2021. This is the only file matching this name in the project folder.

bmschoene commented 2 years ago

You're right, that file belongs to an earlier submission, which went through.

Hm. On my end, the connection to ENA is working again, so maybe it was a temporary hiccup instead of early maintenance.

Can you please try submitting to ENA again, and then send me the newly created webin-cli.report file? (This one is created by webin-cli and overwritten during every submission attempt.)

steletvinicius commented 2 years ago

PRJEB36065_20211109100939_manifest_failed.txt PRJEB36065_20211109100939_flatfile_failed.txt.gz 20211109_100905.log

I have faced the same error message (unicodeDecodeError) on this new submission attempt.

Interestingly, webin-cli.report file shows no change even after this new attempt. webin-cli.zip

And it seems to be normal for unsuccessful submissions. As you can see on the print screen below, I have made many failed submission attempts after February, 09. image

bmschoene commented 2 years ago

Huh. So your failed submission doesn't reach a stage where a webin-cli.report is created... 🤔

I'll build you an installer with a beta-version that has the unicode-bug fixed. I have also just implemented that after ENA submission, the webin-cli.report file is renamed with the same prefix as the other submission files. This way, the file doesn't get overwritten. (This has been bugging me occasionally for a while now, anyway.)

steletvinicius commented 2 years ago

Great! Let me know as soon as this beta becomes available so I can run it and check if the problem is solved.

bmschoene commented 2 years ago

Here's your beta installer (you have to unzip it before using it). Just use the "update an existing installation => yes" option.

As a note: ENA has announced a maintenance downtime for today 10 AM to 3 PM GMT, so during that time window, ENA submissions won't work anyway. (You should probably see whether validation goes through, though. The error message if ENA can't be reached is a different one from the previous and happens later.)

Let me know how it goes.

steletvinicius commented 2 years ago

Good news: the encoding error is gone. Bad news: the validation error remains...

"ERROR: ENA rejected your files (validation failed): The complete submission has been rejected."

Could it be related to the maintenance window?

I have some new alleles whose submissions are pending, and I have tried to submit different alleles from different projects. All failed.

Here are the log files. No webin-cli.report file was generated.

PRJEB48196_20211110095107_flatfile_failed.txt.gz PRJEB48196_20211110095107_manifest_failed.txt

PRJEB48203_20211110095047_flatfile_failed.txt.gz PRJEB48203_20211110095047_manifest_failed.txt

PRJEB36065_20211110094915_manifest_failed.txt PRJEB36065_20211110094915_flatfile_failed.txt.gz

20211110_094735.log

bmschoene commented 2 years ago

This is weird. The files validate just fine from here, and your paths look good.

Can you send me the corresponding _flatfile.txt.gz.report files?

bmschoene commented 2 years ago

Your TypeLoader data path is located in your Windows user's directory. This (1864447) is the Windows user you're using TypeLoader from, yes? Not a remnant from the move? Just checking.

Can you verify in your explorer that the files you sent me are located in C:\Users\1864447\Documents\AlleleSubmissions_Typeloader\VINICIUS\projects\20200107_VS_HLA-B_ExonicNovelties-Barretos (for the last submission attempt)?

steletvinicius commented 2 years ago

"Can you send me the corresponding _flatfile.txt.gz.report files?"

Are these files different from the ones I sent last time?

Your TypeLoader data path is located in your Windows user's directory. This (1864447) is the Windows user you're using TypeLoader from, yes? Not a remnant from the move? Just checking.

Yes. 1864447 is my company ID and I am supposed to use it to log in on Windows.

Regarding the file path, I believe it is correct, as you can see in the print screen below (the blue selection at the top).

image

Taking the second and fifth columns ("Data de modificação" and "Data da Criação") as "Modified at" and "Created at" respectively, and the Portuguese date format as DD/MM/YYYY, the files at the bottom are the ones I shared in my last post.

It is always good to check, but the date information in the file name 20211110094915 corresponds to my last submission attempt. So I believe there is no chance these files come from a leftover TypeLoader installation, right? But it is no problem to check at all...

As a triple check, I tried to submit again and here is the print screen from the same folder on the same Windows explorer session:

image

PRJEB36065_20211111103509_flatfile_failed.txt.gz PRJEB36065_20211111103509_manifest_failed.txt

I have checked the folder C:\Program Files (x86)\TypeLoader\ENA_Webin_CLI and I have one question: does the TypeLoader beta installer you provided yesterday update the webin-cli.jar file? It seems it doesn't. Which version am I supposed to run?

image

(I have a feeling here is the problem and the solution...)

steletvinicius commented 2 years ago

image

Maybe not... : -(

bmschoene commented 2 years ago

4.2.1 is the correct Webin CLI version. The installer probably only updates files if they have changed, and this one hasn't changed between the 2.11.0 and 2.11.0.1 beta release.

Okay, let's try this:

Open a command line terminal (probably, typing cmd into your Windows program search should do that) and paste the following command:

java -jar "C:\Program Files (x86)\TypeLoader\ENA_Webin_CLI\webin-cli-4.2.1.jar" -context sequence -manifest C:\Users\1864447\Documents\AlleleSubmissions_Typeloader\VINICIUS\projects\20200107_VS_HLA-B_ExonicNovelties-Barretos\PRJEB36065_20211108124648_manifest.txt -userName **** -password **** -centerName "INSTITUTO NACIONAL DE CANCER JOSE ALENCAR GOMES DA SILVA" -inputDir "C:\Users\1864447\Documents\AlleleSubmissions_Typeloader\VINICIUS\projects\20200107_VS_HLA-B_ExonicNovelties-Barretos" -outputDir "C:\Users\1864447\Documents\AlleleSubmissions_Typeloader\VINICIUS\projects\20200107_VS_HLA-B_ExonicNovelties-Barretos" -validate

(But you have to replace both **** fields with your ENA FTP user and ENA FTP password)

This is the command that TypeLoader actually calls.

Let me know what that results in?
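For illustration, the command above can also be assembled as an argument list in Python. This is only a sketch, not TypeLoader's actual code: the helper name `build_webin_command` and the placeholder paths are assumptions, while the flags are the ones shown in the command in this thread.

```python
def build_webin_command(jar, manifest, user, password, center_name,
                        project_dir, validate=True, test=False):
    """Assemble a webin-cli call as an argument list (hypothetical helper).

    Passing a list instead of one long string avoids manual quoting of
    paths that contain spaces, such as "C:\\Program Files (x86)".
    """
    cmd = [
        "java", "-jar", str(jar),
        "-context", "sequence",
        "-manifest", str(manifest),
        "-userName", user,
        "-password", password,
        "-centerName", center_name,
        # The thread uses the project folder for both input and output.
        "-inputDir", str(project_dir),
        "-outputDir", str(project_dir),
    ]
    if test:
        cmd.append("-test")
    if validate:
        cmd.append("-validate")
    return cmd


# Placeholder values; replace them with real paths and credentials.
example = build_webin_command(
    jar=r"C:\Program Files (x86)\TypeLoader\ENA_Webin_CLI\webin-cli-4.2.1.jar",
    manifest=r"C:\data\my_project\my_manifest.txt",
    user="****",
    password="****",
    center_name="MY CENTER NAME",
    project_dir=r"C:\data\my_project",
)
```

A list built this way could be handed to `subprocess.run(example)` directly, without going through a shell.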

steletvinicius commented 2 years ago

image

ERROR: Unable to read the manifest file

steletvinicius commented 2 years ago

I think I got it. There is no file with that name, so I changed the filename in the command to target an existing manifest file. Is that what you meant?

image

image

Here are the files generated.

webin-cli_report_AND_manifest_failed_report.zip

bmschoene commented 2 years ago

Ah. Sorry. TypeLoader renames the manifest file and flatfile after a failed attempt, to make them easier to identify later.

You need to rename both the manifest and the flatfile of this submission to the names given in the command (= delete the _failed part of the filenames). Just changing the filenames in the command will not be sufficient, because the command contains only the manifest file, and the manifest file contains the name of the original flatfile (without _failed).

Then you should be able to run my command as it is.

(If you prefer, you can copy them instead of renaming, but keep the copies within this folder, please.)
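The copy variant of this step could look roughly like the following Python sketch. The helper name `copy_without_failed_suffix` is hypothetical, and the sketch assumes that the only difference between the two names is the `_failed` part, as described above.

```python
import shutil
from pathlib import Path


def copy_without_failed_suffix(project_dir, prefix):
    """Copy '<prefix>..._failed...' files to the same names without '_failed'.

    Hypothetical helper: it copies instead of renaming, so the originals
    stay in place, and the copies are written to the same folder.
    """
    copies = []
    for src in sorted(Path(project_dir).glob(f"{prefix}*_failed*")):
        dst = src.with_name(src.name.replace("_failed", "", 1))
        if not dst.exists():
            shutil.copy2(src, dst)
        copies.append(dst)
    return copies
```

Called with the project folder and a prefix such as `PRJEB36065_20211108124648`, this would leave both the `_failed` originals and the restored names next to each other.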

steletvinicius commented 2 years ago

Renamed image

Ran image

(And it looks good, doesn't it?)

And here are the files created after that: 20211112_submissionattemp_report.zip

bmschoene commented 2 years ago

Yes, it does. That's the result I'm getting, too, and it means your files, as expected, are fine, as is your ENA connection.

Huh. So webin-cli validates your files fine when called from the command line but not when the exact same command is called by TypeLoader. 🤔

The only reason I can think of is that there may be a privilege issue somewhere in there, because your TypeLoader data_path is in your user/documents folder and I think Windows sometimes behaves weirdly with programs operating there.

Can you, just as a try, do a fresh TypeLoader install (with the beta installer) and use something else as the data path for TypeLoader? So you install it in C:/Program Files (x86)/typeloader_new or something like that, but use something else instead of C:\Users\1864447\Documents\AlleleSubmissions_Typeloader as data_path. Somewhere either directly on your hard drive (C:\data\typeloader_data etc.) or on a network storage etc. Anything outside of C:\Users, really.

Just make it a fresh install (no "updating an existing TypeLoader installation") and type in all your config stuff. Then start that new TypeLoader and create a test user there. And with that test user, create a fresh project, add one of your alleles, and try to submit it to ENA. (As it's a test user, this will go to their dev server, so no cleanup required if it goes through.)

Let me know how that turns out?

steletvinicius commented 2 years ago

...

image PRJEB48752_20211112160134_manifest_failed.txt PRJEB48752_20211112160134_flatfile_failed.txt.gz

image

image

image

Did not work.

bmschoene commented 2 years ago

This is so weird. I have no idea why the command works when called from the command line but not when called from TypeLoader.

Okay, I have a new beta-installer for you. Just use it to update your typeloader_new installation. Here it is.

(I have changed the way the call to webin-cli is implemented, and extended the logging.)

Please try it out - and if it still doesn't work, send me the logfile?

steletvinicius commented 2 years ago

I have downloaded and run the installer, updating the typeloader_new installation (the last one, created with only the test server).

It seems like there is a problem in some script. Check the error message in the print screen below:

image

bmschoene commented 2 years ago

Ah, sorry, that was a bug I introduced when implementing the improved logging. So hard to test for stuff that I can't reproduce.

Here's a new installer that should hopefully work.

For any errors, please always add the log file of the session. This is where I can hopefully see what's happening.

steletvinicius commented 2 years ago

I forgot to add the log files from my last attempt. Sorry.

2021/11/15 -> log files 20211115_121307.log 20211115_121227.log

-> Submission files: PRJEB48752_20211115121334_manifest.txt PRJEB48752_20211115121250_flatfile.txt.gz PRJEB48752_20211115121250_manifest.txt PRJEB48752_20211115121334_flatfile.txt.gz

======================================= 2021/11/16

New error (using already created and new project) image

-> log file: 20211116_103713.log

-> Submission files: Project 01: PRJEB48752_20211116103738_flatfile_failed.txt.gz PRJEB48752_20211116103738_manifest_failed.txt

Project 02: PRJEB48798_20211116105339_manifest_failed.txt PRJEB48798_20211116105339_flatfile_failed.txt.gz

bmschoene commented 2 years ago

Darn. Still nothing helpful in the log. Okay, I have added yet another logging step, which is the last thing I can think of at the moment: Here you go

The result will look the same on your side, but hopefully the log will tell me what's going wrong with that webin-cli call.

(FYI, I have a few days of vacation coming up, so won't be able to look at your reply before Monday. I will get back to you then.)

steletvinicius commented 2 years ago

Here are the new files generated from the last updated version:

.log

PRJEB48752_20211116130550_flatfile_failed.txt.gz PRJEB48752_20211116130550_manifest_failed.txt

PRJEB48798_20211116130540_flatfile_failed.txt.gz PRJEB48798_20211116130540_manifest_failed.txt

( Enjoy your vacation!! :-) )

bmschoene commented 2 years ago

Hah! Problem found! \o/

But before we proceed towards solving that, can you do me a favour: install another beta, test it and send me the log file? (I don't need the flatfile or manifest.) I have tweaked the logging a bit more and I'd like to see if we can catch that error without having to use the crude logging I implemented in _beta3.

The problem is that webin-cli takes the ENA password as a command line argument, in plain text. So logging the whole call, like I did in _beta3, exposes the password in the log file, which is something I really don't like to do. (That's why I deleted the log file in the previous comment.) So I'd like to find a way to log this kind of error without having to log the whole call to webin-cli. Hopefully, I have done that in _beta4. But I can't test it on my side because I can't get webin-cli to throw that kind of error.
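One possible way to log such a call without exposing credentials is to mask the values that follow the sensitive flags before writing the argument list to the log. The sketch below is an assumption about how this could be done, not the approach actually used in _beta4; `redact_command` and `SENSITIVE_FLAGS` are hypothetical names.

```python
SENSITIVE_FLAGS = {"-password", "-userName"}


def redact_command(args, mask="****"):
    """Return a copy of args with the value after each sensitive flag masked."""
    redacted = []
    hide_next = False
    for arg in args:
        redacted.append(mask if hide_next else arg)
        # Only a flag that is not itself a masked value triggers masking.
        hide_next = (not hide_next) and arg in SENSITIVE_FLAGS
    return redacted
```

The redacted list, rather than the original one, would then be passed to the logger, for example `logging.info("webin-cli call: %s", redact_command(cmd))`.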


Anyway, here is the crux of our issue:

CompletedProcess(args=['java', '-jar', 'C:\Program Files (x86)\typeloader_new\ENA_Webin_CLI\webin-cli-4.2.2.jar', '-context', 'sequence', '-manifest', 'C:\data\typeloader_data\test\projects\20211112_T_HLA-B_ExonNovelties\PRJEB48752_20211116130550_manifest.txt', '-userName', , '-password', , '-centerName', '"INSTITUTO NACIONAL DE CANCER JOSE ALENCAR GOMES DA SILVA"', '-inputDir', 'C:\data\typeloader_data\test\projects\20211112_T_HLA-B_ExonNovelties', '-outputDir', 'C:\data\typeloader_data\test\projects\20211112_T_HLA-B_ExonNovelties', '-test', '-validate'], returncode=1, stdout=b'', stderr=b"Error opening registry key 'Software\JavaSoft\Java Runtime Environment'\r\n")

So the problem is with your java installation, not TypeLoader itself.
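Output like the `CompletedProcess` above comes from Python's `subprocess.run`. A minimal sketch of capturing the return code and stderr of an external call, so that a message like the registry error becomes visible, might look like this (`run_and_capture` is a hypothetical name, not TypeLoader's function):

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd without a shell; return (returncode, stdout, stderr) as text."""
    result = subprocess.run(cmd, capture_output=True)
    return (
        result.returncode,
        result.stdout.decode(errors="replace"),
        result.stderr.decode(errors="replace"),
    )
```

Running it with `["java", "-version"]` would be one quick way to check whether Java itself starts at all, independent of webin-cli; a non-zero return code or an error in stderr would point at the Java installation.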

Which java version did you install? We use openjdk version 13: https://www.codejava.net/java-se/download-and-install-jdk-13-openjdk-and-oracle-jdk

Maybe this can help? https://stackoverflow.com/questions/57244959/java-error-opening-registry-key-software-javasoft-java-runtime-environment

steletvinicius commented 2 years ago

Ok, here is what I have done:

  1. Installed the new beta version
  2. Tried to submit to ENA again
  3. Got an error message similar to the one mentioned in the Stack Overflow link you shared (I took a screenshot to share, but lost it after restarting the PC)
  4. Uninstalled Java from my computer
  5. Installed Java from https://www.java.com/pt-BR/download/manual.jsp (I do not know how to install the one you shared... sorry)
  6. Tried to submit to ENA again
  7. Got the following error message (I believe the Java problem is solved, isn't it?)

========================================================= ERROR: ENA rejected your files (validation failed):

ERROR: Could not find study "PRJEB48798". The study must be owned by the submission account used for this submission or it must be private or temporarily suppressed and referenced by accession. Note that only a single study can be referenced. Unknown study PRJEB48798 or the study cannot be referenced by your submission account. Studies must be submitted before they can be referenced in the submission. [manifest file: C:\data\typeloader_data\test\projects\20211116_T_HLA-F_BarretosNovelties\PRJEB48798_20211122165551_manifest.txt, line number: 1, field: STUDY, value: PRJEB48798] ERROR: Invalid manifest file. Please see the error report file "C:\data\typeloader_data\test\projects\20211116_T_HLA-F_BarretosNovelties.\PRJEB48798_20211122165551_manifest.txt.report".

The complete submission has been rejected.

image

And finally, here are the log files (all of them):

20211122_165539.log 20211122_155248.log 20211122_162525.log 20211122_164919.log

And the project files - I know you said not to send them, but I am doing it because they are mentioned in the error message. ProjectFiles_20211116_T_HLA-F_BarretosNovelties.zip

bmschoene commented 2 years ago

That looks good, thank you. The log files are informative and the java issue is apparently solved. 🎉

The project you were trying to submit is one that you created several days ago (November 16th). Since this is a test account, the project was only created on the ENA test server, where things are discarded after 24 hours. So the project no longer exists on their server, hence the submission is rejected with "Could not find this study".
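As a rough illustration of that 24-hour window, the timestamps in the submission filenames from this thread can be compared. This sketch assumes the 14-digit part of each filename is a `YYYYMMDDhhmmss` timestamp (which matches the dates discussed above); the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"_(\d{14})_")


def submission_time(filename):
    """Parse the assumed YYYYMMDDhhmmss stamp from a submission filename."""
    match = STAMP.search(filename)
    if match is None:
        raise ValueError(f"no timestamp in {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def outlived_test_server(first_seen, attempt, lifetime=timedelta(hours=24)):
    """True if more than `lifetime` lies between the two points in time."""
    return attempt - first_seen > lifetime


# Earliest and latest files for PRJEB48798 mentioned in this thread.
first = submission_time("PRJEB48798_20211116105339_manifest_failed.txt")
latest = submission_time("PRJEB48798_20211122165551_manifest.txt")
stale = outlived_test_server(first, latest)
```

Here `stale` is `True`, since the two attempts are about six days apart, well beyond the 24 hours after which the test server discards a project.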

If you create a new project with your test account, add a new sequence, and then submit it, I believe it should go through. And the submissions from your normal account should go through, as well (since projects on the productive ENA server are, of course, not deleted).

Let me know how it goes. 🤞

steletvinicius commented 2 years ago

image

It worked!!! :-) Thank you so much, Bianca, for all the patience and effort it took to close this issue.

bmschoene commented 2 years ago

My pleasure! :-) I'm so glad you finally have a working TypeLoader again. Looking forward to seeing all those novel alleles in the database. ;-)