enasequence / webin-cli

Webin command line submission program.
Apache License 2.0

Error "Invalid file content" during file processing #83

Open erosix opened 2 years ago

erosix commented 2 years ago

I am getting this error while trying to upload my FASTQ files with the Webin uploader application. The upload works fine, but a few days later I receive the error below by email for some files. Some FASTQ files from the same batch are processed without error, while many others get this error. However, this error does not appear on any of the documentation pages and I cannot understand what it means.

FILE_NAME | ERROR | MD5 | FILE_SIZE | DATE | RUN_ID/ANALYSIS_ID
XXXX.fastq.gz | Invalid file content | 996bd54b2cc131bff70bbb3a6eb40ce6 | 270943 | 16-OCT-22 | ERR10361498

Below is an example of such fastq files (read 1):

@MIG61899R1UMI:CAGGCGCTCTAAG:9/1
GTAGCCTGTGCCCTCACCCACTTGGTTCTCGGGCCAGAGTTGGCACCATCTGGGGCAGCCAGGGGCCCTGCGAGGCTGCTCCAAGTTCTGCACCATTTCCCAACCCGGGGGACAGAACCCTGACCCA
+
I1%ICIIIIIIIIIIIIIIIICIIIIIIIIIIIIIIIIIIIIIIIIIIIICIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIICIIIIIIIIIIIIIIIIIIIIIIIII
@MIG49092R1UMI:AAATCGGCAGTCT:7/1
GTAGAGCCCTGACCACTCTGGGCTGAAGGCCACAGAAGTGGGGTTTTGCTTCCTGGGCTGGTGGCCACAGAAGTGGGGTTTTGCTTCCTGGGCTGGTGGCCACAGAACCCTGACCCACACACTTGAG
+
A**IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIAIIIIIIIIIII

Any idea what may be causing this error? ENA support is not replying. Thank you in advance.

HansCastorp77 commented 1 year ago

I have also encountered this problem and the problematic *fastq.gz files look correct to me when opened - has anyone found any more information about this?

Best wishes

stegiopast commented 1 year ago

Hey there,

I am facing similar problems right now. Has anyone solved the problem in the meantime?

Best regards

HansCastorp77 commented 1 year ago

I received the following response from the ENA team earlier this week: "It looks like the read names in your erroneous files are more than 256 characters. Please reduce the length to be less than 256 characters and re-upload your files following - https://ena-docs.readthedocs.io/en/latest/submit/fileprep/upload.html"
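To check whether a file is affected before re-uploading, the read name lengths can be measured locally. The following is only a minimal sketch, not ENA's own validation: it assumes strict four-line FASTQ records, treats the 256-character figure from the reply above as the limit, and flags names of exactly 256 characters as well, because the reply asks for names of "less than 256 characters". The function names `open_text` and `long_read_names` are made up for this example.

```python
import gzip

# Limit taken from the ENA reply quoted above (assumption: names must be shorter than this).
MAX_NAME_LENGTH = 256


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def long_read_names(lines, max_length=MAX_NAME_LENGTH):
    """Yield (record_number, name_length) for read names of max_length or more.

    Assumes strict four-line records, so every fourth line starting at the
    first one is a header line.
    """
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue  # sequence, separator, or quality line
        name = line.rstrip("\r\n")[1:]  # drop the leading '@'
        if len(name) >= max_length:
            yield index // 4 + 1, len(name)


# Small in-memory example: the first record has a 300-character read name.
sample = ["@" + "x" * 300 + "\n", "ACGT\n", "+\n", "IIII\n",
          "@short\n", "ACGT\n", "+\n", "IIII\n"]
print(list(long_read_names(sample)))  # [(1, 300)]
```

For a real file, pass the handle returned by `open_text("XXXX.fastq.gz")` to `long_read_names` instead of the in-memory list.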

Best

NickJD commented 1 year ago

Any update on this?

stegiopast commented 1 year ago

Hi Nick,

I finally uploaded the raw fast5 files to the repository instead, since I had problems uploading my files in FASTQ format even with short filenames. I think there was a problem because the files originate from an ONT experiment.

Kind regards, Stefan Pastore

HansCastorp77 commented 1 year ago

I shortened the read names and was able to upload the FASTQ files successfully. These were Oxford Nanopore Technologies (ONT)-derived FASTQ files, so the read names included a lot of additional information about the run, e.g. runid=xxxx read=xxxx chr=xxx. I believe I removed this using sed: sed 's/runid.*//' file.fastq > newfile.fastq
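Note that the sed command above is applied to every line of the file, not only to the read names. A slightly more cautious variant, sketched here in Python, touches only the header line of each record. It is not the same operation as the sed command: it keeps everything before the first whitespace rather than cutting at "runid", and it assumes strict four-line records. The function names and the example header are made up for illustration.

```python
def shorten_header(header):
    """Keep only the read identifier: the text before the first whitespace."""
    stripped = header.rstrip("\r\n")
    fields = stripped.split(None, 1)
    return fields[0] if fields else stripped


def shorten_read_names(lines):
    """Yield FASTQ lines with trimmed headers; other lines pass through unchanged.

    Assumes strict four-line records, so only every fourth line (starting
    with the first) is treated as a header.
    """
    for index, line in enumerate(lines):
        if index % 4 == 0:
            yield shorten_header(line) + "\n"
        else:
            yield line


# Hypothetical ONT-style record, using the placeholder fields from the comment above.
record = ["@read1 runid=xxxx read=xxxx chr=xxx\n", "ACGT\n", "+\n", "IIII\n"]
print("".join(shorten_read_names(record)), end="")
```

To process a compressed file, the input could be read with `gzip.open(path, "rt")` and the output written line by line to `gzip.open(new_path, "wt")`.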

Very best, David

stegiopast commented 1 year ago

Hi David,

Many thanks. I gave up after the successful upload of the fast5 data. But I’m very happy that it worked out finally.

Best wishes, Stefan Pastore

komalantaliya commented 11 months ago

Hello, I am now getting the same "Invalid file content" error. My files are also Oxford Nanopore Technologies (ONT)-derived FASTQ files, so I tried the solution suggested above and shortened the read names, but the same error is still shown. I don't know what is causing it. Does anyone have any idea?

jkimsis commented 10 months ago

I'm having the same issue. The reads are from Ion Torrent Proton and the names are short to begin with, but I'm still getting an "Invalid file content" error after uploading the files.

JessieChen7 commented 7 months ago

Hello, I've also encountered the "Invalid file content" error. However, I've verified that the length of the read names is not the issue, as other files with similar read name lengths have uploaded successfully. Could anyone provide guidance on resolving this issue? Any assistance would be greatly appreciated. Thank you.

cocathail commented 7 months ago

The "Invalid file content" error can have many different causes.

Please ensure all fastq lines adhere to our format guides: https://ena-docs.readthedocs.io/en/latest/submit/fileprep/reads.html#fastq-format
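As a first local check before consulting the full format guide, the basic structure of a FASTQ file can be tested with a short script. The sketch below covers only general FASTQ properties (four lines per record, a header starting with "@", a separator starting with "+", and sequence and quality strings of equal length). It does not reproduce the ENA-specific rules from the guide linked above, and `check_fastq` is a name invented for this example.

```python
def check_fastq(lines):
    """Return a list of (line_number, message) for basic structural problems.

    Assumes strict four-line records; a trailing blank line is reported as a
    truncated record.
    """
    problems = []
    record = []
    line_number = 0
    for line_number, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, sequence, separator, quality = record
        first = line_number - 3  # line number of this record's header
        if not header.startswith("@"):
            problems.append((first, "header does not start with '@'"))
        if not sequence:
            problems.append((first + 1, "empty sequence line"))
        if not separator.startswith("+"):
            problems.append((first + 2, "separator does not start with '+'"))
        if len(sequence) != len(quality):
            problems.append((first + 3, "sequence and quality lengths differ"))
        record = []
    if record:
        problems.append((line_number, "truncated record at end of file"))
    return problems


good = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
bad = ["@r2\n", "ACGT\n", "+\n", "III\n", "@r3\n"]
print(check_fastq(good))  # []
print(check_fastq(bad))
```

A file that passes this check can still be rejected for other reasons, so an empty result only rules out the most basic formatting problems.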

If you use Webin-CLI (as opposed to interactive submission via the Webin web interface), you may get more specific errors to help guide your troubleshooting.

Regards, Colman