SushiLab / mTAGs

GNU General Public License v3.0

Error message: Whitespace is not allowed in the sequence. #3

Closed (mokrobial closed 5 days ago)

mokrobial commented 6 months ago

Steps:

  1. Install mTAGs via conda
  2. Download the mTAGs database
  3. Run mtags profile

Error message:

    2024-02-24 17:35:26,237 INFO: Processed reads: 36000000
    Traceback (most recent call last):
      File "/Applications/miniconda3/envs/mtags/bin/mtags", line 8, in <module>
        sys.exit(main())
      File "/Applications/miniconda3/envs/mtags/lib/python3.7/site-packages/mTAGs/mtags.py", line 1284, in main
        execute_mtags_profile(sys.argv[2:])
      File "/Applications/miniconda3/envs/mtags/lib/python3.7/site-packages/mTAGs/mtags.py", line 1219, in execute_mtags_profile
        ssu_files = _mtags_extract_grouped(input_seqfiles_r1, input_seqfiles_r2, input_seqfiles_s, output_folder, threads)
      File "/Applications/miniconda3/envs/mtags/lib/python3.7/site-packages/mTAGs/mtags.py", line 536, in _mtags_extract_grouped
        (reads_s, ssu_files_s, lsu_file_s) = mtags_extract(pathlib.Path(input_seqfile_s), output_folder, readnames, threads=threads)
      File "/Applications/miniconda3/envs/mtags/lib/python3.7/site-packages/mTAGs/mtags.py", line 375, in mtags_extract
        for number_of_sequences, fasta in enumerate(stream_fa(input_seq_file), 1):
      File "/Applications/miniconda3/envs/mtags/lib/python3.7/site-packages/mTAGs/mtags.py", line 162, in stream_fa
        for header, sequence, qual in Bio.SeqIO.QualityIO.FastqGeneralIterator(handle):
      File "/Applications/miniconda3/envs/mtags/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 955, in FastqGeneralIterator
        raise ValueError("Whitespace is not allowed in the sequence.")
    ValueError: Whitespace is not allowed in the sequence.

This is the exact command I used:
% mtags profile -s /Volumes/CanalData2021/METAGENOME.fastq.gz -t 16 -o /Volumes/CanalData2021/test-run-one -n Metagenome_1_Mcd

Note: Two duplicate FASTA files (F and R) were generated in the output folder.

Note: The filtered raw metagenomic FASTQ file was downloaded from JGI IMG.

hjruscheweyh commented 5 months ago

Hi @mokrobial

We use Biopython to read sequence files, and Biopython is complaining that your input reads contain whitespace in the sequence itself (not in the header). This indicates that the input file is corrupted. I suggest redownloading the files from JGI.
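To confirm which reads are affected before redownloading, the file can be scanned directly. The following is a minimal sketch, not part of mTAGs: it uses only the Python standard library, assumes the FASTQ uses the usual four lines per record, and looks for spaces or tabs in the sequence line, which is the condition the error message describes. The function names `find_whitespace_in_sequences` and `scan_fastq` are hypothetical helpers introduced here for illustration.

```python
import gzip
import io


def find_whitespace_in_sequences(handle, max_hits=10):
    """Scan a FASTQ text stream (assumed 4 lines per record) and return
    up to max_hits (record_number, header) tuples for records whose
    sequence line contains a space or tab."""
    hits = []
    header = ""
    for line_index, line in enumerate(handle):
        position = line_index % 4
        if position == 0:
            # First line of a record: the header.
            header = line.rstrip("\r\n")
        elif position == 1:
            # Second line of a record: the sequence.
            sequence = line.rstrip("\r\n")
            if " " in sequence or "\t" in sequence:
                hits.append((line_index // 4 + 1, header))
                if len(hits) >= max_hits:
                    break
    return hits


def scan_fastq(path, max_hits=10):
    """Open a plain or gzip-compressed FASTQ file and scan it."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return find_whitespace_in_sequences(handle, max_hits)


# Small in-memory demonstration: the second record has a space in its sequence.
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nAC GT\n+\nIIIII\n")
print(find_whitespace_in_sequences(demo))  # [(2, '@read2')]
```

Calling `scan_fastq("/Volumes/CanalData2021/METAGENOME.fastq.gz")` would apply the same check to the file from the command above. If the file contains multi-line records, the four-line assumption does not hold and the reported record numbers would be unreliable.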

Note 2: HMMER requires uncompressed FASTA files in both forward and reverse orientation, which can create huge temporary files. These should be deleted if mTAGs finishes successfully.

Best, Hans