foerstner-lab / READemption

A pipeline for the computational evaluation of RNA-Seq data
https://reademption.readthedocs.io

align error "Not a gzipped file" #49

Closed: stefaniamagg closed this issue 1 year ago

stefaniamagg commented 1 year ago

I get an error when running the align subcommand. Do you have any idea why it shows up?

The command: $ reademption align --project_path /data/msb/PEOPLE/stefania/bfr/bac_concentrations/15_reademption/bac_concentrations --processes 8 --paired_end --fastq

The error:

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/readprocessor.py", line 43, in process_paired_end
    self._process_paired_end(
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/readprocessor.py", line 134, in _process_paired_end
    for fasta_entry_p1, fasta_entry_p2 in zip(
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/readprocessor.py", line 121, in _parse_sequences
    for seq_record in SeqIO.parse(input_fh, "fastq"):
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/Bio/SeqIO/QualityIO.py", line 1123, in iterate
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/Bio/SeqIO/QualityIO.py", line 962, in FastqGeneralIterator
    for line in handle:
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/gzip.py", line 313, in read1
    return self._buffer.read1(size)
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/gzip.py", line 487, in read
    if not self._read_gzip_header():
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/gzip.py", line 435, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\xf1\xf2')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/magnusdo/.conda/envs/reademption/bin/reademption", line 715, in <module>
    main()
  File "/home/magnusdo/.conda/envs/reademption/bin/reademption", line 22, in main
    args.func(controller)
  File "/home/magnusdo/.conda/envs/reademption/bin/reademption", line 687, in align_reads
    controller.align_reads()
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/controller.py", line 178, in align_reads
    self._prepare_reads_paired_end()
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/controller.py", line 438, in _prepare_reads_paired_end
    self._evaluet_job_and_generate_stat_file(read_files_and_jobs)
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/controller.py", line 443, in _evaluet_job_and_generate_stat_file
    self._check_job_completeness(read_files_and_jobs.values())
  File "/home/magnusdo/.conda/envs/reademption/lib/python3.9/site-packages/reademptionlib/controller.py", line 801, in _check_job_completeness
    raise (job.exception())
gzip.BadGzipFile: Not a gzipped file (b'\xf1\xf2')
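
The traceback ends in Python's gzip module, which expected the two-byte gzip magic number (0x1f 0x8b) at the start of a gzip member and found b'\xf1\xf2' instead. To narrow down which file triggers this, one option is to test each compressed read file on its own. The following is only a minimal sketch, not part of READemption: the helper name check_gzip is hypothetical, and it assumes the files to test are passed as command-line arguments.

```python
import gzip
import sys
import zlib
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def check_gzip(path):
    """Hypothetical helper: return (starts_with_magic, reads_to_end)."""
    path = Path(path)
    # Cheap check: does the file begin with the gzip magic number?
    with path.open("rb") as fh:
        starts_with_magic = fh.read(2) == GZIP_MAGIC
    # Thorough check: decompress the whole file in 1 MiB chunks.
    # gzip.BadGzipFile is a subclass of OSError; a truncated file
    # raises EOFError; corrupt compressed data raises zlib.error.
    reads_to_end = True
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1024 * 1024):
                pass
    except (OSError, EOFError, zlib.error):
        reads_to_end = False
    return starts_with_magic, reads_to_end


if __name__ == "__main__":
    # Paths are supplied by the caller, for example the input read files.
    for name in sys.argv[1:]:
        magic_ok, full_ok = check_gzip(name)
        print(f"{name}\tmagic_ok={magic_ok}\treads_to_end={full_ok}")
```

A file can start with the correct magic number and still fail the full read, for example when non-gzip bytes follow a complete gzip member, so the second value is the more informative one.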

The output/align/processed_reads directory contains the processed reads; all other output folders are empty. Example of the processed reads folder:

$ ls -1 output/align/processed_reads | head -4
-rw-rw-r--+ 1 magnusdo umb 141255653 Mar 15 15:10 L1436_0_1_p1_processed.fa.gz
-rw-rw-r--+ 1 magnusdo umb 141535120 Mar 15 15:10 L1436_0_1_p2_processed.fa.gz
-rw-rw-r--+ 1 magnusdo umb 142411804 Mar 15 14:39 L1436_0.25_1_p1_processed.fa.gz
-rw-rw-r--+ 1 magnusdo umb 142598189 Mar 15 14:39 L1436_0.25_1_p2_processed.fa.gz

Versions:

$ cat version_log.txt
READemption version: 2.0.3
Python version: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:45:29)  [GCC 10.4.0]
Biopython version: 1.80
pysam version: 0.20.0
matplotlib version: 3.6.2
pandas version: 1.5.2

stefaniamagg commented 1 year ago

It looks like the issue was somehow solved by reinstalling segemehl, as suggested in a solution to a different issue (https://github.com/foerstner-lab/READemption/issues/28#issuecomment-646732733). Closing.

konrad commented 1 year ago

Great that this solved the issue, @stefaniamagg.