OLC-Bioinformatics / ConFindr

Intra-species bacterial contamination detection
https://olc-bioinformatics.github.io/ConFindr/
MIT License

Bug load fastq records #63

Open miliskato opened 3 weeks ago

miliskato commented 3 weeks ago

Hi,

When running ConFindr on a specific sample, we encountered a KeyError: a read name could not be found in the fastq records, even though that read is present in the fastq file.

After some digging, we traced the cause to the load_fastq_records method in methods.py. Our read names contain :1: (the lane number, see https://help.basespace.illumina.com/files-used-by-basespace/fastq-files) and also end in /1. Because the ':1:' check runs first, an extra /1 is appended to a record id that already ends in /1. As a result, the key being looked up (read_name/1) does not match the read names in the loaded fastq records.

Is there a reason why you check whether :1: is present in the record id before checking whether the id already contains /1? Could the order be swapped, and could the check test whether the id ends with /1 rather than merely contains it? Also, the comment above the first condition (if ':1:' in record.id) says that a :1: is changed to /1 in the record id, but the code only appends /1. Is the mistake in the comment or in the code?

Current code:

if forward:
    # Change a :1: to /1 in the record.id
    if ':1:' in record.id:
        record.id = record.id + '/1'
    # Don't worry if the record.id already has a /1
    elif '/1' in record.id:
        pass
    # If the record.id doesn't have a read direction, add /1
    else:
        record.id = record.id + '/1'
# Process reverse reads in a similar fashion to forward reads
else:
    if ':2:' in record.id:
        record.id = record.id + '/2'
    elif '/2' in record.id:
        pass
    else:
        record.id = record.id + '/2'
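
To make the failure concrete, here is a minimal sketch that mirrors the forward-read branch quoted above and applies it to a made-up read name. The function name tag_forward_current, the read name, and the use of SimpleNamespace as a stand-in for the real record object are all assumptions for illustration and are not part of ConFindr.

```python
from types import SimpleNamespace


def tag_forward_current(record):
    """Mirror of the forward-read branch quoted under "Current code" (illustrative only)."""
    if ':1:' in record.id:
        # Taken whenever the lane field is ':1:', even if the id already ends in '/1'
        record.id = record.id + '/1'
    elif '/1' in record.id:
        pass
    else:
        record.id = record.id + '/1'
    return record.id


# Hypothetical read name: contains the lane field ':1:' and already ends in '/1'.
record = SimpleNamespace(id='M01234:15:000000000-ABCDE:1:1101:15589:1332/1')
original_id = record.id
new_id = tag_forward_current(record)

print(original_id)  # M01234:15:000000000-ABCDE:1:1101:15589:1332/1
print(new_id)       # M01234:15:000000000-ABCDE:1:1101:15589:1332/1/1
```

The id ends up with a doubled suffix, which would explain why a lookup using the original read name fails.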

Suggested code:

if forward:
    # Don't worry if the record.id already has a /1
    if record.id.endswith('/1'):
        pass
    # Change a :1: to /1 in the record.id
    elif ':1:' in record.id:
        record.id = record.id + '/1'
    # If the record.id doesn't have a read direction, add /1
    else:
        record.id = record.id + '/1'
# Process reverse reads in a similar fashion to forward reads
else:
    if record.id.endswith('/2'):
        pass
    elif ':2:' in record.id:
        record.id = record.id + '/2'
    else:
        record.id = record.id + '/2'
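
Since the ':1:' branch and the final else branch in the suggestion both append the same suffix, the logic could probably be condensed to a single endswith check. The sketch below shows one possible form. The helper name add_direction_suffix and the example read names are hypothetical, and it operates on plain strings rather than on the record objects used in methods.py.

```python
def add_direction_suffix(read_id, forward):
    """Append '/1' or '/2' unless the id already ends with that suffix (hypothetical helper)."""
    suffix = '/1' if forward else '/2'
    if read_id.endswith(suffix):
        # Already carries the read direction, leave it unchanged
        return read_id
    return read_id + suffix


# Hypothetical read names covering the cases discussed in this issue.
with_suffix = add_direction_suffix('M01234:15:000000000-ABCDE:1:1101:15589:1332/1', forward=True)
without_suffix = add_direction_suffix('M01234:15:000000000-ABCDE:1:1101:15589:1332', forward=True)
reverse_read = add_direction_suffix('read_7', forward=False)

print(with_suffix)     # M01234:15:000000000-ABCDE:1:1101:15589:1332/1
print(without_suffix)  # M01234:15:000000000-ABCDE:1:1101:15589:1332/1
print(reverse_read)    # read_7/2
```

With this form, a read name that already ends in its direction suffix is left alone regardless of the lane number it contains.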

Thanks in advance for your reply!