BDI-pathogens / phyloscanner

Phylogenetics between and within hosts at once, all along the genome.
GNU General Public License v3.0

TypeError: object of type 'NoneType' has no len() #79

Open MTDouglas opened 1 month ago

MTDouglas commented 1 month ago

Hello,

I'm getting the error below when running phyloscanner on a series of bam files. The bam files were generated by mapping to a consensus sequence generated from an initial alignment.

    Traceback (most recent call last):
      File "/phyloscanner/phyloscanner_make_trees.py", line 1571, in <module>
        read.query_qualities = [106 for base in range(len(read.query_sequence))]
    TypeError: object of type 'NoneType' has no len()

The command I used is below:

    /phyloscanner/phyloscanner_make_trees.py \
        $csv \
        --auto-window-params 100,0,100,1000 \
        --alignment-of-other-refs $alignment_of_other_refs \
        --pairwise-align-to NC_004102_genotype_1a \
        --merge-paired-reads \
        --discard-improper-pairs \
        --inspect-disagreeing-overlaps \
        --time \
        --output-dir make_trees_results > make_trees.log

ChrisHIV commented 1 month ago

pysam is failing to extract the genetic sequence of a read in your bam file, which I would guess indicates a problem with the bam file. You can try editing phyloscanner_make_trees.py, adding the following immediately above the line where the error occurs (1571):

        print(BamFileName)
        print(read)

(with the same indentation as that line). The first print should tell you which of your bam files this is happening in. The second may print something that helps you identify the problematic read in that bam file (or it may print nothing, if the read is so corrupted that pysam cannot understand it at all). If it prints the name of the read, you could also check whether plain samtools understands that read (pysam is a Python wrapper for samtools) by running samtools view on that bam file from the command line and piping the output to grep for the name of the problematic read.
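As a complement to grepping for a single read name, here is a minimal sketch that scans SAM text (for example the output of `samtools view`) for records whose SEQ field (column 10) is `*`, meaning no sequence is stored. pysam reports `query_sequence` as `None` for such records, which is one way the `len()` call on line 1571 could fail. The function `reads_without_sequence` is hypothetical and not part of phyloscanner or pysam.

```python
from collections.abc import Iterable, Iterator


def reads_without_sequence(sam_lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (QNAME, FLAG) for every SAM record whose SEQ field is '*'.

    `sam_lines` is SAM text, e.g. the output of `samtools view file.bam`;
    header lines (starting with '@') and blank lines are skipped.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            # Every SAM alignment line has 11 mandatory tab-separated fields.
            raise ValueError(f"malformed SAM record: {line!r}")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        if seq == "*":
            # No stored sequence: pysam would report query_sequence as None.
            yield qname, flag
```

Wrapped in a loop over `sys.stdin` in a script (say, a hypothetical `find_seqless.py`), it could be run as `samtools view your_file.bam | python find_seqless.py`. A FLAG with bit 0x100 or 0x800 set marks a secondary or supplementary alignment, respectively.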

MTDouglas commented 1 month ago

I added the two lines you suggested, along with some text to verify that the lines are being printed. However, this does not seem to generate anything either, as we would have at least seen the basic text in the print statements. The error is below; it now points to line 1573 instead of 1571 because we added the two print statements above that line.

    Traceback (most recent call last):
      File "/phyloscanner/phyloscanner_make_trees.py", line 1573, in <module>
        read.query_qualities = [106 for base in range(len(read.query_sequence))]
    TypeError: object of type 'NoneType' has no len()

MTDouglas commented 1 month ago

Also, just to clarify: the FASTA passed to the --alignment-of-other-refs parameter should be a multiple sequence alignment (MSA), correct? I had tried using a regular, unaligned multi-FASTA file and got an error that not all the sequences are the same length.

ChrisHIV commented 1 month ago

> this does not seem to generate anything either, as we would have at least seen the basic text in the print statements.

I'm not sure what you mean by "we would have" here: we would have if what? Should that have said "we would have expected to have at least seen..."? Are you sure nothing is being printed by those two lines? From the command above, it looks like you're redirecting stdout but not stderr, which means the output of those two prints would go to your log file, whereas the error message would go to the terminal.
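To illustrate the stdout/stderr point: `print()` writes to stdout, while the traceback of an uncaught exception goes to stderr, so `> make_trees.log` captures the first but not the second. The sketch below is a hypothetical stand-in, not phyloscanner itself. It runs a tiny Python child process that prints a line and then fails with the same `len(None)` error, and it captures the two streams separately.

```python
import subprocess
import sys

# Hypothetical stand-in for the edited script: a debug print, then the same
# kind of failure as line 1571 (calling len() on None).
CHILD_SOURCE = "print('debug: about to fail')\nlen(None)\n"


def run_child() -> tuple[str, str]:
    """Run CHILD_SOURCE in a fresh interpreter and return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,  # keep the two streams separate
        text=True,
        check=False,  # the child is expected to exit with an error
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    out, err = run_child()
    print("stdout:", out.strip())
    print("last stderr line:", err.strip().splitlines()[-1])
```

The debug line shows up only in stdout and the `TypeError` only in stderr. In a POSIX shell, adding `2>&1` after `> make_trees.log` would send both streams to the log file.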

If those two prints really aren't printing anything (which would be very strange, since executing line 1573 requires executing the two lines before it, assuming they're equally indented), you could try splitting your csv input list of bams into a separate file for each line (i.e. each bam file) and running the same phyloscanner command separately on each one, to identify which bam file(s) are causing the problem.
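A minimal sketch of that splitting step follows. As the suggestion implies, it assumes each line of the csv describes one bam file and that there is no header row. The function `split_bam_list` and the output names `bam_list_1.csv`, `bam_list_2.csv`, ... are hypothetical.

```python
from pathlib import Path


def split_bam_list(csv_path: str, out_dir: str) -> list[Path]:
    """Write each non-blank line of csv_path to its own one-line csv in out_dir.

    Returns the written paths in input order. File names are
    bam_list_1.csv, bam_list_2.csv, ... (a hypothetical naming scheme).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = [row for row in Path(csv_path).read_text().splitlines() if row.strip()]
    written = []
    for i, row in enumerate(rows, start=1):
        target = out / f"bam_list_{i}.csv"
        target.write_text(row + "\n")
        written.append(target)
    return written
```

Each resulting file could then replace `$csv` in the original command, presumably with a distinct `--output-dir` per run to keep the results apart. The run that fails points to the problematic bam file.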

Once you've identified the problematic bam file, and perhaps the specific read therein, see my original suggestion about samtools view.

And yes, --alignment-of-other-refs needs an alignment, not unaligned sequences.
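In an alignment, every sequence has the same length once gaps are included. A quick pre-check before passing a file to `--alignment-of-other-refs` is therefore to confirm that all records in the FASTA have equal length, which is the condition behind the error mentioned above. Here is a minimal stdlib sketch with hypothetical function names. Equal lengths are necessary but not sufficient: the file still needs to come from an actual alignment of the references.

```python
from pathlib import Path


def fasta_lengths(path: str) -> dict[str, int]:
    """Return {sequence name: sequence length, gaps included} for a FASTA file."""
    lengths: dict[str, int] = {}
    name = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header line
            if name in lengths:
                raise ValueError(f"duplicate sequence name: {name!r}")
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def looks_aligned(path: str) -> bool:
    """True if the file has at least one sequence and all have equal length."""
    lengths = fasta_lengths(path)
    return len(lengths) > 0 and len(set(lengths.values())) == 1
```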