BDI-pathogens / phyloscanner

Phylogenetics between and within hosts at once, all along the genome.
GNU General Public License v3.0
44 stars 14 forks source link

Problem when using explore window width option #26

Closed MarcNiebel closed 6 years ago

MarcNiebel commented 6 years ago

Dear Chris,

I am having a problem when using the explore-window-widths option. This is my command:

../phyloscanner/phyloscanner_make_trees.py Transmission_couples.csv --merge-paired-reads -MTA 1 --explore-window-widths 1,210 --explore-window-width-file alltrans_210.csv

For smaller windows (170-200) it works fine but the error reads: Traceback (most recent call last): File "../phyloscanner/phyloscanner_make_trees.py", line 1351, in RightWindowEdgeForFetch): File "pysam/libcalignmentfile.pyx", line 1065, in pysam.libcalignmentfile.AlignmentFile.fetch (pysam/libcalignmentfile.c:15015) File "pysam/libchtslib.pyx", line 678, in pysam.libchtslib.HTSFile.parse_region (pysam/libchtslib.c:11920) ValueError: invalid coordinates: start (9239) > stop (9190)

Any help would be much appreciated.

Kind Regards,

Marc

ChrisHIV commented 6 years ago

Hi Marc, I think something has gone wrong translating the window coordinates between the different references, such that a window start is after a window end. I don't know how that would happen and will need to investigate; unfortunately I'm in the middle of a complicated change to the code that will take several days. If you add just before line 1350: if LeftWindowEdgeForFetch >= RightWindowEdgeForFetch: print("Warning: dodgy coordinate window, skipping for now") continue I think that should at least let you work around it for now. Can you share the references to which your reads were mapped (the files in the second column of Transmission_couples.csv)? Their sequence names are irrelevant (so rename if sensitive) but the sequences themselves should allow me to reproduce the error.

ChrisHIV commented 6 years ago

Oops the comment above didn't preserve the identation of the three lines I suggested you add. The "if" is indented by 4 spaces, the following lines by 6 spaces (should be clear from the local indentation where you add it).

MarcNiebel commented 6 years ago

Hi Chris, Thanks for the swift reply. It was working fine a week ago but then it stopped working. Have attached my references used for mapping and also the one used for --alignment-of-other-refs and --pairwise-align-to (AF009606). Will try out your workaround.

1a_refs.txt

ChrisHIV commented 6 years ago

Hi Marc, sorry for the delay. I haven't been able to reproduce the error: I made four bam files by mapping test reads to those four references, and then running the same command as you on those four bams, it works fine. I'm sure it's a problem with translating coordinates between bams based on the reference alignment however; it could be due to mafft aligning them differently for you and me, or perhaps you have more than four bams (with some of them using the same reference) and multiple identical references changes the alignment somehow. Can you share the RefsAln.fasta file and all of the output before the error please? (By email if you prefer; renaming sequences if you prefer.)