Open C-YONG opened 3 months ago
Hi, I encountered the same issue using version 1.7.0 with mapped PacBio ccs reads. I was able to run a previous version of NanoVar (1.3.8) on the same data without any problem.
I use the following command, which throws the 'feature beyond the length' message at the VCF generation step:
nanovar $BAM $GENOME_NV $CALLS_DIR/nanovar/$SAMPLE -x pacbio-ccs -t $CPU -f $EXCL_BED --debug
The last files outputted are cluster.tsv
and parse2.tsv
(empty).
Installation was done with conda
as described in the README.
Thank you for any guidance or advice.
Hi @C-YONG @LaurieLecomte, thanks for reporting this. I'll need some time to check on this and will revert in the next few days. Thank you for your patience.
Hi @C-YONG and @LaurieLecomte, please try the v1.7.1-dev branch and let me know if it fixes the issue.
You can clone the branch by:
git clone -b v1.7.1-dev https://github.com/cytham/nanovar.git
Hi @cytham, thank you very much for the quick response and the update.
Unfortunately, I got the exact same error and outputs using the v1.7.1-dev version.
I initially suspected that the bug might come from Inter-Ins(2) and InterTx, but after investigating a bit on my end, it looks like some calls of Inv(2) and Intra-Ins(2) also give the same error.
I made a custom version of the make_bed()
function from v1.7.0 to output an intermediary bed file featuring SV type and the original id~chrom:start-end
string from parse1.tsv
, and below I pasted a few of the lines representing sites that threw the beyond chromosome length error. For Intra-Ins(2) and Inv(2) sites, in the id~chrom:start-end
strings, the start position is greater than the end position:
CM055694.1 56110901 56110902 EC435 Inter-Ins(2) EC435~CM055694.1:56110902~CM055721.1:17888839
CM055703.1 67666256 67666257 B895F Intra-Ins(2) B895F~CM055703.1:67666257-49693072
CM055703.1 67666256 67666257 B895F Intra-Ins(2) B895F~CM055703.1:67666257-49693072
CM055703.1 67666256 67666257 D6D6D Intra-Ins(2) D6D6D~CM055703.1:67666257-49693072
CM055703.1 67666256 67666257 D6D6D Intra-Ins(2) D6D6D~CM055703.1:67666257-49693072
CM055704.1 52471483 52471484 E5B58 Inv(2) E5B58~CM055704.1:52471484-3246722
CM055705.1 46612017 46612018 D22C1 InterTx D22C1~CM055705.1:46612018~CM055722.1:15467696
CM055707.1 43435500 43435501 E9F58 Intra-Ins(2) E9F58~CM055707.1:43435501-24087228
CM055708.1 43435500 43435501 C66E2 InterTx C66E2~CM055708.1:43435501~CM055707.1:24087228
CM055711.1 62885703 62885704 71010 Inv(2) 71010~CM055711.1:62885704-42442184
CM055711.1 81492618 81492619 F0477 Inv(2) F0477~CM055711.1:81492619-4879966
These start positions are also larger than the corresponding chromosome length, hence the error encountered.
I got the same result with make_bed()
from v1.7.1, which gave the same output without Inter-Ins and InterTx (as expected).
Hope this helps.
Hi @LaurieLecomte, thanks for looking into this. Indeed, Intra-Ins and Inv could be causing the problem, my bad for missing that out. I have removed them in the latest v1.7.1-dev branch. Could you please try again?
The fact that start positions are greater than end positions in some cases shouldn't affect the BED file creation. So I am thinking that it could be another bug somewhere for Intra-Ins, Inv, etc. But thanks a lot for looking into this, you have been of great help.
Also, could you let me know the chromosome lengths of those you showed? (e.g. CM055703.1, CM055704.1, CM055705.1, CM055707.1, CM055708.1, CM055711.1)
Hi @cytham, thanks for the update.
I tried the latest branch and everything ran without any issue, yielding complete VCFs and no errors.
Here are the chromosome lengths for the few problematic sites I shared in my previous comment (from the ASM2944872v1 assembly):
CM055694.1 60416927
CM055703.1 52113258
CM055704.1 42625803
CM055705.1 50983633
CM055707.1 43302116
CM055708.1 48152825
CM055711.1 46201713
Thanks @LaurieLecomte. I'm trying to figure out the root cause for this. Is it possible to share the debug files with me? Specifically, subdata.tsv, detect.tsv, and parse1.tsv? Thanks
Hi @cytham, thank you for looking into this. I uploaded the files from a run with version 1.7.0 to Dropbox along with corresponding log files. Let me know if you also need input files (mapped reads and/or reference genome).
Hello, unfortunately, I encountered a "getfasta error" when using both versions 1.7.1 and 1.7.0 of nanovar with HG002 HiFi data. Do you have any solutions for this issue?
nanovar -x pacbio-ccs -t 30 -f ~/01.Reference/Human_GRCh37/nanovar_gap.bed ./HG002.HIFI.sort.bam ~/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa Nanovar_wd
hi @Flooooooooooooower, can you provide some logs please?
hi @Flooooooooooooower, can you provide some logs please?
Hello, thank you for your prompt response. Below is my log file.
nanovar --debug -l 50 -x pacbio-ccs -t 30 -f ~/01.Reference/Human_GRCh37/nanovar_gap.bed ./HG002.HIFI.sort.bam ~/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa Nanovar_wd
[09/08/2024 13:12:21] - NanoVar started
[09/08/2024 13:12:21] - Checking integrity of input files - Pass
[09/08/2024 13:13:03] - Analyzing read alignments and detecting SVs - Done
[09/08/2024 13:48:55] - Clustering SV breakends - Done
[09/08/2024 14:42:37] - Correcting DUP and detecting TE - Done
[09/08/2024 15:26:23] - Generating VCF files and report - Traceback (most recent call last):
File "/home/xxx/miniconda3/envs/myenv/bin/nanovar", line 635, in <module>
main()
File "/home/xxx/miniconda3/envs/myenv/bin/nanovar", line 486, in main
run.vcf_report() #
^^^^^^^^^^^^^^^^
File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/nanovar/nv_characterize.py", line 205, in vcf_report
alt_seq = get_alt_seq(self.dir, self.out_nn, self.refpath)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/nanovar/nv_alt_seq.py", line 46, in get_alt_seq
fasta = bed.sequence(fi=ref_path, nameOnly=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/pybedtools/bedtool.py", line 907, in decorated
result = method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/pybedtools/bedtool.py", line 388, in wrapped
stream = call_bedtools(
^^^^^^^^^^^^^^
File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
bedtools getfasta -nameOnly -fo /tmp/pybedtools.xjnohstx.tmp -fi /home/xxx/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa -bed /tmp/pybedtools.l0vhhi9x.tmp
Error message was:
[NanoVar-090824-1312.log](https://github.com/user-attachments/files/16557678/NanoVar-090824-1312.log)
Feature (19:121485432-121485433) beyond the length of 19 size (59128983 bp). Skipping.
@Flooooooooooooower I see your logs are from v1.7.0, can I please confirm that you get the same error for v1.7.1? thanks
@Flooooooooooooower I see your logs are from v1.7.0, can I please confirm that you get the same error for v1.7.1? thanks
Hi,Below is the log file of 1.7.1.
[11/08/2024 16:27:02] - NanoVar-v1.7.1 started
[11/08/2024 16:27:02] - Checking integrity of input files - Pass
[11/08/2024 16:27:23] - Analyzing read alignments and detecting SVs - Done
[11/08/2024 17:49:41] - Clustering SV breakends - Done
[11/08/2024 18:35:33] - Correcting DUP and detecting TE - Done
[11/08/2024 18:55:58] - Generating VCF files and report - Traceback (most recent call last):
File "/opt2/conda/bin/nanovar", line 638, in
bedtools getfasta -nameOnly -fo /tmp/pybedtools.1886eg8o.tmp -fi /home/xxx/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa -bed /tmp/pybedtools.74ascn07.tmp
Error message was: Feature (NT_167214.1:79186948-79186949) beyond the length of NT_167214.1 size (161802 bp). Skipping. Feature (NT_167214.1:31149863-31149864) beyond the length of NT_167214.1 size (161802 bp). Skipping. Feature (NT_167214.1:76807186-76807187) beyond the length of NT_167214.1 size (161802 bp). Skipping. Feature (NT_167214.1:76807695-76807696) beyond the length of NT_167214.1 size (161802 bp). Skipping. Feature (NT_167214.1:127650692-127650693) beyond the length of NT_167214.1 size (161802 bp). Skipping.
I also got the same error while using v1.7.0.
@cytham Is this issue resolved?
Hi @jiadong324, do you still get the error with the v1.7.1-dev branch?
I just finished the test with v1.7.1-dev. No more errors! When will you release the v1.7.1.
NanoVar v1.8.1 is now available from PyPI and Conda, thanks for your understanding
Hello, software developer. I have three ONT datasets in total. The first two ran normally, but the third one encountered the following error. What could be the reason for this?
[24/07/2024 15:27:11] - NanoVar started [24/07/2024 15:27:11] - Checking integrity of input files - Pass [24/07/2024 15:27:12] - Analyzing read alignments and detecting SVs - Done [24/07/2024 15:52:39] - Clustering SV breakends - Done [24/07/2024 16:08:29] - Correcting DUP and detecting TE - Done [24/07/2024 16:13:00] - Generating VCF files and report - Traceback (most recent call last): File "/home/chiyong/miniconda3/envs/nanovar/bin/nanovar", line 635, in
main()
File "/home/chiyong/miniconda3/envs/nanovar/bin/nanovar", line 486, in main
run.vcf_report() #
^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/nanovar/nv_characterize.py", line 205, in vcf_report
alt_seq = get_alt_seq(self.dir, self.out_nn, self.refpath)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/nanovar/nv_alt_seq.py", line 46, in get_alt_seq
fasta = bed.sequence(fi=ref_path, nameOnly=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/pybedtools/bedtool.py", line 907, in decorated
result = method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/pybedtools/bedtool.py", line 388, in wrapped
stream = call_bedtools(
^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
Error message was: Feature (LR584250.1:14620732-14620733) beyond the length of LR584250.1 size (12913240 bp). Skipping.
Writing to /tmp/bcftools.VhKjGh