minw2828 opened this issue 1 week ago
I guess you may first want to check whether the alignment BAM and the genome FASTA you provided to Straglr both use the same chromosome naming convention (i.e. without the "chr" prefix).
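For the naming check, one option is to compare the @SQ contig names in the BAM header (as printed by `samtools view -H`) with the first column of the FASTA index (the .fai file written by `samtools faidx`). The sketch below works on that text directly. The helper names are hypothetical and not part of Straglr, and the header and .fai contents are toy values.

```python
def sq_names(header_text: str) -> list[str]:
    """Return contig names from @SQ lines of a SAM/BAM header (e.g. `samtools view -H` output)."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_names(fai_text: str) -> list[str]:
    """Return contig names from a FASTA index (.fai); the name is the first tab-separated column."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def naming_mismatch(bam_contigs, fasta_contigs):
    """Return (names only in the BAM, names only in the FASTA), each sorted."""
    bam_set, fa_set = set(bam_contigs), set(fasta_contigs)
    return sorted(bam_set - fa_set), sorted(fa_set - bam_set)


if __name__ == "__main__":
    # Toy inputs: a "chr"-prefixed BAM header against an unprefixed reference index.
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:800\n"
    fai = "1\t1000\t3\t60\t61\n2\t800\t1023\t60\t61\n"
    only_bam, only_fa = naming_mismatch(sq_names(header), fai_names(fai))
    print("only in BAM:", only_bam)
    print("only in FASTA:", only_fa)
```

Two empty lists mean the contig names agree; any non-empty list points at a naming mismatch.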
Can you show me the full command, and which version you were using?
And if you specify --tmpdir to a specific directory and run with --debug, we can locate the "malformed" line in the BED file based on the error message.
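With the temporary BED files kept (via --tmpdir and --debug, as suggested above), a small script can print the records that would trigger this kind of error. This is a minimal sketch assuming the files are tab-separated with the start coordinate in column 2, and that the line number in the error message counts every line from 1; the script and function names are hypothetical.

```python
import sys


def suspect_bed_lines(lines):
    """Yield (1-based line number, record) for BED lines whose start (column 2)
    is negative or not an integer. `lines` can be any iterable of strings,
    such as an open file."""
    for lineno, line in enumerate(lines, start=1):
        record = line.rstrip("\n")
        fields = record.split("\t")
        # Skip comment/header lines and anything with fewer than 3 columns.
        if record.startswith(("#", "track", "browser")) or len(fields) < 3:
            continue
        try:
            start = int(fields[1])
        except ValueError:
            yield lineno, record  # a non-integer start is malformed as well
            continue
        if start < 0:
            yield lineno, record


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_bad_bed.py <tmpdir>/*.bed
    for bed_path in sys.argv[1:]:
        with open(bed_path) as bed:
            for lineno, record in suspect_bed_lines(bed):
                print(f"{bed_path}:{lineno}: {record}")
```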
Hello @readmanchiu,
Thank you for your quick response.
I split the genome into separate chunks, each passed in as ~{region_bed}, so that Straglr could process them concurrently.
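For reference, one way to produce per-chunk region BED files like ~{region_bed} is to tile each contig from the FASTA index into fixed-size windows and spread the windows over N groups. This is a hypothetical sketch, not necessarily how the chunks in this issue were made; the chunk size, round-robin grouping and toy .fai lines are assumptions. Coordinates are BED-style (0-based, half-open) and never run past the contig length.

```python
def fai_lengths(fai_lines):
    """Map contig name -> length, from columns 1 and 2 of .fai lines."""
    lengths = {}
    for line in fai_lines:
        fields = line.split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def chunk_regions(lengths, chunk_size):
    """Tile each contig into (contig, start, end) windows of at most chunk_size bp."""
    regions = []
    for contig, length in lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((contig, start, min(start + chunk_size, length)))
    return regions


def group_round_robin(regions, n_groups):
    """Distribute regions over n_groups lists, one list per region BED file."""
    groups = [[] for _ in range(n_groups)]
    for i, region in enumerate(regions):
        groups[i % n_groups].append(region)
    return groups


if __name__ == "__main__":
    fai = ["1\t250000\t3\t60\t61\n", "2\t120000\t254240\t60\t61\n"]  # toy .fai lines
    regions = chunk_regions(fai_lengths(fai), chunk_size=100_000)
    for i, group in enumerate(group_round_robin(regions, n_groups=2)):
        print(f"# chunk {i}")
        print("".join(f"{c}\t{s}\t{e}\n" for c, s, e in group), end="")
```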
The command that I ran was:
python /usr/local/bin/straglr.py \
--regions ~{region_bed} \
--min_ins_size 3 \
--nprocs ~{threads} \
--tmpdir ~{region_name + "_" + pname + "_straglr_tmp"} \
~{bam} ~{ref_fasta} ~{region_name + "_" + pname + "_straglr"}
I ran the same command on five individuals. Straglr completed successfully for two of them, but the remaining three hit the same error:
The first individual:
Error: malformed BED entry at line 59197. Start Coordinate detected that is < 0. Exiting.
The second individual:
Error: malformed BED entry at line 9899. Start Coordinate detected that is < 0. Exiting.
The third individual:
Error: malformed BED entry at line 58727. Start Coordinate detected that is < 0. Exiting.
Hence, the error was not caused by different chromosome name conventions.
I am thinking of two possible causes:
Would reason 2 be possible?
I am keen to hear your thoughts on this.
Many thanks, Min
I was wondering what the corresponding lines in the different BED files look like.
A --min_ins_size of 3 is a bit too permissive. Just a reminder that the unit for --min_ins_size is bp, not copy number. I think some insertions are being picked up near the ends of chromosomes, so negative coordinates are generated when the flank sizes are taken into account.
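To illustrate the mechanism described above: if a window is built by extending an insertion position by a flank on each side, an insertion within one flank length of the chromosome start gives a negative start unless it is clamped to 0. This is an illustration only, with a hypothetical flank size; it is not Straglr's actual code. It also shows the bp vs. copy-number arithmetic behind --min_ins_size.

```python
def flanked_window(pos, flank, chrom_len=None, clamp=True):
    """Return a 0-based half-open (start, end) window of `flank` bp on each side of `pos`.

    With clamp=False, an insertion closer than `flank` bp to the chromosome
    start gives start < 0, which BED parsers reject as malformed.
    """
    start, end = pos - flank, pos + flank
    if clamp:
        start = max(0, start)
        if chrom_len is not None:
            end = min(end, chrom_len)
    return start, end


def copies_to_bp(copies, motif_len):
    """--min_ins_size is in bp, so convert a repeat copy number to inserted length."""
    return copies * motif_len


if __name__ == "__main__":
    FLANK = 100  # hypothetical flank size, for illustration only
    print(flanked_window(40, FLANK, clamp=False))  # (-60, 140): malformed as BED
    print(flanked_window(40, FLANK))               # (0, 140)
    print(copies_to_bp(3, 3))                      # 3 copies of CAG = 9 bp
```

In other words, --min_ins_size 3 accepts an insertion as short as a single copy of a trinucleotide motif.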
I usually use 100 for --min_ins_size, as ONT reads can be quite noisy.
I also usually skip centromeres and long repeats/segmental duplications (which can be curated from UCSC annotation tracks) in genome scans by passing their coordinates to --exclude.
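If centromere and segmental-duplication intervals come from separate UCSC tracks, they may overlap, so it can be tidier to merge them into one sorted BED file before passing it to --exclude. The sketch below merges overlapping or touching intervals per chromosome in plain Python (bedtools sort and bedtools merge do the same job). Whether Straglr needs a merged file is an assumption, the function names are hypothetical, and the coordinates are toy values.

```python
from collections import defaultdict


def merge_intervals(records):
    """Merge overlapping or touching (chrom, start, end) intervals.

    Returns intervals sorted by chromosome name (lexically), then start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps or abuts the current interval
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def to_bed(intervals):
    """Format (chrom, start, end) tuples as tab-separated BED text."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)


if __name__ == "__main__":
    # Toy rows standing in for centromere and segdup track entries.
    centromeres = [("1", 1000, 5000)]
    segdups = [("1", 4000, 9000), ("2", 100, 200)]
    print(to_bed(merge_intervals(centromeres + segdups)), end="")
```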
Hello,
Thank you for developing the tool.
I can see my issue is similar to #20, but I don't have patch sequences in my reference genome.
Could you please advise what else might have caused this error?
My error message:
My reference genome only has chromosomes 1 to 22, X, Y and M.
Many thanks, Min