Open jmunoz94 opened 7 months ago
Hi! You could try to build samtools yourself from sources and see whether it works or not. Bioconda downloads it from https://github.com/bioconda/bioconda-recipes/blob/76794693913ab648d37f7c4d2ddc14413dd9d206/recipes/samtools/meta.yaml#L13 And then builds it with these commands: https://github.com/bioconda/bioconda-recipes/blob/76794693913ab648d37f7c4d2ddc14413dd9d206/recipes/samtools/build.sh
99999999999999999999 is larger than 264. It shouldn't really crash (I haven't reproduced that yet), but it's also not something you should expect to work and is not very useful as a test case. Do you really have a chromosome that long? I suggest you reduce the position in your unit test to something rather more realistic.
Positions internally in samtools are signed 64-bit integers. I would not assume that positions beyond 9223372034707292159 work correctly.
Hi, thank you for your suggestions and feedback.
I tried building with the commands in the samtools recipe, but it first complains about htslib (it detects multiple installations, but "system" does not work). After explicitly setting the htslib path, it then starts complaining about other dependencies. I finally decided to try with apt
, and I was positively surprised to see that it has version 1.17 (which is not the newest, but much more recent than the one we were using).
Regarding the test, it was added in 2018 and the person who added it is not around anymore. I assume she wanted to have a very large number there. Keeping it big but smaller than 2^64 makes sense (I will make the update), but it is still really weird that it only fails with this bioconda::samtools + miniforge combination.
I also noticed that another command yields different results:
samtools mpileup --fasta-ref hg19.fa --region chr1 --positions in_file --output out_file some.bam
bioconda::samtools | apt |
---|---|
in_file is a file with the following single line:
chr1 209806391 209806501
At first I thought this was because of max-depth*, but after explicitly setting it to 8000 results are still different. First and last lines in the output file are different, in the screenshots above I show how the first lines (1 to 15) differ.
I am not familiar with these formats, neither with the science behind them, and I suppose that not being able to provide the bam I am using does not help, but any ideas of what could cause that difference?
I also found the error that we were getting with samtools 1.9 (the older version we used before moving from miniconda to miniforge) was:
samtools tview: cannot read index for "some_file.bam"
Footnotes:
*I thought so because when running the command with the version that apt installs, it prints:
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
While bioconda::samtools + miniforge only prints the first line.
The malloc(): memory corruption
crash turns out to be a memory corruption buglet in tview
when given enormous positions, introduced in samtools 1.10. As a memory corruption bug, it presents differently on different platforms and with different compilation options, which is why you didn't see the exact same behaviour from your brew build. This will be fixed by samtools/samtools#2032.
The workaround for your test case is the same as I suggested above. Samtools does not read in these 99999…99999 values correctly, as it only represents values up to 263-1 and does not currently reject extreme values beyond that. So I would recommend you adjust your test case to use a position within that range — moreover with 13 or fewer digits to avoid the crash you've discovered with existing samtools versions.
Me, I don't have any particular advice about your differing mpileup output, other than that such things are often due to BAQ recalculation and/or overlapping read handling. So you might try experimenting with the -B
and -x
options.
Hi!
We use samtools tview to generate some output we then share via an internal API endpoint. Historically, we used samtools 1.9 + miniconda and everything worked as expected.
However, we had to move from miniconda to miniforge because of the update to miniconda's terms and conditions. Making the switch seemed to be relatively straightforward, till we noticed the endpoint was erroring for all requests. Right now I do not recall the exact error we were getting with samtools 1.9, but updating the library appeared to fix it. We moved to samtools 1.19 (which is the latest version as of today).
The endpoint works for most requests I have tried, but a unit test has been failing consistently. This unit test runs the command below:
Which fails with
I set the verbosity level to 8 (
--verbosity=8
), but no extra information is given:I also noticed that the memory allocation error is not raised if I remove one 9 from the command above. So I wonder if it is either running out of memory or if the bam has issues around that position.
This issue does not happen at all (not even with all the 9s/bigger numbers) if I run samtools installed from brew.
I see that the output of
samtools --version
is different, so I was wondering if some flag set when installing samtools from bioconda using miniforge is to blame.Bioconda + Miniforge:
Brew:
The problematic samtools installation is on an Ubuntu container:
uname -m
returns "x86_64"gcc --version
returns:The container has not changed other than switching from miniconda to miniforge (using
Miniforge3-4.10.0-0-Linux-x86_64
, newer versions caused dependency issues for other libraries).I am opening an issue because I could not get much in terms of error messages and I could not find anything helpful googling. I am not very familiar with ~scientific stuff (including but not limited to bam files), but the bam in question opens in IGV (2.16.0) without issues (so I assume the file is OK?). I am not sharing the file because it is private, hopefully the other information is enough to determine what the issue is.
Thank you!