Closed mei2000 closed 5 years ago
Hi Mei,
Thanks for the report; hopefully we will be able to help you. Could you let us know:
1) How much memory is available
2) The first 20 and last 20 lines of the logfile
3) The output of samtools quickcheck input.bam
4) The output of samtools view input.bam | head
5) How big your BAM file is.
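If it is easier to collect items 3-5 in one go, a small sketch like the following could do it. It assumes samtools is on the PATH (the executable name can be overridden) and Python 3.5 or later; collect_diagnostics is a hypothetical helper, not part of UMI-tools. Note that samtools quickcheck is silent and exits with status 0 when a file passes, so empty output from it is a good sign.

```python
import os
import subprocess


def collect_diagnostics(bam_path, samtools="samtools", n_records=10):
    """Gather answers to items 3-5 for one BAM file (hypothetical helper).

    Assumes samtools is on PATH unless another executable is passed in.
    samtools quickcheck prints nothing and exits 0 when a file looks intact,
    so a zero exit status counts as a pass.
    """
    report = {"size_bytes": os.path.getsize(bam_path)}  # item 5
    try:
        # Item 3: -v makes quickcheck print the names of files that fail.
        check = subprocess.run([samtools, "quickcheck", "-v", bam_path],
                               stdout=subprocess.PIPE, universal_newlines=True)
        report["quickcheck_ok"] = check.returncode == 0
        # Item 4: like `samtools view input.bam | head`, keep the first records.
        view = subprocess.Popen([samtools, "view", bam_path],
                                stdout=subprocess.PIPE, universal_newlines=True)
        report["first_records"] = [line for _, line in zip(range(n_records), view.stdout)]
        view.stdout.close()
        view.kill()  # stop samtools instead of letting it stream the whole file
        view.wait()
    except FileNotFoundError:
        # The samtools executable could not be found.
        report["quickcheck_ok"] = None
        report["first_records"] = None
    return report
```

For example, print(collect_diagnostics("input.bam")) would show the file size, whether quickcheck passed, and the first ten SAM lines.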
Cheers,
Hi Ian,
Here are my answers to your questions:
1) I ran this BAM file on our big server, which has 1GB RAM and 96 cores, and I am the only one using the machine right now.
2) First 20 lines of the logfile:
2) last 20 lines of logfile:
[geru1@rogue scripts]$ ^C
[geru1@rogue scripts]$ tail -20 group_nodedup_log.txt
2017-10-11 17:24:50,860 INFO Written out 2780000 reads
2017-10-11 17:24:50,861 INFO Written out 2790000 reads
2017-10-11 17:24:50,861 INFO Written out 2800000 reads
2017-10-11 17:24:50,861 INFO Written out 2810000 reads
2017-10-11 17:24:50,861 INFO Written out 2820000 reads
2017-10-11 17:24:51,042 INFO Written out 2830000 reads
2017-10-11 17:24:52,048 INFO Written out 2840000 reads
2017-10-11 17:24:53,025 INFO Written out 2850000 reads
2017-10-11 17:24:54,201 INFO Written out 2860000 reads
2017-10-11 17:24:55,400 INFO Written out 2870000 reads
2017-10-11 17:24:56,606 INFO Written out 2880000 reads
2017-10-11 17:24:57,704 INFO Written out 2890000 reads
2017-10-11 17:24:58,636 INFO Written out 2900000 reads
2017-10-11 17:24:59,605 INFO Written out 2910000 reads
2017-10-11 17:25:00,558 INFO Written out 2920000 reads
2017-10-11 17:25:01,529 INFO Written out 2930000 reads
2017-10-11 17:25:02,729 INFO Written out 2940000 reads
2017-10-11 17:25:03,302 INFO Written out 2950000 reads
2017-10-11 17:25:04,129 INFO Written out 2960000 reads
2017-10-11 17:25:04,696 INFO Written out 2970000 reads
3) samtools quickcheck input.bam (nothing is returned here)
4) samtools view input.bam | head
M00145:250:000000000-AV6RL:1:2109:19889:20270_AGCGATGGCCGG 129 chr1 21198 0 127M chr19 32891088 0 CGGTGCTCCCCACTCCACTGCCAGTCATCACTGGCTCTCCCTTCCCTTCATCCTCGTTCCCTATCTGTCACCATTTCCTGTCGTCGTTTCCTCTGAATGTCTCACCCTGCCCTCCCTGCTTACAAGT HGGGGGHHHGGHGHHHHHHHHHGHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHGHGHHHHHHHHHHHHHHEHHFHGHHHHGFGGGGGHGHHGHGEHHHHHHHHHHHGHHGHHGGHGEHHHHHHHF NM:i:2 AS:i:117 XS:i:117 RG:Z:1
M00145:250:000000000-AV6RL:1:1105:24775:5278_TCTCACGGCTTC 129 chr1 24400 0 126M chr15 101966263 0 CTGCCTTGCGCACGAGCACTGCTGGGTAAATATTTGTTGGCTGCAGGAAAACGTGAAGGAATAGGCCCTCCAATGGGAGGAAAAGCATGAGTTGGGAGAGCAGAGCCACCACAGGAAACCAGGAGG HH44BFFGBE2EEGGGGGHHHGGEHGCGHGFHGHHHHHAGHHGGBG1?GHBG0FAHGHHHF3GHHHGFHHGFF4FB00?EEGF?FCEBEBFGBG3</?/BFF0CAG00GE/GEHAGHBG</F?/F? NM:i:2 AS:i:116 XS:i:116 RG:Z:1
M00145:250:000000000-AV6RL:1:2109:3969:19537_TCTCACGGCTTC 129 chr1 24400 0 126M chr15 101966263 0 CTGCCTTGCACACGAGCACTGCTGGGTAAATATTTGTTGGCTGCAGGAAAACGTGAAGGAATAGGCCCTCCAATGGGAGGAAAAGCATGAGTTGTGAGAGCAGAGCCACCACAGGAAACCAGGAGG HHHHHHHHFHHGHGGGGGHHHHHHHGGFGGGHHHHHFHHGFGGAGHHGHFHHGGHHGHGGHFHHHGHGHFGFHE@3FFFCFHGGHHHFHGHHHHHDFH?GHHHHHGHGHHHFHGHCGHHHGHGECF NM:i:0 AS:i:126 XS:i:126 RG:Z:1
M00145:250:000000000-AV6RL:1:1101:18438:10582_GGCAGCACAGTG 65 chr1 24606 0 73M chr15 101966283 0 CACACAGGGAAGCCAGATGGGTTCCCCAGGACCGGGATTCCCCAAGGGGGCTGCTCCCAGAGGGTGTGTTGCT GGGHHGHHGGHGHHHHHHHHHHGHHHHHGGHHHGGGGGHHHHHHGHGGGGGGGGHHHGHHHHGGGGFGHHHHH NM:i:0 AS:i:73 XS:i:73 RG:Z:1
M00145:250:000000000-AV6RL:1:2108:6457:14801_GGCAGCACAGTG 65 chr1 24606 0 73M chr15 101966283 0 CACACAGGGGAGCCAGATGGGTTCCCCAGGACCGGGATTCCCCAAGGGGGCTGCTCCCAGAGGGTGTGTTGCT GAEGHEHFGG?GEGHHHHHFHFGHHHHHGFEEHGGGCECFHEHHGHGGGGDGGGHGGEFHGEGG?FCEFGHFH NM:i:1 AS:i:68 XS:i:68 RG:Z:1
M00145:250:000000000-AV6RL:1:2109:11990:25142_GGCAGCACAGTG 65 chr1 24606 0 73M chr15 101966283 0 CACACAGGGAAGCCAGATGGGTTCCCCAGGACCGGGATTCCCCAAGGGGGCTGCTCCCAGAGGGTGTGTTGCT GGGHHGHHGGHGHHHHHHHHHHGHHHHHGGHHHGGGGGHHHHHHGHGGGGGGGGHHHGHHHHGGCFDDGHHGH NM:i:0 AS:i:73 XS:i:73 RG:Z:1
M00145:250:000000000-AV6RL:1:2106:25375:14190_TAGTAACACCGC 129 chr1 26335 0 128M chr15 101964499 0 GCAAGTTTGCTGGATGTCCTAACTTATTTCTGTGCCTCAGTTCTCCCATATGTAAGATCACAAAGGGGGTAAAGATGCAAGATATTTCCTGTGCACATCTTCAGATGAATTCCTTGTTAGTGTGTGTT HHHGHHHHGHHHHHHHHHHHHHHHHGHHHHHHHHIHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFGGGGHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHGHGGHFHHHHHHHGHHHHHH NM:i:3 AS:i:117 XS:i:117 RG:Z:1
M00145:250:000000000-AV6RL:1:2107:14840:17062_TAGTAACACCGC 129 chr1 26335 0 128M chr15 101964499 0 GCAAGTTTGCTGGATGTCCTAACTTATTTCTGTGCCTCAGTTCTCCCATATGTAAGATCACAAAGGGGGTAAAGATGCAAGATATTTCCTGTGCACATCTTCAGATGAATTCCTTGTTAGTGTGTGTT HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHHHHHGHGHHHHHHHHHHHHHHFHGGCEEHGHGHHHHHHHHHHHHHHHHGHHHHGHHHHHHHHHHHHHHHHHHHHGGHDGHFHHHH NM:i:3 AS:i:117 XS:i:117 RG:Z:1
M00145:250:000000000-AV6RL:1:2113:21381:12006_TAGTAACACCGC 129 chr1 26335 0 128M chr15 101964499 0 GCAAGTTTGCTGGATGTCCTAACTTATTTCTGTGCCTCAGTTCTCCCATATGTAAGATCACAAAGGGGGTAAAGATGCAAGATATTTCCTGTGCACATCTTCAGATGAATTCCTTGTTAGTGTGTGTT HHCFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHGHHHHHHFHHGHHHHHHFHHHHHFGHEGGGEGFHGHHFHHHHGHHHHHHGHHHHHGHHGHFHHHGHHHHHHHHHHHHHHHHHHHHHHHH NM:i:3 AS:i:117 XS:i:117 RG:Z:1
M00145:250:000000000-AV6RL:1:2114:9090:27249_TAGTAACACCGC 129 chr1 26335 0 128M chr15 101964499 0 GCAAGTTTGCTGGATGTCCTAACTTATTTCTGTGCCTCAGTTCTCCCATATGTAAGATCACAAAGGGGGTAAAGATGCAAGATATTTCCTGTGCACATCTTCAGATGAATTCCTTGTTAGTGTGTGTT HGCHHHGHFHGGGAGGFFDGGGFFHGEHHEFFGHGDFFGHHGFHH53FGFBEDEEGGBGFFDFFFHGGCEGHHHHHHGFHHGFFH4GFFHHFGBGHHF3FGHFGFGFGFHFDH4F3334B4??FGBBC NM:i:3 AS:i:117 XS:i:117 RG:Z:1
5) input.bam size=167M
Thanks
Mei
I'm assuming 1GB is a typo and you mean 1TB?
This is not something we have seen before, but a segmentation fault usually means either that you are running out of memory (unlikely if you have 1TB, and given how quickly it failed) or that there is a problem in C code, which would probably point to one of our dependencies!
For the dependencies, it could be a conflict or a bad version in your setup (specific to your setup), or a problem in the interaction between your input and the C code (specific to your input).
Can you check whether you can run the test data in the QUICK_START guide? That should hopefully tell us whether it's a general problem or specific to your input.
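One way to check the "bad version" possibility is to print which version of each relevant module is imported, and from which file. A minimal sketch follows; the module list is an illustrative assumption (pysam is the C-backed library UMI-tools uses to read BAM files, while numpy, pandas and scipy appear in the pip log of the original report), and report_versions is a hypothetical helper.

```python
import importlib

# Illustrative list of modules to inspect; adjust as needed.
MODULES = ("umi_tools", "pysam", "numpy", "pandas", "scipy")


def report_versions(modules=MODULES):
    """Map each module name to (version, file it was loaded from), or None.

    The file path helps spot a mix of installs, e.g. some packages under
    ~/.local and others under the system Python's site-packages.
    """
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not importable in this environment
            continue
        version = getattr(module, "__version__", "unknown")
        report[name] = (version, getattr(module, "__file__", "unknown"))
    return report


if __name__ == "__main__":
    for name, info in report_versions().items():
        print(name, info if info is not None else "not installed")
```

This only tells you something useful if it is run with the same Python interpreter that umi_tools runs under.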
We should probably rule out memory by prefixing the command with time -v and reporting the max memory usage.
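In bash, time is a shell keyword that only accepts -p, so time -v needs the standalone GNU time program (usually /usr/bin/time); the figure to report is the "Maximum resident set size (kbytes)" line it prints. As a Python alternative, here is a minimal sketch that runs a command and reports both how it ended and its peak memory. run_and_measure is a hypothetical helper, and the unit of ru_maxrss is platform dependent (kilobytes on Linux, bytes on macOS).

```python
import resource
import signal
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd and return (returncode, description, peak_rss).

    On POSIX, subprocess reports a child killed by signal N as returncode -N,
    so a segmentation fault shows up as -signal.SIGSEGV. peak_rss is the
    largest resident set size among all finished children of this Python
    process, so run one command per Python process for a clean number.
    """
    proc = subprocess.run(cmd)
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if proc.returncode < 0:
        description = "killed by " + signal.Signals(-proc.returncode).name
    else:
        description = "exited with status {}".format(proc.returncode)
    return proc.returncode, description, peak_rss


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python measure.py umi_tools group -I input.bam ... -S output.bam
    print(run_and_measure(sys.argv[1:]))
```

A result of "killed by SIGSEGV" together with a peak far below the machine's RAM would point away from memory exhaustion.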
Sorry, it is a typo: the server has 1TB RAM.
Hi @mei2000 - The QUICK_START guide can be found here. For testing purposes, you can skip straight to step 5 and run the following commands:
wget https://github.com/CGATOxford/UMI-tools/releases/download/v0.2.3/example.bam
umi_tools dedup -I example.bam --output-stats=deduplicated -S deduplicated.bam
The BAM will need indexing first, i.e.:
wget https://github.com/CGATOxford/UMI-tools/releases/download/v0.2.3/example.bam
samtools index example.bam
umi_tools dedup -I example.bam --output-stats=deduplicated -S deduplicated.bam
Good point. Thanks!
Hi Ian,
I just ran the test BAM file downloaded from your GitHub site, and I don't see any error messages on the command line. Your BAM file is about 20MB and mine is about 70MB.
Here is the command line: umi_tools group -I example.bam --paired --group-out=groups.tsv -L group_log.txt --output-bam -S mapped_grouped.bam
Thanks
Robin
From: Ian Sudbery notifications@github.com Reply-To: CGATOxford/UMI-tools reply@reply.github.com Date: Friday, October 13, 2017 at 5:37 AM To: CGATOxford/UMI-tools UMI-tools@noreply.github.com Cc: "Ge, Robin" robin.ge@novartis.com, Mention mention@noreply.github.com Subject: Re: [CGATOxford/UMI-tools] get the Segmentation fault from umi_tools group command line (#193)
Hi @mei2000 - The good news is that the installation appears to have worked OK, since you can run the test BAM file. The bad news is that this means there's something unexpected in your BAM.
In order to work out exactly what the problem is, the best approach is to reduce the BAM to a more manageable size that still reproduces the error. From the final line in the logfile, it looks like the issue occurs between reads 2970000 and 2980000. Your failed command should have left a partial output BAM called output.bam. You can create a new minimal BAM that starts from the position of the last read in output.bam and contains roughly the next 10000 reads of input.bam: samtools view -b input.bam [region] > minimal.bam, where region takes the form "contig:start-end", e.g. "chr1:1-10000" (region queries need input.bam to be indexed, and minimal.bam will need indexing too before you rerun umi_tools). Hopefully, you should get the same error with this minimal BAM. By repeating this process, you may even be able to narrow the error down to a single read/read pair. If you're OK to share your data, you can also email me this minimal BAM and I can try to hunt down the issue (tss38@cam.ac.uk).
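The narrowing-down procedure above can be sketched in Python. Assumptions (not part of UMI-tools): the log's "Written out N reads" count roughly tracks the record position in a coordinate-sorted input.bam (it counts reads written, so it can drift from input order, hence the arbitrary margin); mates that fall outside the window will be missing, which may change behaviour in --paired mode; and the script name window.py and the crashes() test passed to narrow_down are hypothetical.

```python
import re
import sys

# umi_tools progress lines look like:
# 2017-10-11 17:25:04,696 INFO Written out 2970000 reads
WRITTEN_RE = re.compile(r"Written out (\d+) reads")


def last_written_count(log_lines):
    """Return the last 'Written out N reads' count in a log (0 if none)."""
    last = 0
    for line in log_lines:
        match = WRITTEN_RE.search(line)
        if match:
            last = int(match.group(1))
    return last


def sam_window(sam_lines, start, size):
    """Yield the SAM header plus alignment records start+1 .. start+size.

    Expects text from `samtools view -h input.bam`.
    """
    seen = 0
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        seen += 1
        if seen > start + size:
            break
        if seen > start:
            yield line


def narrow_down(records, crashes):
    """Bisect a crashing list of records while either half still crashes.

    crashes(records) -> bool is a hypothetical caller-supplied test, e.g. one
    that writes the header plus records to a BAM and runs umi_tools on it.
    """
    if not crashes(records):
        raise ValueError("the starting records do not reproduce the crash")
    while len(records) > 1:
        mid = len(records) // 2
        first, second = records[:mid], records[mid:]
        if crashes(first):
            records = first
        elif crashes(second):
            records = second
        else:
            break  # the crash needs records from both halves
    return records


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    # samtools view -h input.bam | python window.py logfile.txt \
    #     | samtools view -b -o minimal.bam -
    with open(sys.argv[1]) as log:
        count = last_written_count(log)
    margin = 10000  # arbitrary safety margin around the last logged count
    sys.stdout.writelines(sam_window(sys.stdin, max(0, count - margin), 3 * margin))
```

If narrow_down stops with more than one record left, the crash probably depends on a combination of reads (for example a read and its mate) rather than a single record.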
Hi @mei2000: did you say that your BAM contains 167M reads but is only 70MB on disk? I'd expect 167M reads to take up more than 70MB of disk space.
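Putting a number on this: 70MB spread over 167M reads leaves well under one byte per read, which seems far too little for records that each carry a read name (nearly 60 characters in the sample above), a sequence and base qualities. A minimal check, assuming decimal megabytes (1 MB = 10**6 bytes):

```python
def bytes_per_read(size_mb, reads_millions):
    """Average on-disk bytes per read, taking 1 MB = 10**6 bytes."""
    return (size_mb * 10**6) / (reads_millions * 10**6)


if __name__ == "__main__":
    # 70 MB on disk, 167 million reads
    print("{:.2f} bytes per read".format(bytes_per_read(70, 167)))  # 0.42 bytes per read
```

Even at a single byte per read, 167M reads would need about 167MB.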
I have trouble running the umi_tools group command: it always runs into a segmentation fault. I just updated umi_tools from version 0.4.4 to 0.5.0 with pip. Here is my pip command and its output:
pip install --user --upgrade umi_tools
Collecting umi_tools
Using cached umi_tools-0.5.0.tar.gz
Requirement already up-to-date: setuptools>=1.1 in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: numpy>=1.7 in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: pandas>=0.12.0 in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: future in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: regex in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: scipy in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: matplotlib in ./.local/lib/python3.5/site-packages (from umi_tools)
Requirement already up-to-date: python-dateutil>=2 in ./.local/lib/python3.5/site-packages (from pandas>=0.12.0->umi_tools)
Requirement already up-to-date: pytz>=2011k in ./.local/lib/python3.5/site-packages (from pandas>=0.12.0->umi_tools)
Requirement already up-to-date: cycler>=0.10 in /usr/prog/python/3.5.1-goolf-1.5.14-NX/lib/python3.5/site-packages/cycler-0.10.0-py3.5.egg (from matplotlib->umi_tools)
Requirement already up-to-date: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./.local/lib/python3.5/site-packages (from matplotlib->umi_tools)
Requirement already up-to-date: six>=1.10 in ./.local/lib/python3.5/site-packages (from matplotlib->umi_tools)
Installing collected packages: umi-tools
Running setup.py install for umi-tools ... done
Successfully installed umi-tools-0.5.0
Here is my umi_tools group command and the error it produces:
[1]+ Segmentation fault ~geru1/.local/bin/umi_tools group -I input.bam --paired --group-out=input.tsv -L logfile.txt --output-bam -S output.bam
Please help me resolve this issue. Thank you,
Mei