concatenating bams hangs at 66%

nbedelman commented 3 years ago

Hello, I am assembling clusters by aligning to a reference genome. I've done this before without issue, but this time I'm having some trouble.

  Step 6: Clustering/Mapping across samples 
  [#############       ]  66% 1:20:04 | concatenating bams

This run did have an earlier problem, which is that the reference scaffolds were too long for a .bai index. That issue threw this error:

Message: IPyradError: b'[E::hts_idx_check_range] Region 536900972..536901040 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6\n[E::sam_index] Read \'22d07ca0bea27a5ba77e960ef530e853;size=4\' with ref_name=\'chr1\', ref_length=690654357, flags=0, pos=536900973 cannot be indexed\nsamtools index: failed to create index for "/gpfs/loomis/scratch60/skelly/nbe4/0.assembleClusters/ranaTemp/ranaTemporaria_refmapping/YMF2018_096_S6001.trim-mapped-sorted.bam": Numerical result out of range\n'
  Parallel connection closed.
---------------------------------------------------------------------------IPyradError                               Traceback (most recent call last)<string> in <module>
/gpfs/loomis/project/skelly/nbe4/conda_envs/ipyrad/lib/python3.8/site-packages/ipyrad/assemble/clustmap.py in mapping_reads(data, sample, nthreads, altref)
   1977     error4 = proc4.communicate()[0]
   1978     if proc4.returncode:
-> 1979         raise IPyradError(error4)
   1980 
   1981     # Running cmd5 writes to either edits/sname-refmap_derep.fa for SE
IPyradError: b'[E::hts_idx_check_range] Region 536900972..536901040 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6\n[E::sam_index] Read \'22d07ca0bea27a5ba77e960ef530e853;size=4\' with ref_name=\'chr1\', ref_length=690654357, flags=0, pos=536900973 cannot be indexed\nsamtools index: failed to create index for "/gpfs/loomis/scratch60/skelly/nbe4/0.assembleClusters/ranaTemp/ranaTemporaria_refmapping/YMF2018_096_S6001.trim-mapped-sorted.bam": Numerical result out of range\n'

I was able to get around this by making a csi index instead of bai - I added the "-c" option to line 1935 of "mapping_reads" in clustmap.py

cmd4 = [ip.bins.samtools, "index", "-c", bamout] #added -c for larger chroms NBE 2/1/2021

But maybe that somehow led to an inability to concatenate the bams? Can I just manually concatenate bams and move on to step 7?

Thanks, happy to supply more info!

Nate

isaacovercast commented 3 years ago

Are you sure it's stuck and not still working? Can you look at top and free -g? The 66% progress indicates that the bam files have been successfully concatenated and sorted, and now ipyrad is running a samtools index on the concatenated bam. It's possible it's still working and just taking a while. It's also possible if the concatenated bam file is really huge that you are running out of RAM and it's bogging down swapping to disk. How much ram do you have on the box you're running on?

nbedelman commented 3 years ago

Thanks for the fast reply! It's on a cluster so I'm not sure how to run top, but I don't think space was the issue. I found the indexing line in clustmap_across.py and also added the -c flag there

        cmd3 = [
            ipyrad.bins.samtools,
            "index", "-c",  #edited to add -c NBE 2/1/2021
            os.path.join(
                self.data.dirs.across,
                "{}.cat.sorted.bam".format(self.data.name)
            ),
        ]

which worked in getting it to finish the step! It's now on to step 7, currently writing conversions so hopefully I"m out of the woods. Do you think it would be worth it to make csi the default for indexing? According to googling, bai can only handle chromosomes up to 512Mb, but csi can deal with bigger ones. Or is there a better way to deal with this situation?

nbedelman commented 3 years ago

with the csi index, the step took 45 seconds

isaacovercast commented 3 years ago

What version of ipyrad are you working with?

nbedelman commented 3 years ago

I have 0.9.58

isaacovercast commented 3 years ago

I added some code to protect against this. It will try the .bai indexing and if it fails it'll fall back and try the .csi version. It's checked into the repo. If you feel like checking it out and testing it that would be cool.

nbedelman commented 3 years ago

awesome! I'll give it a try

nbedelman commented 3 years ago

Sorry for the slow reply, but it seems to have worked! I am running ipyrad with a new reference, which seems to have an error on step 5, but I'll open a new issue

isaacovercast commented 3 years ago

+1

dereneaton / ipyrad

concatenating bams hangs at 66% #435