ZarnackGroup / racoon_clip

racoon_clip processes your iCLIP and eCLIP data from raw files to single-nucleotide crosslinks in a single step.
https://racoon-clip.readthedocs.io/en/latest/index.html

fail to index the output bam file from umi-tools #5

Closed fulaibaowang closed 7 months ago

fulaibaowang commented 8 months ago

Hi,

It failed when I tried to index the sample.Aligned.sortedByCoord.out.duprm.sort.bam file using samtools index:

[E::hts_idx_push] Unsorted positions on sequence #5: 176337393 followed by 154875602                                                                                                                                
[E::sam_index] Read 'K00180:234:HCTHMBBXX:1:1101:1012:11759_NAATGGCCAA' with ref_name='chr5', ref_length=181538259, flags=0, pos=154875602 cannot be indexed                                                        
samtools index: failed to create index for "ENCFF149LRA.Aligned.sortedByCoord.out.duprm.sort.bam" 

I understand this does not affect the racoon_clip pipeline at all, but do you have any idea why I get this error? Thanks!

MelinaKlostermann commented 7 months ago

Hi, just for clarification: this is a file produced in the folder aligned, right? And the same folder should also contain the files sample.Aligned.sortedByCoord.out.duprm.bam and sample.Aligned.sortedByCoord.out.duprm.bam.bai, right?

So I think you could use the sample.Aligned.sortedByCoord.out.duprm.bam.bai instead of indexing yourself. (".duprm" means duplicate removal so the files are also from after umi-tools).

With the sample.Aligned.sortedByCoord.out.duprm.sort.bam it seems that the file is not sorted although it is called ".sort".

Could you try running samtools sort on sample.Aligned.sortedByCoord.out.duprm.sort.bam and then samtools index on the sorted output?
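The two steps suggested above can be sketched as a small Python wrapper. This is only an illustration, not part of racoon_clip: the function names and the ".coord.bam" output name are made up here. It assumes samtools is on the PATH and writes the sorted data to a new file with -o, because samtools sort sends its output to stdout otherwise.

```python
import subprocess


def coordinate_sort_commands(in_bam, out_bam):
    """Return the two samtools commands: coordinate sort, then index."""
    return [
        # samtools sort orders by coordinate by default; -o names the output file.
        ["samtools", "sort", "-o", out_bam, in_bam],
        # samtools index needs a coordinate-sorted BAM and writes out_bam + ".bai" by default.
        ["samtools", "index", out_bam],
    ]


def run(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example call (hypothetical output name, requires samtools to be installed):
# run(coordinate_sort_commands(
#     "sample.Aligned.sortedByCoord.out.duprm.sort.bam",
#     "sample.Aligned.sortedByCoord.out.duprm.coord.bam",
# ))
```

Writing to a separate output file keeps the original .sort.bam untouched, which matters if the pipeline still needs it in its current order.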

fulaibaowang commented 7 months ago

> Hi, just for clarification: this is a file produced in the folder aligned, right? And the same folder should also contain the files sample.Aligned.sortedByCoord.out.duprm.bam and sample.Aligned.sortedByCoord.out.duprm.bam.bai, right?

Yes, it is in the aligned folder; no, there is no .bai file after umi-tools.

> So I think you could use the sample.Aligned.sortedByCoord.out.duprm.bam.bai instead of indexing yourself. (".duprm" means duplicate removal so the files are also from after umi-tools).
>
> With the sample.Aligned.sortedByCoord.out.duprm.sort.bam it seems that the file is not sorted although it is called ".sort".
>
> Could you try running samtools sort on sample.Aligned.sortedByCoord.out.duprm.sort.bam and then samtools index on the sorted output?

That is very helpful, thanks. I understand that the sample.Aligned.sortedByCoord.out.duprm.bam file comes from umi-tools. But what is sample.Aligned.sortedByCoord.out.duprm.sort.bam if it is not sorted?

fulaibaowang commented 7 months ago

I see now that it is sorted by name (https://github.com/ZarnackGroup/racoon_clip/blob/d7423fe1905c0f5310b364f87e3d2d6a67aafa0c/racoon_clip/workflow/Snakefile#L805), which is probably why indexing fails. Anyway, I can also sort and index it myself. Thanks a lot! The tool is very good.
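For anyone who wants to check the sort order before indexing, here is a minimal sketch that reads the SO field from the @HD line of a SAM/BAM header (for example, the text printed by samtools view -H). The function name and the example header are made up for illustration, and the sketch assumes the file actually carries an @HD line with an SO tag, which I have not verified for the racoon_clip output.

```python
def sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Made-up header for illustration, not taken from the file in this thread.
example_header = "@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr5\tLN:181538259\n"
print(sort_order(example_header))  # queryname
```

A value of queryname means the reads are ordered by name, so samtools index will refuse the file; coordinate is what the indexer expects.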