c-zhou / yahs

Yet another Hi-C scaffolding tool
MIT License

0 read pairs #47

Open ajkshdkjahdka opened 1 year ago

ajkshdkjahdka commented 1 year ago

[I::main] dump hic links (BAM) to binary file yahs.out.bin
[I::dump_links_from_bam_file] 1 million records processed, 0 read pairs
[I::dump_links_from_bam_file] 2 million records processed, 0 read pairs
[I::dump_links_from_bam_file] 3 million records processed, 0 read pairs
[I::dump_links_from_bam_file] 4 million records processed, 0 read pairs
[I::dump_links_from_bam_file] 5 million records processed, 0 read pairs

What happened? Could you please tell me why?

c-zhou commented 1 year ago

Hello @ajkshdkjahdka,

For a BAM input, you need to make sure the reads in your BAM file have proper SAM flags (which are used to pair reads). This should not be a problem if your BAM file was generated in a standard way. Can you please provide more information about your alignment pipeline? Showing a few lines of your BAM file here would help too.

Chenxi

c-zhou commented 1 year ago

Also, if your BAM file was sorted by read names, the read names of a read pair need to be identical. Another parameter that affects the filtering for a name-sorted BAM is the mapping quality threshold (-q); the default value is 10.

ajkshdkjahdka commented 1 year ago

Actually, I used the pipeline described here: https://www.jianshu.com/p/620ddc8764ee. Chromap generated the BAM file, but it did not work.

Jrbfo commented 1 year ago

Hi, I have the same problem. Did you solve it?

Thousandl commented 1 year ago

Yeah, I also have the same problem. But my BAM file was converted from a SAM file with samtools. Moreover, the yahs.out.bin I got is also empty.

c-zhou commented 1 year ago

Hello @ajkshdkjahdka, @Jrbfo, @Thousandl,

I am not familiar with chromap and am not the author of that blog. I can have a look if you can show me here a few lines of your BAM file, for example with samtools view -F0xD00 -q10 ${your_bam_file} | cut -f1-9 | head.

Chenxi

Jrbfo commented 1 year ago

[image]

Thousandl commented 1 year ago

[image: WeChat image 20230214193733] @c-zhou

c-zhou commented 1 year ago

Thanks @Jrbfo and @Thousandl,

Your files have the same problem. They are essentially not standard BAM/SAM files. In a standard BAM file, we expect to see identical read names (the first column) for a read pair, i.e., no '/1' and '/2' appended to the read names. If you also used the pipeline described here https://www.jianshu.com/p/620ddc8764ee like @ajkshdkjahdka, this was likely introduced by chromap. It is probably worth asking the chromap group to fix it.

For a quick fix, you can convert your BAM file to a BED file with bedtools and then use the BED file as input to YaHS. You can do something like this:

samtools view -bh -u -F0xF0C -q10 ${your_bam_file} | bedtools bamtobed | awk -v OFS='\t' '{$4=substr($4,1,length($4)-2); print}' >${your_out_prefix}.bed
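As an illustration of what the awk step in that command does to the BED name column, here is a minimal sketch on two hand-made records. The function name `strip_bed_mate_suffix` and the sample records are hypothetical, and this variant differs from the command above in one respect: it removes the last two characters only when they are actually '/1' or '/2', instead of trimming unconditionally.

```shell
# Strip a trailing "/1" or "/2" from the name column (column 4) of a BED stream.
# Names without such a suffix are passed through unchanged.
strip_bed_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" } { sub(/\/[12]$/, "", $4); print }'
}

# Two made-up BED records for one read pair:
printf 'ctg1\t100\t250\tread7/1\t60\t+\nctg2\t900\t1050\tread7/2\t60\t-\n' \
    | strip_bed_mate_suffix
# Both output lines now carry the name "read7" in column 4.
```

In a pipeline, the function would take the place of the awk step, reading the output of bedtools bamtobed on standard input.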

Best, Chenxi

c-zhou commented 1 year ago

Chromap has the option to output BED format files. That might be another option if you do not mind redoing read mapping. Chenxi

ajkshdkjahdka commented 1 year ago

OK THANK YOU VERY MUCH! I WILL TRY IT NOW

Thousandl commented 1 year ago

Thanks @c-zhou. OK, thank you very much for the suggestion. I'll try this quick fix before asking the chromap group to fix it. Your answer is of great help to me. Thank you again for your sincere reply, and I wish you success in your work. Best wishes, Liqian

c-zhou commented 1 year ago

Thanks Liqian. The same to you! Chenxi

Jrbfo commented 1 year ago

@c-zhou Thank You So Much!

anxuan-web commented 1 year ago

Thanks @c-zhou. OK, thank you very much for the suggestion. I'll try this quick fix before asking the chromap group to fix it. Your answer is of great help to me. Thank you again for your sincere reply, and I wish you success in your work. Best wishes, Liqian

Hi! Did you solve this problem after converting bam file to bed file?

Thousandl commented 1 year ago

Yes, I have solved this problem and it is already in use.


ColinR01 commented 7 months ago

@c-zhou I have the same problem. Here are a few lines of my BAM file. Thanks. [image] [image]

felixlee0608 commented 2 months ago

length($4)-2

Should it be length($4)-4? Do read pairs in the BED file also need to have the same name?

c-zhou commented 2 months ago

Hi @felixlee0608,

Thanks for your reply. It should be length($4)-2. This is to remove the '/1' and '/2' suffixes.

I realised it is not really necessary to remove the suffixes for BED files. The program has been written to deal with them. See https://github.com/c-zhou/yahs/blob/2630cff2d247d794e8e776ee42f8d45ee1e9d3cb/asset.c#L167-L180.

This should work too:

samtools view -bh -u -F0xF0C -q10 ${your_bam_file} | bedtools bamtobed >${your_out_prefix}.bed

The BAM file should be sorted by read names.

For BAM files, the read names for a read pair need to be identical. No '/1' or '/2' suffixes are allowed.
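A quick way to see whether a BAM file is affected might be to count the records whose read name ends in '/1' or '/2'. The sketch below is only an illustration: the function name `count_suffixed_names` and the sample records are hypothetical, and it reads SAM-formatted text on standard input, so it can be tried without a BAM file at hand.

```shell
# Count SAM records whose read name (column 1) ends in "/1" or "/2".
count_suffixed_names() {
    awk -F '\t' '$1 ~ /\/[12]$/ { n++ } END { print n + 0 }'
}

# Three made-up records: one suffixed pair and one unsuffixed read.
printf 'r1/1\t65\tctg1\nr1/2\t129\tctg2\nr2\t65\tctg1\n' | count_suffixed_names
# prints 2

# On a real file (requires samtools), a sample of records could be checked with:
# samtools view -F0xD00 -q10 "${your_bam_file}" | head -n 100000 | count_suffixed_names
```

A non-zero count suggests the read names carry mate suffixes and will not match within a pair.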

Best, Chenxi

afiyachida commented 1 month ago

@c-zhou Hello, I am facing the same issue with '0' read pairs and my bam file seems to have '/1' and '/2' appended to the read names. However, this happens only in certain samples that I am working with. I am running the same pipeline for all samples. I am unable to understand why the BAM files happen to have /1-/2 tags on it in certain cases but not in others? Also, is there an option to modify the BAM file instead of converting it to BED?
Thank you!
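On modifying the BAM directly: one possible approach, sketched here under assumptions and not confirmed anywhere in this thread, is to stream the file as SAM text, strip the suffix from the read name column, and write it back as BAM. The function name `strip_sam_mate_suffix` and the output file name are hypothetical. This only renames reads; it does not change the SAM flags, which YaHS also relies on to pair reads, so whether the result pairs correctly would need to be checked.

```shell
# Strip a trailing "/1" or "/2" from the read name (column 1) of SAM text.
strip_sam_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }    # header lines pass through unchanged
         { sub(/\/[12]$/, "", $1); print }'
}

# Made-up input: one header line and one suffixed record.
printf '@SQ\tSN:ctg1\tLN:1000\nr1/1\t65\tctg1\n' | strip_sam_mate_suffix

# Possible use on a real file (requires samtools; file names are placeholders):
# samtools view -h "${your_bam_file}" | strip_sam_mate_suffix \
#     | samtools view -b -o "${your_out_prefix}.renamed.bam" -
```

Because the records stay in their original order, a name-sorted input should remain grouped by pair after renaming.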

afiyachida commented 1 month ago

@felixlee0608 Were you able to remove the '/1' and '/2' tags from the BAM file directly?