c-zhou / yahs

Yet another Hi-C scaffolding tool
MIT License

Combining PoreC data #59

Open fZiegle opened 1 year ago

fZiegle commented 1 year ago

Hi Zhou,

I'm currently working with YaHS and ONT Pore-C data to scaffold my plant genome (~800 Mb). I've tried two different Pore-C protocols (3 libraries each, with the .bed files concatenated per protocol), resulting in:

protocol A) 19.2 M paired reads, 2.5 M intra, 1.67 M inter (Read N50 ~1 kb)

short_scaffolds

protocol B) 16.8 M paired reads, 5.9 M intra, 10.8 M inter (Read N50 ~ 3.3 kb)

Long_scaffolds

I then concatenated all .bed files from both protocols and ran YaHS, resulting in:

All_Scaffolds

According to the Juicebox results and assembly statistics, combining the runs gives noisier and more misassembled results than using a single data set. Could this be caused by the different balance of long- and short-distance contacts, or by the different ratios of intra/inter/total read pairs between the data sets? Should the two protocols instead be run consecutively? Or is the first data set (protocol A) perhaps having a negative impact?
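To make the ratio question concrete, here is a minimal Python sketch that compares the intra/inter fractions of the two protocols and of the simple concatenation, using only the counts reported above (in millions). It assumes that "intra" and "inter" are counted the same way in both data sets and that concatenating the .bed files simply adds the counts; the function and variable names are hypothetical and not part of YaHS.

```python
# Counts as reported in this post, in millions of read pairs.
datasets = {
    "A": {"paired": 19.2, "intra": 2.5, "inter": 1.67},
    "B": {"paired": 16.8, "intra": 5.9, "inter": 10.8},
}


def contact_summary(paired, intra, inter):
    """Return intra/inter fractions of all pairs and the intra:inter ratio."""
    return {
        "intra_frac": intra / paired,
        "inter_frac": inter / paired,
        "intra_to_inter": intra / inter,
    }


# Assumption: concatenating the .bed files just sums the per-protocol counts.
combined = {
    key: sum(counts[key] for counts in datasets.values())
    for key in ("paired", "intra", "inter")
}

summaries = {name: contact_summary(**counts) for name, counts in datasets.items()}
summaries["A+B"] = contact_summary(**combined)

for name, s in summaries.items():
    print(
        f"{name}: intra {s['intra_frac']:.1%}, inter {s['inter_frac']:.1%}, "
        f"intra:inter {s['intra_to_inter']:.2f}"
    )
```

Under these assumptions, protocol B contributes most of the classified contacts in the combined set, so the merged intra:inter ratio ends up much closer to B's than to A's.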

Best,
Freya

gotouerina commented 8 months ago

Hi, I am also working on Pore-C data. Could you please tell me which methods you now use to analyze it? Is YaHS better than the other methods?